Select Random entries with criteria

  • Hi,

    I have a table with two columns 'id', 'tickets'.

    I want to make a draw and select randomly id's where the total of the tickets is a given number

    For example...

    id tickets

    -----------

    1 3

    2 5

    3 1

    4 2

    5 1

    6 2

    7 5

    8 1

    9 4

    10 2

    ----------

    I would like to select randomly id's where the total of the tickets would be 10.

    Like this..

    Using the above example, i would like to find how ca i make a draw and select a list of random unique id's in which the sum of the tickets woul be 10

    the id's:

    1, 2, 4

    fulfill the criteria bacause the sum of the corresponding tickets is 10.

    the same could be with,

    7, 8, 9

    or

    2, 7

    or

    4, 5, 6, 8, 9

    etc

    Is it possible?

    P.S. This is an example. Actually I want this to be executed an a large table with more than 100k entries

    Thanks in advance

    Denis

  • I do not believe that there is any truly efficient way to do this for an accurate random distribution, however, the most efficient would almost certainly be an iterative Monte Carlo method.

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • This is an idea I have. It seems fair and truly random. Let me know what you think.

    insert into #temp

    Select top 100 id, value from baseTable order by newid()

    insert into #temp2

    Select id1, id2, id3, value1 + value2 + value3 as Total from #temp a, #temps b, #temp c

    where a.id <> b.id and b.id <> c.id and a.id <> c.id and a.value + b.value + c.value <= @Total

    Dump this into variables :

    Select top 1 id1, id2, id3, total from #temp2 order by Newid()

    if total from selected row < @Total

    fetch new random row from #temp

    if total still lower than requested, repeat and add winners to a list.

    Return winners list.

    That seems quite ramdom to me (the first top 100 is completly random... and it makes the set quite small, so even a tripple cross join makes only 1 000 000 rows to process).

    Play with it, see how many rows you need in every section of the process. Then please post the final results here so we can all benefit from your work.

  • Hmmm... an interesting problem. My question is, what's the reason behind choosing 10 as the sum, and also, are there any limits on the value of the tickets field. Answers might provide a method.

    Because 10 is a relatively small number, you could actually build a table of all the possible ways to sum up to a total of 10, provided that there are no negative number values and that there are no non-integer values for the tickets field. You could have any ten records with 1, any 5 with 2, or 8 1's and a 2, 7 1's and a 3, 6 1's and a 4, 5 1's and a 5, etc., etc...

    I can think of several ways to make use of that, but I have doubts about performance until I can test it, having answers from you on the details. You may also want to specify why you want to do this. If

    it's to be a regular event, you might want an easier method of choosing the ID's at random than one involving the sum of the tickets field.

    Steve

    (aka smunson)

    :):):)

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • smunson (11/10/2008)


    Hmmm... an interesting problem. My question is, what's the reason behind choosing 10 as the sum, and also, are there any limits on the value of the tickets field. Answers might provide a method.

    Because 10 is a relatively small number, you could actually build a table of all the possible ways to sum up to a total of 10, provided that there are no negative number values and that there are no non-integer values for the tickets field. You could have any ten records with 1, any 5 with 2, or 8 1's and a 2, 7 1's and a 3, 6 1's and a 4, 5 1's and a 5, etc., etc...

    I can think of several ways to make use of that, but I have doubts about performance until I can test it, having answers from you on the details. You may also want to specify why you want to do this. If

    it's to be a regular event, you might want an easier method of choosing the ID's at random than one involving the sum of the tickets field.

    Steve

    (aka smunson)

    :):):)

    Hi Steve,

    The 10 is just an hypothetical number. However the actual size of the table will be more than 100,000 rows. That means there is a chance to have 100,000 or more id's

    There is no limit on the value of the tickets field but in reality it will not be more that 10

    The scope of all this is:

    I have a long list of applicants (id's) which if they win they will take a number of tickets (tickets value).

    The problem is that my company can provide only a specific number (sum) of tickets. That means, the sum of the tickets of the winners cannot be more than this specific number. I hope you understood what I am trying to say.

    Let's say we have 200,000 applicants each of them can take x number of tickets.

    My company gives 600,000 tickets.

    The draw has to choose a y number of applicants.

    Maybe a solution could be to run a random query in a loop and each time to subtract the applicant tickets value from the fixed number of the tickets that my company give till we fill it. Just an idea but I cannot do it!

    Any help?

  • Ok, I understand more about the background, but I don't really see anything that tells me the ultimate objective. This new info actually raises more questions than it provides in the way of answers, and perhaps even introduces a new problem.

    Let's tackle these one at a time. You appear to need to conduct a drawing from among all ticket holders. You need it to be random. You also now appear to be wanting to ensure that no more than 600,000 tickets are issued. As that latter item is a bit more critical to the process, let's start there. You need a table that holds one record for each draw, and the value starts at 600,000 and is decremented by each issuance of tickets. The issuance process must check the value before issuing a ticket, as well as then decrement the number for that draw by the number of tickets issued. As that method requires serialization of updates, an alternative might be to have 600,000 records in a table for each draw, with each record having the ticket number and the ID of the ticket holder. Using transactions can then keep more than one ticket buyer from getting the same ticket. However, that's a great deal of data volume.

    On the random draw side, I guess the process depends on the desired result. You stated that you need to choose "y" number of ticket holders, and determining a good process depends rather heavily on the actual value of "y", as well as on which scenario is involved on the "keep track of the ticket holders" side of things. While 10 may be an arbitrary number, choosing to use a SUM for this kind of data volume may not be the best idea. Also, with differing number of tickets, just choosing a fixed number of ticket holders cannot be guaranteed by looking at the sum. You could easily satisfy the SUM condition with just two ticketholders at 5 tickets each, and have it be just as random as when the condition is satisfied by 10 ticketholders with 1 ticket each.

    As before, more information is needed, and now on two fronts instead of just one.

    Steve

    (aka smunson)

    :):):)

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • smunson (11/10/2008)


    Ok, I understand more about the background, but I don't really see anything that tells me the ultimate objective. This new info actually raises more questions than it provides in the way of answers, and perhaps even introduces a new problem.

    You need a table that holds one record for each draw, and the value starts at 600,000 and is decremented by each issuance of tickets. The issuance process must check the value before issuing a ticket, as well as then decrement the number for that draw by the number of tickets issued. As that method requires serialization of updates, an alternative might be to have 600,000 records in a table for each draw, with each record having the ticket number and the ID of the ticket holder. Using transactions can then keep more than one ticket buyer from getting the same ticket. However, that's a great deal of data volume.

    Actually the subtracting was an idea of how we can do the coding, not the object itself. Maybe it's wrong because of the data volume as you said. I am not the expert.

    smunson (11/10/2008)

    On the random draw side, I guess the process depends on the desired result. You stated that you need to choose "y" number of ticket holders, and determining a good process depends rather heavily on the actual value of "y", as well as on which scenario is involved on the "keep track of the ticket holders" side of things. While 10 may be an arbitrary number, choosing to use a SUM for this kind of data volume may not be the best idea. Also, with differing number of tickets, just choosing a fixed number of ticket holders cannot be guaranteed by looking at the sum. You could easily satisfy the SUM condition with just two ticketholders at 5 tickets each, and have it be just as random as when the condition is satisfied by 10 ticketholders with 1 ticket each.

    :):):)

    If I had the capability to say that x number of applicants will be drawn maybe this would be easier. What you think? Then the problem would be to choose randomly these applicants which fulfill the criteria of 600,000 tickets. As for your example I don't care if the 10 tickets will be given to 2 applicants (5 each) or 10 applicants (1 each). I just need NOT overcome the fixed total number of tickets.

    Hope I helped..Thank you..

  • Well, you still haven't mentioned how often this process has to take place. After all, if you do this only once a month, then a large number of records may not be a concern except perhaps from an archival standpoint. However, now that I'm thinking further on this, I'm thinking that perhaps part of the problem is the idea of "x" number of ticketholders, as opposed to "x" number of winning tickets. In the case of the latter, then whomever holds the winning ticket or tickets ends up a winner. Again, how often you need to do this makes a difference in how you might want to approach it. You can be sure that designating a particular ticket number or numbers as "winning" tickets is far easier than the other way around, and also, you then have an incentive for any given ticketholder to want to hold as many tickets as possible, as their odds improve with each additional ticket they hold. Under the alternative scenario of a fixed number of ticketholders becoming winners, then the odds of winning are based on the number of ticketholders rather than the number of tickets. Either is a valid choice, but if it's going to be based on the number of ticketholders, then you don't really need the tickets to begin with, as the number of tickets anyone has is largely irrelevant, especially when you consider the number of such ticket holders is in the hundreds of thousands.

    It sounds like perhaps the problem you need to solve first is to fix the number of tickets available for a given draw. This process is going to be highly dependent on how you plan to distribute ticket ownership, which seems likely to be web-based. I honestly think this is more planning and detail than a forum post can reasonably handle, as right now I'm not entirely sure which problem you're trying to solve, or that you're even sure of what might need to be done.

    One possible method would just be to print up 600,000 small numbered paper tickets, and buy a massive bingo-style ball to house them, and then draw tickets manually from the ball to determine the winning tickets. Then you can record the drawing process on video, instead of relying entirely on computer automation. It might also help with audits from any legal authorities that might regulate such drawings.

    You may need to consider getting consulting assistance. There are a lot of different ways to go, but I have very little info other than the max number of tickets is fixed at 600,000.

    I have to believe there are several hurdles to cross, which include but are not limited to the following:

    1.) Determining who owns a ticket, and which tickets they own, and a process for gaining such ownership.

    2.) Determining how to know which tickets are the winning tickets

    3.) Establishing the process for #2.

    There may be others, but I'm quite sure that a T-SQL query alone is not going to help much until these things are dealt with. I also suspect that to achieve these goals, one might need to know more than would ordinarily be made public, which is why you may be better off with consulting assistance.

    Send me a PM (private message) on this forum if you wish to go the consulting route.

    Steve

    (aka smunson)

    :):):)

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • I actually think a loop of some sort would probably be faster for this (GAK! I can't believe I just said that! :hehe:)... except for a couple of minor details... 😛 Like how to guarantee that you've selected a particular ID/Tickets combo only once without turning things into a triangular join. Of course, you could embed a "zero out" update in the loop, I suppose.

    So, I guess we'll do this in a set based fashion...

    First, apologies to Steve... this blows your chances to be a consultant on this job. :hehe:

    Second, testing on 10 rows when you really want 600,000 isn't going to prove performance for anything. To demo my suggestion and so others can play, here's a table of 600,000 rows that meet your original spec of "probably no more than 10 tickets per row"...

    --=======================================================================================

    -- Create a 600,000 row demo table similar to the original post.

    -- Note that this is NOT a part of the solution.

    -- This whole section takes about 8 seconds or so.

    --=======================================================================================

    --===== Work in a "safe" area

    USE TempDb

    --===== Conditionally drop the table so we can rerun if we want

    IF OBJECT_ID('TempDb.dbo.AllTickets','U') IS NOT NULL

    DROP TABLE TempDb.dbo.AllTickets

    --===== Create and populate the table on the fly

    CREATE TABLE dbo.AllTickets (ID INT IDENTITY(1,1), Tickets INT)

    INSERT INTO dbo.AllTickets

    ( Tickets)

    SELECT TOP 600000 ABS(CHECKSUM(NEWID()))%10+1

    FROM Master.dbo.SysColumns sc1,

    Master.dbo.SysColumns sc2

    ... and a proposed solution... might be able to get away with not transfering all 600k rows, but with random numbers, you never really know. Also, I set the "Total to add up to" to 100... feel free to change it to whatever you really need... would probably be better as a variable...

    --=======================================================================================

    -- Here's one possible solution...

    --=======================================================================================

    --===== Again, make sure we're in the "safe" area to play

    USE TempDb

    --===== We need a place to store all the rows in random order. If that place already

    -- exists, drop it so we can rerun the code for demo.

    IF OBJECT_ID('TempDB..#Draw','U') IS NOT NULL

    DROP TABLE #Draw

    --===== Transfer the data to the "Draw" table in random order

    SELECT --About 11 Seconds with the WHERE clause, about 7 without (with no indexes).

    IDENTITY(INT,1,1) AS RowNum,

    CAST(ID AS INT) AS ID,

    Tickets,

    CAST(0 AS INT) AS Drawn

    INTO #Draw

    FROM dbo.AllTickets

    WHERE Tickets > 0 --This is just in case you decide to "zero out" previous winners

    ORDER BY NEWID()

    --===== Add a clustered primary key to keep things in the proper order for

    -- the "running total" trick we're getting ready to do.

    ALTER TABLE #Draw --About 2 seconds

    ADD PRIMARY KEY CLUSTERED (RowNum)

    --===== Declare the variables we need for the "running total" update.

    -- The function of the variables are self evident by name.

    DECLARE @PrevTotal INT,

    @NewTotal INT,

    @Dummy INT

    --===== Preset the "total" variables to 0 so we don't have to mess with NULLs

    SELECT @PrevTotal = 0,@NewTotal = 0

    --===== This update creates conditional running totals. Only the first items

    -- that add up to exactly 100 get a total put into the DRAWN column.

    UPDATE #Draw --About 7 seconds

    SET @NewTotal = CASE WHEN @PrevTotal + Tickets <= 100 THEN @PrevTotal + Tickets ELSE @PrevTotal END,

    @Dummy = Drawn = CASE WHEN @NewTotal > @PrevTotal THEN @NewTotal END,

    @PrevTotal = @NewTotal

    FROM #Draw WITH(INDEX(0))

    --===== Select the rows that were drawn

    SELECT ID, Tickets, Drawn

    FROM #Draw

    WHERE Drawn > 0

    ORDER BY RowNum

    Of course, you could add an extra update to the original table to "zero out" the tickets on the ID's that were just drawn so they wouldn't be drawn in subsequent drawings.

    Heh... I wonder if Ol' Monte` will like it... 😀

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (11/10/2008)


    I actually think a loop of some sort would probably be faster for this (GAK! I can't believe I just said that! :hehe:)...

    Heh. Yeah, I've been biting my tongue for the last 24 hours trying to think of a different answer! 😛

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • Jeff,

    I don't think you're quite taking into account the entire scope of the problem. This is most likely for an online website where people are going to be plunking down funds to acquire these tickets, so ownership of a given number of tickets is not something you can just randomize. The thinking process about how to draw winners wasn't clear from the OP's perspective - at least I wasn't convinced that it was, based on the OP's posts. Thus I really don't think that much has been solved. It's trivial to come up with some way to select winners at random, but I suspect there are more important issues that haven't been taken into account by the OP. The consulting offer still stands, as all you have is a method for zeroing out the winning tickets in the table. Establishing ownership of a given ticket hasn't been tackled yet, and probably needs to be web-based. As the OP has yet to either contact me or post again, he may have found he has difficulties that cannot be easily overcome, such as realizing the entire scope of what needs to be done may be far more than he/she had planned for.

    Steve

    (aka smunson)

    :):):)

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • Yep... understood... with the lack of a clear understanding, I was just solving the original post on a larger scale.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 12 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply