How to compare a list of numbers, kind of like lottery results

  • This isn't for a project I am currently working on, but I have been curious about how you could make this comparison:

    Say you have a table that has records with numbers sort of like lottery winning numbers, say:

    TableWinners

    num1, num2, num3, num4, num5, num6

    33 52 47 23 17 28

    ... more records with similar structure.

    Then you have another table with chosen numbers, same structure as above, TableGuesses.

    How could you do the following comparisons between TableGuesses and TableWinners:

    1. Compare a single record in TableGuesses to a single record in TableWinners to get a count of the number of numbers that match (kind of a typical lottery type of thing).

    2. Compare a single record in TableGuessess to ALL records in TableWinners to see which record in TableWinners is the closest match to the selected record in TableGuesses.

    Does that make sense?

    Thanks!

  • You would probably unpivot the data in both tables so that you have a key for which set of numbers it belongs to, but each winning/drawn number is its own row.

    Then for the first one you join the unpivoted tables and count rows and you have a count of matches.

    For the second you do the same steps for all rows of the winning numbers table, grouped by a unique identifier of the drawing with a count, and then order the results by number of matches so that you can get the row(s) with the highest matches.

  • Interesting...

    I have never unpivoted a table, but I think I understand what it is your are saying. I will read up on unpivot to see how to make it work.

    Thanks!

  • robert.wiglesworth (2/5/2015)


    This isn't for a project I am currently working on, but I have been curious about how you could make this comparison:

    Say you have a table that has records with numbers sort of like lottery winning numbers, say:

    TableWinners

    num1, num2, num3, num4, num5, num6

    33 52 47 23 17 28

    ... more records with similar structure.

    Then you have another table with chosen numbers, same structure as above, TableGuesses.

    How could you do the following comparisons between TableGuesses and TableWinners:

    1. Compare a single record in TableGuesses to a single record in TableWinners to get a count of the number of numbers that match (kind of a typical lottery type of thing).

    2. Compare a single record in TableGuessess to ALL records in TableWinners to see which record in TableWinners is the closest match to the selected record in TableGuesses.

    Does that make sense?

    Thanks!

    Normalise it out, one row per number instead of one row per column, and it's trivial. Left-join guesses to winners on number, and whatever you want to call a "strip", then aggregate by strip. If you can't change the schema then split your columns into rows using CROSS APPLY VALUES (Dwain Camps) or delimitedsplit2k8 (Jeff Moden).


    [font="Arial"]Low-hanging fruit picker and defender of the moggies[/font]

    For better assistance in answering your questions, please read this[/url].


    Understanding and using APPLY, (I)[/url] and (II)[/url] Paul White[/url]

    Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]

  • Ok, I see what you are saying...

    instead of something like this:

    drawdate num1 num2 num3 num4 num5 num6

    02/05/2015 55 12 45 33 22 06

    Do something like this:

    drawdate num

    02/05/2015 55

    02/05/2015 12

    02/05/2015 45

    02/05/2015 33

    02/05/2015 22

    02/05/2015 06

    Yes? So that would lead to a much taller but narrow table. Is there a performance issue associated with that?

  • Basically you are making the rows into columns, each with the identifier of the original row, and one of the winning numbers.

    There is an UNPIVOT sql statement, or you can achieve the same logically using CROSS APPLY and VALUES

    Some examples to get you started

    CREATE TABLE #WinningNumbers

    (WinningNumberID int,

    num1 int,

    num2 int,

    num3 int,

    num4 int,

    num5 int,

    num6 int)

    INSERT #WinningNumbers (WinningNumberID,num1,num2,num3,num4,num5,num6)

    SELECT 1, 12,25,27,31,40,45

    --FIRST UNPIVOT

    SELECT WinningNumberID, IndNumber, NumberID

    FROM

    (SELECT WinningNumberID,num1,num2,num3,num4,num5,num6

    FROM #WinningNumbers) p

    UNPIVOT

    (IndNumber FOR NumberID IN (num1,num2,num3,num4,num5,num6)) AS unpiv

    --UNPIVOT USING APPLY

    SELECT WinningNumberID,IndNumber,NumberID

    FROM #WinningNumbers

    CROSS APPLY (VALUES ('num1',num1),('num2',num2),('num3',num3),('num4',num4),('num5',num5),('num6',num6)) AS x(NumberID,IndNumber)

    DROP TABLE #WinningNumbers

    As for whether there is a performance issue, yes, potentially. The actually apply method of unpivoting is fairly fast, but if you are doing that, and then querying the unpivoted table, then if there is a lot of data it could be slow. Note: the performance issue isn't because you are creating more rows, it is because you are scanning and transforming everything then querying it)

    Of course, it could be argued that the numbers should stored in this fashion to begin with.

  • Interesting...

    I have never unpivoted a table, but I think I understand what it is your are saying. I will read up on unpivot to see how to make it work.

    Thanks!

    Check out An Alternative (Better?) Method to UNPIVOT (SQL Spackle)[/url]

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • Using the unpivot method in Dwain's article I came up with this solution...

    (Note that I used 4 numbers for brevity)

    USE SSRS_POC

    GO

    IF OBJECT_ID('tempdb..#tablewinners') IS NOT NULL DROP TABLE #tablewinners;

    IF OBJECT_ID('tempdb..#tableguesses') IS NOT NULL DROP TABLE #tableguesses;

    CREATE TABLE #tableWinners (nid int primary key, n1 int, n2 int, n3 int, n4 int);

    CREATE TABLE #tableguesses (nid int primary key, n1 int, n2 int, n3 int, n4 int);

    GO

    INSERT #tableWinners VALUES (1,12,19,36,40);

    INSERT #tableguesses

    VALUES(1,11,12,34,41),--1 match (12)

    (2,41,34,11,19),--1 match (19)

    (3,40,4,19,12);--3 matches (12,19,40)

    WITH

    winners AS

    ( SELECT nid, n, nvalue

    FROM #tableWinners

    CROSS APPLY(VALUES ('n1',n1),('n2',n2),('n3',n3),('n4',n4)) x(n,nvalue)

    ),

    Guesses AS

    ( SELECT nid, n, nvalue

    FROM #tableGuesses

    CROSS APPLY (VALUES ('n1',n1),('n2',n2),('n3',n3),('n4',n4)) x(n,nvalue)

    )

    SELECT

    --TOP 1 WITH TIES

    g.nid, count(g.nid) AS matches

    FROM winners w

    JOIN Guesses g ON w.nvalue = g.nvalue

    GROUP BY g.nid

    --ORDER BY matches DESC

    DROP TABLE #tableWinners

    DROP TABLE #tableguesses

    This will give you how many numbers match. Uncomment the TOP 1 WITH TIES and ORDER BY clauses and it will return the closest match.

    Edit: Code formatting

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • Let's see if I can work past my blushing here and help out a little bit.

    First off, Alan.B's got a great solution as it knocks off both of your specific requirements with basically a single query.

    For obvious reasons, I like the CROSS APPLY VALUES approach to UNPIVOT. 🙂

    Since this is an academic question according to the OP, let's look at a few perturbations of the requirement:

    Suppose that your table of winners contains 5 or less numbers:

    1. After unpivoting you want to find any matches in the guesses table that contain all of those numbers. This is relational division with remainder if there are less than 5 numbers in the table.

    2. If your table of winners contains exactly 5 numbers and the winner must be a 100% match against the guess, this is relational division with no remainder.

    In either cases of 1 and 2, if we assume that the guesses table can contain more than five numbers, this article covers both:

    High Performance Relational Division [/url]

    3. If your table of winners must be matched by the sequence of occurrence, that's a special case of relational division that Peter Larsson aptly named "ordered relational division."

    Identifying a Subsequence in a Sequence, Part 3

    4. Suppose you introduce a requirement that in the case of a tie (same count of matches), the "best match" is the one whose difference in the remaining numbers is minimized.

    Now that latter is an interesting problem! Since you didn't state it as a requirement I won't try to solve it right now, but don't be surprised if someday you see an article on it!


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • This was removed by the editor as SPAM

  • jefferyjordan26 wrote:

    Hey there!

    Each winning/drawn number has its own row, however you would probably unpivot the data in both tables so that you have a key for which group of numbers it belongs to. Then you connect the unpivoted tables for the first one, count the rows, and then you get a count of matches. In order to retrieve the row(s) with the most matches, you need to perform the same processes for each row of the winning numbers table for the second drawing, grouping them by a counted unique drawing identification.

    That's basically what most of the posts and a lot of the code on this thread suggests.  Have you some alternative code that you'd like to share?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • This was removed by the editor as SPAM

  • This was removed by the editor as SPAM

Viewing 13 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply