Find Customers Who Bought "A" and "B" But Not "C" (SQL Spackle)

  • Here are the results from a 10 million row run. And, SURPRISE! We have a new winner! (Jonathan AC Roberts ). Also, keep in mind that I'm doing this on a single CPU on an old 1.8GHz box. Your mileage WILL vary! 🙂

    [font="Courier New"]========== Code From Article ===========================================

    SQL Server Execution Times:

    CPU time = 1281 ms, elapsed time = 1580 ms.

    ========== Arjun S =====================================================

    SQL Server Execution Times:

    CPU time = 703 ms, elapsed time = 745 ms.

    ========== tommyh =====================================================

    SQL Server Execution Times:

    CPU time = 2594 ms, elapsed time = 2922 ms.

    ========== Toby Harman =====================================================

    SQL Server Execution Times:

    CPU time = 3219 ms, elapsed time = 3576 ms.

    ========== James Dingle =====================================================

    SQL Server Execution Times:

    CPU time = 3875 ms, elapsed time = 4254 ms.

    ========== chintan.j.gandhi =====================================================

    SQL Server Execution Times:

    CPU time = 9109 ms, elapsed time = 10981 ms.

    ========== Dalibor Margotic CTE (First method already posted by Arjun S) =====

    SQL Server Execution Times:

    CPU time = 734 ms, elapsed time = 875 ms.

    ========== MAGOO =====================================================

    SQL Server Execution Times:

    CPU time = 719 ms, elapsed time = 768 ms.

    ========== Venoym =====================================================

    SQL Server Execution Times:

    CPU time = 1265 ms, elapsed time = 1449 ms.

    ========== Jonathan AC Roberts =====================================================

    SQL Server Execution Times:

    CPU time = 437 ms, elapsed time = 461 ms.

    ========== bob.probst =====================================================

    SQL Server Execution Times:

    CPU time = 12344 ms, elapsed time = 13928 ms.

    ========== Vyengr =====================================================

    SQL Server Execution Times:

    CPU time = 766 ms, elapsed time = 942 ms.

    ========== ejoell 66477 =====================================================

    SQL Server Execution Times:

    CPU time = 10500 ms, elapsed time = 11831 ms.

    ========== David Rueter =====================================================

    SQL Server Execution Times:

    CPU time = 11110 ms, elapsed time = 12419 ms.

    ========== ColdCoffee =====================================================

    SQL Server Execution Times:

    CPU time = 2719 ms, elapsed time = 3013 ms.

    Warning: Null value is eliminated by an aggregate or other SET operation.

    [/font]

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff, did you tweak the CustomerID MOD value from 50000 at all for that?

    MM



    select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

  • Forum Etiquette: How to post Reporting Services problems
  • [/url]
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • [/url]
  • How to Post Performance Problems - by Gail Shaw
  • [/url]

  • Results for 10M rows with MOD 500000 on CustomerID on QUAD core desktop with 4GB ram.

    ========== Code From Article ===========================================

    SQL Server Execution Times:

    CPU time = 1576 ms, elapsed time = 803 ms.

    ========== Arjun S =====================================================

    SQL Server Execution Times:

    CPU time = 343 ms, elapsed time = 367 ms.

    ========== tommyh =====================================================

    SQL Server Execution Times:

    CPU time = 1764 ms, elapsed time = 808 ms.

    ========== Toby Harman =====================================================

    SQL Server Execution Times:

    CPU time = 1762 ms, elapsed time = 846 ms.

    ========== James Dingle =====================================================

    SQL Server Execution Times:

    CPU time = 1263 ms, elapsed time = 1349 ms.

    ========== chintan.j.gandhi =====================================================

    SQL Server Execution Times:

    CPU time = 2683 ms, elapsed time = 3619 ms.

    ========== Dalibor Margotic CTE (First method already posted by Arjun S) =====

    SQL Server Execution Times:

    CPU time = 343 ms, elapsed time = 373 ms.

    ========== MAGOO =====================================================

    SQL Server Execution Times:

    CPU time = 1014 ms, elapsed time = 1152 ms.

    ========== Venoym =====================================================

    SQL Server Execution Times:

    CPU time = 1997 ms, elapsed time = 4494 ms.

    ========== Jonathan AC Roberts =====================================================

    SQL Server Execution Times:

    CPU time = 2231 ms, elapsed time = 2529 ms.

    ========== bob.probst =====================================================

    SQL Server Execution Times:

    CPU time = 998 ms, elapsed time = 735 ms.

    ========== Vyengr =====================================================

    SQL Server Execution Times:

    CPU time = 998 ms, elapsed time = 1114 ms.

    ========== ejoell 66477 =====================================================

    SQL Server Execution Times:

    CPU time = 1154 ms, elapsed time = 612 ms.

    ========== David Rueter =====================================================

    SQL Server Execution Times:

    CPU time = 1046 ms, elapsed time = 642 ms.

    ========== ColdCoffee =====================================================

    SQL Server Execution Times:

    CPU time = 1590 ms, elapsed time = 800 ms.

    Warning: Null value is eliminated by an aggregate or other SET operation.

    MM



    select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

  • Forum Etiquette: How to post Reporting Services problems
  • [/url]
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • [/url]
  • How to Post Performance Problems - by Gail Shaw
  • [/url]

  • mister.magoo (3/29/2012)


    Jeff, did you tweak the CustomerID MOD value from 50000 at all for that?

    No, I sure didn't. Let me try that on my box. I'll run it up to 500,000 just like you did.

    And, thanks for posting your test.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • And the original 1M rows on my QUAD/4GB

    Another 10M rows with MOD 50000 and a new code

    SQL Server parse and compile time:

    CPU time = 0 ms, elapsed time = 0 ms.

    ========== Code From Article ===========================================

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 0 ms.

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 0 ms.

    SQL Server Execution Times:

    CPU time = 624 ms, elapsed time = 647 ms.

    ========== Arjun S =====================================================

    SQL Server Execution Times:

    CPU time = 281 ms, elapsed time = 293 ms.

    ========== tommyh =====================================================

    SQL Server Execution Times:

    CPU time = 1388 ms, elapsed time = 681 ms.

    ========== Toby Harman =====================================================

    SQL Server Execution Times:

    CPU time = 1435 ms, elapsed time = 680 ms.

    ========== James Dingle =====================================================

    SQL Server Execution Times:

    CPU time = 1872 ms, elapsed time = 1993 ms.

    ========== chintan.j.gandhi =====================================================

    SQL Server Execution Times:

    CPU time = 2839 ms, elapsed time = 3190 ms.

    ========== Dalibor Margotic CTE (First method already posted by Arjun S) =====

    SQL Server Execution Times:

    CPU time = 281 ms, elapsed time = 288 ms.

    ========== MAGOO =====================================================

    SQL Server Execution Times:

    CPU time = 265 ms, elapsed time = 298 ms.

    ========== Venoym =====================================================

    SQL Server Execution Times:

    CPU time = 484 ms, elapsed time = 537 ms.

    ========== Jonathan AC Roberts =====================================================

    SQL Server Execution Times:

    CPU time = 172 ms, elapsed time = 179 ms.

    ========== bob.probst =====================================================

    SQL Server Execution Times:

    CPU time = 16848 ms, elapsed time = 4631 ms.

    ========== Vyengr =====================================================

    SQL Server Execution Times:

    CPU time = 265 ms, elapsed time = 292 ms.

    ========== ejoell 66477 =====================================================

    SQL Server Execution Times:

    CPU time = 16707 ms, elapsed time = 4708 ms.

    ========== David Rueter =====================================================

    SQL Server Execution Times:

    CPU time = 16662 ms, elapsed time = 5004 ms.

    ========== ColdCoffee =====================================================

    SQL Server Execution Times:

    CPU time = 1342 ms, elapsed time = 668 ms.

    Warning: Null value is eliminated by an aggregate or other SET operation.

    ========== Magoo2 =====================================================

    SQL Server Execution Times:

    CPU time = 172 ms, elapsed time = 176 ms.

    My new version to match Johnathan AC Roberts is a twist on Arjun S:

    (

    SELECT CustomerID

    FROM #Purchase p1

    WHERE ProductCode = 'A'

    EXCEPT

    SELECT CustomerID

    FROM #Purchase p1

    WHERE ProductCode = 'C'

    )

    INTERSECT

    SELECT CustomerID

    FROM #Purchase p1

    WHERE ProductCode = 'B'

    But I think this just highlights how variable results can be depending on the actual distribution of the data....

    MM



    select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

  • Forum Etiquette: How to post Reporting Services problems
  • [/url]
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • [/url]
  • How to Post Performance Problems - by Gail Shaw
  • [/url]

  • It's amazing what adding just a couple of hundred thousand customers will do. 🙂 Here's the 10 million row run with 500,000 customers like Magoo had instead of just the original 50,000.

    Again, it's amazing that all the code submitted is within a second or two of each other across 10 million rows of data. Anyone remember where a 50,000 row database was considered to be "huge"? I sure do.

    [font="Courier New"]========== Code From Article ===========================================

    SQL Server Execution Times:

    CPU time = 2656 ms, elapsed time = 2895 ms.

    ========== Arjun S =====================================================

    SQL Server Execution Times:

    CPU time = 891 ms, elapsed time = 893 ms.

    ========== tommyh =====================================================

    SQL Server Execution Times:

    CPU time = 3281 ms, elapsed time = 3273 ms.

    ========== Toby Harman =====================================================

    SQL Server Execution Times:

    CPU time = 3281 ms, elapsed time = 3307 ms.

    ========== James Dingle =====================================================

    SQL Server Execution Times:

    CPU time = 2734 ms, elapsed time = 2751 ms.

    ========== chintan.j.gandhi =====================================================

    SQL Server Execution Times:

    CPU time = 7140 ms, elapsed time = 7274 ms.

    ========== Dalibor Margotic CTE (First method already posted by Arjun S) =====

    SQL Server Execution Times:

    CPU time = 1079 ms, elapsed time = 1088 ms.

    ========== MAGOO =====================================================

    SQL Server Execution Times:

    CPU time = 954 ms, elapsed time = 953 ms.

    ========== Venoym =====================================================

    SQL Server Execution Times:

    CPU time = 3985 ms, elapsed time = 4758 ms.

    ========== Jonathan AC Roberts =====================================================

    SQL Server Execution Times:

    CPU time = 859 ms, elapsed time = 864 ms.

    ========== bob.probst =====================================================

    SQL Server Execution Times:

    CPU time = 2562 ms, elapsed time = 2937 ms.

    ========== Vyengr =====================================================

    SQL Server Execution Times:

    CPU time = 953 ms, elapsed time = 948 ms.

    ========== ejoell 66477 =====================================================

    SQL Server Execution Times:

    CPU time = 2719 ms, elapsed time = 2723 ms.

    ========== David Rueter =====================================================

    SQL Server Execution Times:

    CPU time = 2563 ms, elapsed time = 2623 ms.

    ========== ColdCoffee =====================================================

    SQL Server Execution Times:

    CPU time = 3687 ms, elapsed time = 3814 ms.

    Warning: Null value is eliminated by an aggregate or other SET operation.

    [/font]

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • mister.magoo (3/29/2012)


    But I think this just highlights how variable results can be depending on the actual distribution of the data....

    And number of processors. Here's the 10M row 500K customer run for the Magoo 2 code on a single processor.

    [font="Courier New"]========== Code From Article ===========================================

    SQL Server Execution Times:

    CPU time = 2609 ms, elapsed time = 2750 ms.

    ========== Arjun S =====================================================

    SQL Server Execution Times:

    CPU time = 844 ms, elapsed time = 872 ms.

    ========== tommyh =====================================================

    SQL Server Execution Times:

    CPU time = 3219 ms, elapsed time = 3330 ms.

    ========== Toby Harman =====================================================

    SQL Server Execution Times:

    CPU time = 3265 ms, elapsed time = 3345 ms.

    ========== James Dingle =====================================================

    SQL Server Execution Times:

    CPU time = 2828 ms, elapsed time = 2824 ms.

    ========== chintan.j.gandhi =====================================================

    SQL Server Execution Times:

    CPU time = 7094 ms, elapsed time = 7148 ms.

    ========== Dalibor Margotic CTE (First method already posted by Arjun S) =====

    SQL Server Execution Times:

    CPU time = 1047 ms, elapsed time = 1062 ms.

    ========== MAGOO =====================================================

    SQL Server Execution Times:

    CPU time = 906 ms, elapsed time = 906 ms.

    ========== Venoym =====================================================

    SQL Server Execution Times:

    CPU time = 3703 ms, elapsed time = 3768 ms.

    ========== Jonathan AC Roberts =====================================================

    SQL Server Execution Times:

    CPU time = 860 ms, elapsed time = 885 ms.

    ========== bob.probst =====================================================

    SQL Server Execution Times:

    CPU time = 2484 ms, elapsed time = 2580 ms.

    ========== Vyengr =====================================================

    SQL Server Execution Times:

    CPU time = 938 ms, elapsed time = 969 ms.

    ========== ejoell 66477 =====================================================

    SQL Server Execution Times:

    CPU time = 2640 ms, elapsed time = 2673 ms.

    ========== David Rueter =====================================================

    SQL Server Execution Times:

    CPU time = 2516 ms, elapsed time = 2584 ms.

    ========== ColdCoffee =====================================================

    SQL Server Execution Times:

    CPU time = 3609 ms, elapsed time = 3645 ms.

    Warning: Null value is eliminated by an aggregate or other SET operation.

    ========== MAGOO 2 =====================================================

    SQL Server Execution Times:

    CPU time = 2016 ms, elapsed time = 2041 ms.

    [/font]

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Yep, I think that's flogged that one to death. I'm sleepy. G'night.

    MM



    select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

  • Forum Etiquette: How to post Reporting Services problems
  • [/url]
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • [/url]
  • How to Post Performance Problems - by Gail Shaw
  • [/url]

  • mister.magoo (3/29/2012)


    Yep, I think that's flogged that one to death. I'm sleepy. G'night.

    Heh... yeah. And if you've ever seen a picture of me, you know I can definitely use a "beauty sleep". 😛 Good night folks and thank you all again for making this a really fun and informative discussion!

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • @jeff

    I was thinking about something that might look like this:

    SELECT DISTINCT p1.CustomerID, Count(p1.CustomerID)

    FROM

    (Select Distinct CustomerID, ProductCode From #Purchase WHERE ProductCode In ('A', 'B')

    ) p1

    WHERE

    NOT EXISTS (SELECT *

    FROM #Purchase p3

    WHERE p3.CustomerID = p1.CustomerID

    AND p3.ProductCode = 'C'))

    Group By p1.CustomerID

    Having Count(p1.CustomerID) =2

  • mark hutchinson (3/29/2012)


    @Jeff

    I was thinking about something that might look like this:

    FROM

    (Select Distinct CustomerID, ProductCode From #Purchase WHERE ProductCode In ('A', 'B')

    ) p1

    This will bring all the cusotmerIDs that are only 'A' or 'B' and not 'C'. The problem is about bringing 'A' and 'B' but not 'C'

  • @ColdCoffee

    That is why I used a Having clause

  • ColdCoffee (3/29/2012)


    mark hutchinson (3/29/2012)


    @Jeff

    I was thinking about something that might look like this:

    FROM

    (Select Distinct CustomerID, ProductCode From #Purchase WHERE ProductCode In ('A', 'B')

    ) p1

    This will bring all the cusotmerIDs that are only 'A' or 'B' and not 'C'. The problem is about bringing 'A' and 'B' but not 'C'

    Actually, I tested the code against the original data example I posted in the article and it correctly returns 1,2,3.

    It takes about the same amount of time as the code in the article.

    [font="Courier New"]SQL Server parse and compile time:

    CPU time = 0 ms, elapsed time = 0 ms.

    SQL Server Execution Times:

    CPU time = 266 ms, elapsed time = 316 ms.

    [/font]

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • mark hutchinson (3/29/2012)


    @ColdCoffee

    That is why I used a Having clause

    You are right; i did not notice that u are choosing both CustomerID and ProductCode in your sub-query.I thot you were only CustomerID. Apologies.

  • Hi guys,

    I didn't look if this solution has allready been posted but using CTE ,intersect and except

    for 10 mil rows it looks like this

    with

    eligible_customer_a as

    (select distinct CustomerID from #Purchase where ProductCode ='A')

    ,

    eligible_customer_b as

    (select distinct customerid from #Purchase where ProductCode='B')

    ,non_eligible_customer_c as

    (select distinct customerid from #Purchase where ProductCode='C')

    select customerid from eligible_customer_a

    intersect

    select customerid from eligible_customer_b

    except

    select customerid from non_eligible_customer_c

    And the statistics are

    SQL Server parse and compile time:

    CPU time = 5 ms, elapsed time = 5 ms.

    (30 row(s) affected)

    SQL Server Execution Times:

    CPU time = 219 ms, elapsed time = 227 ms.

  • Viewing 15 posts - 61 through 75 (of 166 total)

    You must be logged in to reply to this topic. Login to reply