Problem with writing a query. Kindly guide...

  • Hi All,

    I have RATE and SHIFT columns in a table, as shown below. When I pass RATE = 1 I get SHIFT = 1, 2, 3. In a second step, I need to write a query that finds all rates for which at least these three shifts exist (e.g. RATE 3 and 5 in the data below).

    All of the above needs to be done as a single query, and the initial RATE value will be dynamic. The data below is only an example; the real table holds a huge amount of data.

    Rate  Shift
    1     1
    1     2
    1     3
    2     5
    3     1
    3     2
    3     3
    8     4
    4     1
    4     2
    5     1
    5     2
    5     3
    5     4
    6     1
    7     2
    8     2

    Regards,

    Ashish

    You have been around here long enough to know that posting some consumable DDL and data is far more likely to get responses.

    That being said, why are only 3 and 5 returned? What about 1?

    Seems to me like all you need is

    GROUP BY Col1
    HAVING COUNT(*) >= 3
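
    For illustration, that check might look like the sketch below against a hypothetical table (the OP never posted DDL, so the table name RateShift here is an assumption). Note that it only counts rows per rate; it does not verify which shifts those rows contain.

    -- Sketch only: dbo.RateShift(Rate, Shift) is a hypothetical table name.
    -- Returns every Rate with at least three rows, whatever those shifts are.
    SELECT Rate
    FROM dbo.RateShift
    GROUP BY Rate
    HAVING COUNT(*) >= 3;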


  • This problem is known as Relational Division.

    SET NOCOUNT ON;
    USE tempdb;
    GO

    DECLARE @T TABLE (
        Rate int NOT NULL,
        Shift int NOT NULL,
        PRIMARY KEY (Rate, Shift)
    );

    INSERT INTO @T (Rate, Shift)
    VALUES
        (1, 1), (1, 2), (1, 3),
        (2, 5),
        (3, 1), (3, 2), (3, 3),
        (8, 4),
        (4, 1), (4, 2),
        (5, 1), (5, 2), (5, 3), (5, 4),
        (6, 1),
        (7, 2),
        (8, 2);

    DECLARE @Rate int = 1;

    WITH C1 AS (
        SELECT Shift
        FROM @T
        WHERE Rate = @Rate
    )
    SELECT A.Rate
    FROM @T AS A
    INNER JOIN C1 AS B
        ON A.Shift = B.Shift
    GROUP BY A.Rate
    HAVING COUNT(*) = (SELECT COUNT(*) FROM C1);
    GO

    /*
    Rate
    1
    3
    5
    */
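
    As a side note, relational division is also often written with the classic double NOT EXISTS pattern. A minimal sketch against the same @T table variable and @Rate parameter (equivalent in result to the count-matching query above; no claim about relative performance):

    -- A rate qualifies when there is no shift of @Rate that the candidate rate lacks.
    SELECT DISTINCT A.Rate
    FROM @T AS A
    WHERE NOT EXISTS (
        SELECT 1
        FROM @T AS B                 -- every shift belonging to the divisor rate...
        WHERE B.Rate = @Rate
          AND NOT EXISTS (
              SELECT 1
              FROM @T AS C           -- ...must also exist for the candidate rate
              WHERE C.Rate = A.Rate
                AND C.Shift = B.Shift
          )
    );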

  • Thanks to hunchback for the ddl and data. Here is another way to do the same thing.

    select Rate
    from @T
    cross apply (select count(*) as RowsCount from @t where rate = @Rate) r
    group by Rate
    having COUNT(*) >= max(r.RowsCount)


  • Sean Lange (9/26/2013)


    Thanks to hunchback for the ddl and data. Here is another way to do the same thing.

    select Rate
    from @T
    cross apply (select count(*) as RowsCount from @t where rate = @Rate) r
    group by Rate
    having COUNT(*) >= max(r.RowsCount)

    Sean - Should this return Rate = 9? I think your approach simply returns any Rate that has 3 or more rows.

    DECLARE @T TABLE (
        Rate int NOT NULL,
        Shift int NOT NULL,
        PRIMARY KEY (Rate, Shift)
    );

    INSERT INTO @T (Rate, Shift)
    VALUES
        (1, 1), (1, 2), (1, 3),
        (2, 5),
        (3, 1), (3, 2), (3, 3),
        (8, 4),
        (4, 1), (4, 2),
        (5, 1), (5, 2), (5, 3), (5, 4),
        (6, 1),
        (7, 2),
        (8, 2),
        (9, 2), (9, 3), (9, 4);

    DECLARE @Rate int = 1;

    select Rate
    from @T
    cross apply (select count(*) as RowsCount from @t where rate = @Rate) r
    group by Rate
    having COUNT(*) >= max(r.RowsCount)

    I think this works as the OP requested also:

    SELECT Rate
    FROM @T
    WHERE Shift IN (SELECT Shift FROM @T WHERE Rate = @Rate)
    GROUP BY Rate
    HAVING COUNT(*) = (SELECT COUNT(*) FROM @T WHERE Rate = @Rate);



  • I am a bit curious how hunchback's and my solutions would scale against a divide-and-conquer approach.

    SELECT Shift
    INTO #T
    FROM @T
    WHERE Rate = @Rate;

    DECLARE @Rows INT = @@ROWCOUNT;

    SELECT Rate
    FROM @T
    WHERE Shift IN (SELECT Shift FROM #T)
    GROUP BY Rate
    HAVING COUNT(*) = @Rows;
    GO

    DROP TABLE #T;
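
    If the divisor set were large, one possible variation (a sketch only, not something benchmarked here) is to give the staging table a clustered primary key so the IN probe has an index to seek on:

    -- Hypothetical variation on the divide-and-conquer approach above.
    CREATE TABLE #Divisor (Shift int NOT NULL PRIMARY KEY CLUSTERED);

    INSERT INTO #Divisor (Shift)
    SELECT Shift FROM @T WHERE Rate = @Rate;

    DECLARE @Rows int = @@ROWCOUNT;

    SELECT Rate
    FROM @T
    WHERE Shift IN (SELECT Shift FROM #Divisor)
    GROUP BY Rate
    HAVING COUNT(*) = @Rows;

    DROP TABLE #Divisor;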



  • Well, almost all the SQL masters have answered the question.

    A simple GROUP BY with a HAVING COUNT will serve the purpose.

  • Thanks hunchback, Sean Lange and dwain.c for the help.

    I'm going with CROSS APPLY as it has the fewest logical reads and lower cost than the other two.

    Thanks again.

  • If I am reading this article (http://weblogs.sqlteam.com/peterl/archive/2010/07/02/Proper-Relational-Division-With-Sets.aspx) correctly, this solution should also be pretty fast:

    WITH Target AS
    (
        SELECT Rate, Shift
        FROM @T
        WHERE Rate = @Rate
    )
    SELECT c.Rate
    FROM
    (
        SELECT Rate, rc=COUNT(*)
        FROM @T
        GROUP BY Rate
    ) a
    JOIN
    (
        SELECT rc=COUNT(*)
        FROM Target
    ) b ON b.rc <= a.rc
    JOIN @T AS c ON c.Rate = a.Rate
    JOIN Target AS d ON d.Shift = c.Shift
    GROUP BY c.Rate
    HAVING COUNT(*) = MIN(b.rc);



  • T.Ashish (9/26/2013)


    Thanks hunchback, Sean Lange and dwain.c for the help.

    I'm going with CROSS APPLY as it has the fewest logical reads and lower cost than the other two.

    Thanks again.

    I do not recommend that you judge a query's performance by its plan cost.
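
    For example, actual CPU time, elapsed time, and logical reads can be captured directly; a minimal sketch, assuming the test12 table used later in this thread and a sample @Rate value of 4:

    -- Measure real resource usage instead of comparing estimated plan costs.
    SET STATISTICS IO ON;
    SET STATISTICS TIME ON;

    DECLARE @Rate int = 4;  -- sample value for illustration

    SELECT Rate
    FROM test12
    WHERE Shift IN (SELECT Shift FROM test12 WHERE Rate = @Rate)
    GROUP BY Rate
    HAVING COUNT(*) = (SELECT COUNT(*) FROM test12 WHERE Rate = @Rate);

    SET STATISTICS TIME OFF;
    SET STATISTICS IO OFF;
    -- The Messages tab then shows logical reads plus CPU and elapsed milliseconds.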



  • Continuing curiosity regarding this problem drove me to see how the proposed solutions scale, so I built a test harness.

    CREATE TABLE #T (
        Rate int NOT NULL,
        Shift int NOT NULL,
        PRIMARY KEY (Rate, Shift)
    );

    INSERT INTO #T (Rate, Shift)
    VALUES
        (-1, 1),
        (-1, 2),
        (-1, 3);

    WITH Tally (n) AS
    (
        SELECT TOP 500000 ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
        FROM sys.all_columns a CROSS JOIN sys.all_columns b
    ),
    STally (n) AS
    (
        SELECT TOP 10 ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
        FROM sys.all_columns a
    )
    INSERT INTO #T
    SELECT a.n, b.n
    FROM Tally a
    CROSS APPLY STally b;

    DELETE
    FROM #T
    WHERE (Rate < 20000 AND Shift = 1) OR (Rate > 400000 AND Shift = 3)
        OR (RATE BETWEEN 50000 AND 150000 AND Shift = 2);

    DECLARE @Rate INT = -1
        ,@Holder INT
        ,@StartDT DATETIME;

    SELECT @StartDT=GETDATE();
    PRINT 'Solution by Hunchback';
    SET STATISTICS TIME ON;

    WITH C1 AS (
        SELECT Shift
        FROM #T
        WHERE Rate = @Rate
    )
    SELECT @Holder=A.Rate
    FROM #T AS A
    INNER JOIN C1 AS B
        ON A.Shift = B.Shift
    GROUP BY A.Rate
    HAVING COUNT(*) = (SELECT COUNT(*) FROM C1);

    SET STATISTICS TIME OFF;
    SELECT ElapsedMS=DATEDIFF(millisecond, @StartDT, GETDATE());

    SELECT @StartDT=GETDATE();
    PRINT 'Solution by Dwain.C';
    SET STATISTICS TIME ON;

    SELECT @Holder=Rate
    FROM #T
    WHERE Shift IN (SELECT Shift FROM #T WHERE Rate = @Rate)
    GROUP BY Rate
    HAVING COUNT(*) = (SELECT COUNT(*) FROM #T WHERE Rate = @Rate);

    SET STATISTICS TIME OFF;
    SELECT ElapsedMS=DATEDIFF(millisecond, @StartDT, GETDATE());

    SELECT @StartDT=GETDATE();
    PRINT 'Divide and Conquer Solution by Dwain.C';
    SET STATISTICS TIME ON;

    SELECT Shift
    INTO #T1
    FROM #T
    WHERE Rate = @Rate;

    DECLARE @Rows INT = @@ROWCOUNT;

    SELECT @Holder=Rate
    FROM #T
    WHERE Shift IN (SELECT Shift FROM #T1)
    GROUP BY Rate
    HAVING COUNT(*) = @Rows;

    SET STATISTICS TIME OFF;
    SELECT ElapsedMS=DATEDIFF(millisecond, @StartDT, GETDATE());

    SELECT @StartDT=GETDATE();
    PRINT 'Solution by Peso/Adam Machanic';
    SET STATISTICS TIME ON;

    WITH Target AS
    (
        SELECT Rate, Shift
        FROM #T
        WHERE Rate = @Rate
    )
    SELECT @Holder=c.Rate
    FROM
    (
        SELECT Rate, rc=COUNT(*)
        FROM #T
        GROUP BY Rate
    ) a
    JOIN
    (
        SELECT rc=COUNT(*)
        FROM Target
    ) b ON b.rc <= a.rc
    JOIN #T AS c ON c.Rate = a.Rate
    JOIN Target AS d ON d.Shift = c.Shift
    GROUP BY c.Rate
    HAVING COUNT(*) = MIN(b.rc);

    SET STATISTICS TIME OFF;
    SELECT ElapsedMS=DATEDIFF(millisecond, @StartDT, GETDATE());
    GO

    DROP TABLE #T;
    DROP TABLE #T1;

    From the statistics, I get these CPU/Elapsed time results:

    Solution by Hunchback
      SQL Server Execution Times: CPU time = 1637 ms, elapsed time = 452 ms.

    Solution by Dwain.C
      SQL Server Execution Times: CPU time = 1467 ms, elapsed time = 473 ms.

    Divide and Conquer Solution by Dwain.C
      SQL Server Execution Times: CPU time = 0 ms, elapsed time = 0 ms.
      SQL Server Execution Times: CPU time = 1169 ms, elapsed time = 301 ms.

    Solution by Peso/Adam Machanic
      SQL Server Execution Times: CPU time = 1996 ms, elapsed time = 2235 ms.

    And these results are quite similar to the elapsed MS results SELECTed into the 4 results panes:

    460
    480
    306
    2263

    Comments:

    - I eliminated Sean's solution because I believe he's solving a different problem. But of course since the OP has confirmed that he prefers Sean's solution, I have to assume that both Hunchback and I read the requirements wrong.

    - The result for my last suggestion (the Peso/Adam Machanic solution) was probably impacted by the <= I had to use in the first JOIN. I believe their solution was based on the no-remainder (exact division) problem posed by Mr. Celko; see the sketch at the end of this post.

    I was surprised that Divide and Conquer came in at about 30% faster.
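
    For reference, the exact-division ("no remainder") variant would additionally require the candidate rate to have no shifts outside the divisor set. One way to sketch it against the same #T and @Rate (an illustration only, not part of the timed tests):

    -- A rate qualifies only when its shift set exactly equals the divisor's shift set.
    WITH Divisor AS (
        SELECT Shift FROM #T WHERE Rate = @Rate
    )
    SELECT A.Rate
    FROM #T AS A
    LEFT JOIN Divisor AS D
        ON A.Shift = D.Shift
    GROUP BY A.Rate
    HAVING COUNT(D.Shift) = (SELECT COUNT(*) FROM Divisor)   -- covers every divisor shift
       AND COUNT(*)       = (SELECT COUNT(*) FROM Divisor);  -- and has no extra shifts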



  • Thanks Dwain.C

    Kindly ignore my earlier post; I missed it completely. Actually, the other four solutions are all giving accurate results.

    My results are as below:

    1. hunchback

    WITH C1 AS (SELECT Shift FROM test12 WHERE Rate = 4)
    SELECT A.Rate FROM test12 AS A INNER JOIN C1 AS B ON A.Shift = B.Shift
    GROUP BY A.Rate HAVING COUNT(*) = (SELECT COUNT(*) FROM C1);

    Table 'test12'. Scan count 2, logical reads 48
    Query Cost relative to batch = 21%

    2. dwain.c

    SELECT Rate FROM test12
    WHERE Shift IN (SELECT Shift FROM test12 WHERE Rate = 4)
    GROUP BY Rate HAVING COUNT(*) = (SELECT COUNT(*) FROM test12 WHERE Rate = 4);

    Table 'test12'. Scan count 2, logical reads 48
    Query Cost relative to batch = 21%

    3. dwain.c (divide and conquer)

    SELECT Shift INTO #T FROM @T WHERE Rate = @Rate;
    DECLARE @Rows INT = @@ROWCOUNT;
    SELECT Rate FROM @T
    WHERE Shift IN (SELECT Shift FROM #T)
    GROUP BY Rate HAVING COUNT(*) = @Rows;
    GO
    DROP TABLE #T;

    Table 'test12'. Scan count 1, logical reads 2
    '#T_____00000000000D'. Scan count 1, logical reads 22
    Table 'test12'. Scan count 1, logical reads 2
    Query Cost relative to batch = 23%

    4. dwain.c

    WITH Target AS
    (SELECT Rate, Shift FROM test12 WHERE Rate = 4)
    SELECT c.Rate
    FROM
    (SELECT Rate, rc=COUNT(*) FROM test12 GROUP BY Rate) a
    JOIN
    (SELECT rc=COUNT(*) FROM Target) b ON b.rc <= a.rc
    JOIN test12 AS c ON c.Rate = a.Rate
    JOIN Target AS d ON d.Shift = c.Shift
    GROUP BY c.Rate
    HAVING COUNT(*) = MIN(b.rc);

    Table 'test12'. Scan count 8, logical reads 54
    Query Cost relative to batch = 35%

    So, cost-wise, 1 and 2 are performing better than 3.

    ElapsedMS is as below:

    990
    956
    960 (Divide and Conquer)
    2723

    I could not get "Divide and Conquer" into first place.

    Thanks again, everyone, for your efforts.

  • T.Ashish (9/27/2013)


    Kindly ignore my earlier post; I missed it completely. Actually, the other four solutions are all giving accurate results. [...] So, cost-wise, 1 and 2 are performing better than 3. [...] I could not get "Divide and Conquer" into first place.

    #1 and 2 have effectively identical execution plans, so any timing variance you're seeing between them is probably random; multiple test runs would likely swing one way or the other.

    #3 (Divide and Conquer) has an advantage that will likely grow depending on the number of rows in your table (the more rows, the more the initial INSERT capturing the row count will help). I think anyway.



  • Well, I guess I didn't quite get the requirements. Chalk that up to a lack of details. This thread could serve as a good example of why it is so important to provide those details. 😉


  • #3 (Divide and Conquer) has an advantage that will likely grow depending on the number of rows in your table (the more rows, the more the initial INSERT capturing the row count will help). I think anyway.

    I will keep that in mind.
