Silly SQL Question

  • I'm sure there's a simple way to do this, but I can't seem to figure it out.  I have data something like this:

    ID, RESULT

    1, Y

    1, Y

    1, Y

    2, N

    2, N

    3, Y

    3, N

    4, Y

    4, N

    I only want to pull records where the ID has both a result of 'Y' and a result of 'N' (In this case, that would pull all records with ID 3 and 4)

    My instinct is to create a view pulling all the "Y" values and another view pulling all the "N" values, and a third query matching the IDs of the two views, thereby showing me all IDs with both a Y and and N.  Is there a way to do this in one query?

  • something like this?

    SELECT DISTINCT ID
    FROM Response r
    WHERE EXISTS (SELECT 1 FROM Response r2 WHERE r2.ID = r.ID AND r2.result = 'Y')
    AND EXISTS (SELECT 1 FROM Response r2 WHERE r2.ID = r.ID AND r2.Result = 'N');
  • How you go about it may depend upon how much data you have in the table or some other details we don't know about. You'd just have to play around with any suggestions. A CTE may be an option - something along the lines of:

    WITH CTE (ID, Result, Previous) 
    AS
    (
    SELECT ID, Result, Lag(Result, 1, Result) OVER(PARTITION BY ID ORDER BY ID) AS Previous
    FROM YourTable
    )
    SELECT ID
    FROM CTE
    WHERE Result <> Previous

    Sue

     

  • You don't need to scan the table 3 times.

    2 would be perfectly enough:

    SELECT ID
    FROM Response r
    WHERE r.result = 'Y'
    AND EXISTS (SELECT * FROM Response r2 WHERE r2.ID = r.ID AND r2.Result = 'N')
    GROUP BY ID;

    And please avoid using the keyword DISTINCT. It's a very bad habit.

    _____________
    Code for TallyGenerator

  • Actually, a single scan would be just fine:

    SELECT ID
    FROM Response r
    GROUP BY ID
    HAVING MAX(result) <> MIN(result);

    _____________
    Code for TallyGenerator

  • Sergiy wrote:

    Actually, a single scan would be just fine:

    SELECT ID
    FROM Response r
    GROUP BY ID
    HAVING MAX(result) <> MIN(result);

    You've just gotta love the old ways. 😀

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • If the only values available in the result column are Y or N - you could do this:

     Select r.id
    From Response r
    Group By
    r.id
    Having count_big(Distinct r.result) = 2;

    Note: using count_big eliminates an implicit conversion as the result from count needs to be converted to bigint (for some reason).  This produces the same plan as Sergiy's solution with HAVING MAX(result) <> MIN(result) - the difference will be in the sort where the distinct sort will eliminate duplicate rows and the max/min does not.

    Test both versions in your actual code to determine which is better for your implementation.

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • Jeffrey Williams wrote:

    If the only values available in the result column are Y or N - you could do this:

     Select r.id
    From Response r
    Group By
    r.id
    Having count_big(Distinct r.result) = 2;

    Note: using count_big eliminates an implicit conversion as the result from count needs to be converted to bigint (for some reason).  This produces the same plan as Sergiy's solution with HAVING MAX(result) <> MIN(result) - the difference will be in the sort where the distinct sort will eliminate duplicate rows and the max/min does not.

    Test both versions in your actual code to determine which is better for your implementation.

    A very wise man told me long ago...

    "A Developer must not guess... a Developer must KNOW!" -- Sergiy Flocka - Circa 2007. 😀

    You just can't determine which piece of code will run faster by looking at the execution plan (I've been seriously bitten in the distant past by doing so) and, as you said, a test needs to be done.  Since we don't have the OP's data, let's make some.

    For those wanting to test for themselves, here's some code to create a larger table and the fnTally function can be found at the article from the last link in my signature line below.

    --===== Create and populate the test table
    DROP TABLE IF EXISTS #TestTable
    ;
    WITH cteID AS
    (
    SELECT ID=id.N
    ,RowMultiplier = ABS(CHECKSUM(NEWID())%3)+1
    FROM dbo.fnTally(1,1000000) id
    )
    SELECT id.ID
    ,Result = IIF(ABS(CHECKSUM(NEWID())%2) = 0, 'N','Y')
    INTO #TestTable
    FROM cteID id
    CROSS APPLY dbo.fnTally(1,RowMultiplier) rows
    OPTION (MAXDOP 1)
    ;
    GO
    -----------------------------------------------------------------------------------------------------------------------
    PRINT REPLICATE('=',119);
    RAISERROR('========== Sergiy''s code ==========',0,0) WITH NOWAIT;
    ;
    DECLARE @BitBucket INT;
    SET STATISTICS TIME,IO ON;
    SELECT @BitBucket = ID
    FROM #TestTable r
    GROUP BY ID
    HAVING MAX(result) <> MIN(result);
    SET STATISTICS TIME,IO OFF;
    GO
    -----------------------------------------------------------------------------------------------------------------------
    PRINT REPLICATE('=',119);
    RAISERROR('========== Jeff William''s code ==========',0,0) WITH NOWAIT;
    ;
    DECLARE @BitBucket INT;
    SET STATISTICS TIME,IO ON;
    Select @BitBucket = r.id
    From #TestTable r
    Group By
    r.id
    Having count_big(Distinct r.result) = 2;
    SET STATISTICS TIME,IO OFF;
    GO

    Run Results...

    (1999756 rows affected)
    =======================================================================================================================
    ========== Sergiy's code ==========
    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table '#TestTable__________________________________________________________________________________________________________00000000094E'.
    Scan count 1, logical reads 4454, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:
    CPU time = 844 ms, elapsed time = 840 ms.
    =======================================================================================================================
    ========== Jeff William's code ==========
    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table '#TestTable__________________________________________________________________________________________________________00000000094E'.
    Scan count 1, logical reads 4454, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:
    CPU time = 1235 ms, elapsed time = 1233 ms.

    For this test, Jeff Williams code runs about 46% slower.  Must be the DISTINCT Sergiy was talking about 😉

    To Jeff's point, even though we tested with large enough randomized data above, the OP still needs to test in their environment.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply