Selecting a row per group based on multiple conditions

  • I have a table with sample data as shown below. For each ProcessId, I need one row to be selected based on below business rules (in the order specified):
    1. When a processId is repeated, then select the row based on SubType='TopCon'
         1. a. When SubType='TopCon' have more than one record per processid, then select the row with lowest FormId
    2. When a processId is repeated and SubType <>'TopCon', then select the row with lowest FormId


    DECLARE @TABLE AS TABLE
    (
        ProcessId SMALLINT NOT NULL,
        FormKey VARCHAR(10) NULL,
        SubType VARCHAR(10) NOT NULL,
        FormId TINYINT NOT NULL
    )
    INSERT @TABLE
    (
      ProcessId,
      FormKey,
      SubType,
      FormId
    )
    VALUES
    ( 2222, 'ABC', 'TOPCON', 5 ), ( 2222, 'BBC', 'NT', 3 ),(2222, NULL, 'NT', 1 ),( 3333, 'BBC', 'TOPCON', 6 ),(3333, 'CBC', 'TOPCON', 3 ),( 3333, NULL, 'NT', 1 )
    ,(4444 , 'DBC', 'NT', 1 ),( 4444, 'EBC', 'NT', 5 )

    SELECT * FROM @TABLE

    Volume of real data is in 100s.

    Output should be like this:
    2222    ABC    TOPCON    2
    3333    CBC    TOPCON    3
    4444    DBC     NT               1

    Can you please help me write a query to achieve this?

    Thanks,
    Naveen.
    Every thought is a cause and every condition an effect

  • The standard way to do this is with a CTE/ROW_NUMBER() combination.  You just have to get the correct order for the row number.

    ;
    WITH CTE AS
    (
        SELECT *, ROW_NUMBER() OVER(PARTITION BY ProcessID ORDER BY CASE WHEN SubType = 'TopCon' THEN 1 ELSE 2 END, FormID) AS rn
        FROM @TABLE
    )
    SELECT ProcessId, FormKey, SubType, FormID
    FROM CTE
    WHERE rn = 1

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • drew.allen - Thursday, February 22, 2018 12:33 PM

    The standard way to do this is with a CTE/ROW_NUMBER() combination.  You just have to get the correct order for the row number.

    ;
    WITH CTE AS
    (
        SELECT *, ROW_NUMBER() OVER(PARTITION BY ProcessID ORDER BY CASE WHEN SubType = 'TopCon' THEN 1 ELSE 2 END, FormID) AS rn
        FROM @TABLE
    )
    SELECT ProcessId, FormKey, SubType, FormID
    FROM CTE
    WHERE rn = 1

    Drew

    Awesome!! Case Statement in Order by did the trick. Thanks Drew

    Thanks,
    Naveen.
    Every thought is a cause and every condition an effect

  • drew.allen - Thursday, February 22, 2018 12:33 PM

    The standard way to do this is with a CTE/ROW_NUMBER() combination.  You just have to get the correct order for the row number.

    ;
    WITH CTE AS
    (
        SELECT *, ROW_NUMBER() OVER(PARTITION BY ProcessID ORDER BY CASE WHEN SubType = 'TopCon' THEN 1 ELSE 2 END, FormID) AS rn
        FROM @TABLE
    )
    SELECT ProcessId, FormKey, SubType, FormID
    FROM CTE
    WHERE rn = 1

    Drew

    Drew,
    What is the easy way in case I want to get all rows other than the rows matching conditions(like below rows)?  In other words, rows violating the rules to be displayed

    2222    BBC    NT    3
    2222    NULL    NT    1
    3333    BBC    TOPCON    6
    3333    NULL    NT    1
    4444    EBC    NT    5

    Thanks,
    Naveen.
    Every thought is a cause and every condition an effect

  • drew.allen - Thursday, February 22, 2018 12:33 PM

    The standard way to do this is with a CTE/ROW_NUMBER() combination.  You just have to get the correct order for the row number.

    ;
    WITH CTE AS
    (
        SELECT *, ROW_NUMBER() OVER(PARTITION BY ProcessID ORDER BY CASE WHEN SubType = 'TopCon' THEN 1 ELSE 2 END, FormID) AS rn
        FROM @TABLE
    )
    SELECT ProcessId, FormKey, SubType, FormID
    FROM CTE
    WHERE rn = 1

    Drew

    What does 1 and 2 mean in Order By Case statement? Does 1 and 2 refer columns from @Table ?

    Thanks,
    Naveen.
    Every thought is a cause and every condition an effect

  • Naveen PK - Thursday, February 22, 2018 12:59 PM

    drew.allen - Thursday, February 22, 2018 12:33 PM

    The standard way to do this is with a CTE/ROW_NUMBER() combination.  You just have to get the correct order for the row number.

    ;
    WITH CTE AS
    (
        SELECT *, ROW_NUMBER() OVER(PARTITION BY ProcessID ORDER BY CASE WHEN SubType = 'TopCon' THEN 1 ELSE 2 END, FormID) AS rn
        FROM @TABLE
    )
    SELECT ProcessId, FormKey, SubType, FormID
    FROM CTE
    WHERE rn = 1

    Drew

    Drew,
    What is the easy way in case I want to get all rows other than the rows matching conditions(like below rows)?  In other words, rows violating the rules to be displayed

    2222    BBC    NT    3
    2222    NULL    NT    1
    3333    BBC    TOPCON    6
    3333    NULL    NT    1
    4444    EBC    NT    5

    Apparently, it's asking someone else to do your work for you instead of thinking for yourself.  The answer is obvious.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • Naveen PK - Thursday, February 22, 2018 1:04 PM

    drew.allen - Thursday, February 22, 2018 12:33 PM

    The standard way to do this is with a CTE/ROW_NUMBER() combination.  You just have to get the correct order for the row number.

    ;
    WITH CTE AS
    (
        SELECT *, ROW_NUMBER() OVER(PARTITION BY ProcessID ORDER BY CASE WHEN SubType = 'TopCon' THEN 1 ELSE 2 END, FormID) AS rn
        FROM @TABLE
    )
    SELECT ProcessId, FormKey, SubType, FormID
    FROM CTE
    WHERE rn = 1

    Drew

    What does 1 and 2 mean in Order By Case statement? Does 1 and 2 refer columns from @Table ?

    Test it out for yourself.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • drew.allen - Thursday, February 22, 2018 1:13 PM

    Naveen PK - Thursday, February 22, 2018 12:59 PM

    drew.allen - Thursday, February 22, 2018 12:33 PM

    The standard way to do this is with a CTE/ROW_NUMBER() combination.  You just have to get the correct order for the row number.

    ;
    WITH CTE AS
    (
        SELECT *, ROW_NUMBER() OVER(PARTITION BY ProcessID ORDER BY CASE WHEN SubType = 'TopCon' THEN 1 ELSE 2 END, FormID) AS rn
        FROM @TABLE
    )
    SELECT ProcessId, FormKey, SubType, FormID
    FROM CTE
    WHERE rn = 1

    Drew

    Drew,
    What is the easy way in case I want to get all rows other than the rows matching conditions(like below rows)?  In other words, rows violating the rules to be displayed

    2222    BBC    NT    3
    2222    NULL    NT    1
    3333    BBC    TOPCON    6
    3333    NULL    NT    1
    4444    EBC    NT    5

    Apparently, it's asking someone else to do your work for you instead of thinking for yourself.  The answer is obvious.

    Drew

    🙂
    I got it by  selecting rn<>1

    Thanks,
    Naveen.
    Every thought is a cause and every condition an effect

  • drew.allen - Thursday, February 22, 2018 1:14 PM

    Naveen PK - Thursday, February 22, 2018 1:04 PM

    drew.allen - Thursday, February 22, 2018 12:33 PM

    The standard way to do this is with a CTE/ROW_NUMBER() combination.  You just have to get the correct order for the row number.

    ;
    WITH CTE AS
    (
        SELECT *, ROW_NUMBER() OVER(PARTITION BY ProcessID ORDER BY CASE WHEN SubType = 'TopCon' THEN 1 ELSE 2 END, FormID) AS rn
        FROM @TABLE
    )
    SELECT ProcessId, FormKey, SubType, FormID
    FROM CTE
    WHERE rn = 1

    Drew

    What does 1 and 2 mean in Order By Case statement? Does 1 and 2 refer columns from @Table ?

    Test it out for yourself.

    Drew

    I know that integers in order by refers to column names in select clause. In this case, it works even when I change integers like this: ORDER BY CASE WHEN SubType = 'TopCon' THEN 10 ELSE 20 END, FormID.
    It all works as longs as Else part has an integer greater than THEN part integers.
    Some websites tell that using integers in order by clause is not a good practice.
    For me, I feel like 1 refers to SubType and 2 refers to FormId with Order By Clause but wondering why it works same when numbers changed.
    Can you please throw some light on this point?

    Thanks,
    Naveen.
    Every thought is a cause and every condition an effect

  • Naveen PK - Thursday, February 22, 2018 2:33 PM

    drew.allen - Thursday, February 22, 2018 1:14 PM

    Naveen PK - Thursday, February 22, 2018 1:04 PM

    drew.allen - Thursday, February 22, 2018 12:33 PM

    The standard way to do this is with a CTE/ROW_NUMBER() combination.  You just have to get the correct order for the row number.

    ;
    WITH CTE AS
    (
        SELECT *, ROW_NUMBER() OVER(PARTITION BY ProcessID ORDER BY CASE WHEN SubType = 'TopCon' THEN 1 ELSE 2 END, FormID) AS rn
        FROM @TABLE
    )
    SELECT ProcessId, FormKey, SubType, FormID
    FROM CTE
    WHERE rn = 1

    Drew

    What does 1 and 2 mean in Order By Case statement? Does 1 and 2 refer columns from @Table ?

    Test it out for yourself.

    Drew

    I know that integers in order by refers to column names in select clause. In this case, it works even when I change integers like this: ORDER BY CASE WHEN SubType = 'TopCon' THEN 10 ELSE 20 END, FormID.
    It all works as longs as Else part has an integer greater than THEN part integers.
    Some websites tell that using integers in order by clause is not a good practice.
    For me, I feel like 1 refers to SubType and 2 refers to FormId with Order By Clause but wondering why it works same when numbers changed.
    Can you please throw some light on this point?

    The integers in the order by do NOT refer to column names in the ORDER BY clause of the ROW_NUMBER() OVER function, they are constants, and yes the order of the values is important to achieve what you are trying to accomplish.

  • Naveen PK - Thursday, February 22, 2018 2:33 PM

    drew.allen - Thursday, February 22, 2018 1:14 PM

    Naveen PK - Thursday, February 22, 2018 1:04 PM

    drew.allen - Thursday, February 22, 2018 12:33 PM

    The standard way to do this is with a CTE/ROW_NUMBER() combination.  You just have to get the correct order for the row number.

    ;
    WITH CTE AS
    (
        SELECT *, ROW_NUMBER() OVER(PARTITION BY ProcessID ORDER BY CASE WHEN SubType = 'TopCon' THEN 1 ELSE 2 END, FormID) AS rn
        FROM @TABLE
    )
    SELECT ProcessId, FormKey, SubType, FormID
    FROM CTE
    WHERE rn = 1

    Drew

    What does 1 and 2 mean in Order By Case statement? Does 1 and 2 refer columns from @Table ?

    Test it out for yourself.

    Drew

    I know that integers in order by refers to column names in select clause. In this case, it works even when I change integers like this: ORDER BY CASE WHEN SubType = 'TopCon' THEN 10 ELSE 20 END, FormID.
    It all works as longs as Else part has an integer greater than THEN part integers.
    Some websites tell that using integers in order by clause is not a good practice.
    For me, I feel like 1 refers to SubType and 2 refers to FormId with Order By Clause but wondering why it works same when numbers changed.
    Can you please throw some light on this point?

    First, the ORDER BY in an OVER clause is different from the ORDER BY clause in a SELECT statement.  Here are some quotes that show the differences:

    ORDER BY Clause

    order_by_expression
    Specifies a column or expression on which to sort the query result set. A sort column can be specified as a name or column alias, or a nonnegative integer representing the position of the column in the select list.

    OVER Clause

    order_by_expression
    Specifies a column or expression on which to sort. order_by_expression can only refer to columns made available by the FROM clause. An integer cannot be specified to represent a column name or alias.

    Note the bolded section of the second quote.

    Second, you're confusing the argument with the value of the argument.  The argument here is the CASE expression, not the nonnegative integers 1 or 2.  The only purpose of the integers is to hard-code a specific value for the CASE expression which is used for the sort.  It doesn't matter what values are used as long as they define a specific sort.  I could use bits, characters, dates, times, binary.  I usually use integers, they're the most commonly associated with a particular order

    Here are some alternate versions:
    SELECT *
    , ROW_NUMBER() OVER(PARTITION BY ProcessID ORDER BY CASE WHEN SubType = 'TopCon' THEN CAST(0 AS BIT) ELSE 1 END, FormID) AS rn_bit
    , ROW_NUMBER() OVER(PARTITION BY ProcessID ORDER BY CASE WHEN SubType = 'TopCon' THEN 'A' ELSE 'B' END, FormID) AS rn_char
    , ROW_NUMBER() OVER(PARTITION BY ProcessID ORDER BY CASE WHEN SubType = 'TopCon' THEN '1900-01-01' ELSE '2000-01-01' END, FormID) AS rn_dt
    , ROW_NUMBER() OVER(PARTITION BY ProcessID ORDER BY CASE WHEN SubType = 'TopCon' THEN 0xCC ELSE 0xFF END, FormID) AS rn_binary
    FROM @TABLE

    Also, remember that a CASE expression MUST return values of compatible types, so the output of CASE expression CANNOT refer to different columns, because there is no way to guarantee that the columns will be of compatible types.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply