Select group on multiple columns when at least one of the non grouped columns do not all match

  • I'd like to first figure out the count of how many rows are not the Current Edition have the following:

    Second I'd like to be able to select the primary key of all the rows involved

    Third I'd like to select all the primary keys of just the rows not in the current edition

    Not really sure how to describe this without making a dataset

    [Code="sql"]

    CREATE TABLE [Project].[TestTable1](

    [TestTable1_pk] [int] IDENTITY(1,1) NOT NULL,

    [Source_ID] [int] NOT NULL,

    [Edition_fk] [int] NOT NULL,

    [Key1_fk] [int] NOT NULL,

    [Key2_fk] [int] NOT NULL,

    [MatchTypeCode] [char](1) NOT NULL,

    [MatchScore] [int] NOT NULL,

    [MatchRank] [int] NOT NULL CONSTRAINT [DF_MidasMatchNormalized_MatchRank] DEFAULT ((0)),

    CONSTRAINT [PK_TestTable1] PRIMARY KEY CLUSTERED

    (

    [TestTable1_pk] ASC

    )

    );

    INSERT INTO [Project].[TestTable1] ([Source_ID], [Edition_fk], [Key1_fk], [Key2_fk], [MatchTypeCode], [MatchScore], [MatchRank])

    VALUES (403583201,54,249848,1181206,'N',50,1);

    INSERT INTO [Project].[TestTable1] ([Source_ID], [Edition_fk], [Key1_fk], [Key2_fk], [MatchTypeCode], [MatchScore], [MatchRank])

    VALUES (403583201,58,249848,1181206,'N',50,1);

    INSERT INTO [Project].[TestTable1] ([Source_ID], [Edition_fk], [Key1_fk], [Key2_fk], [MatchTypeCode], [MatchScore], [MatchRank])

    VALUES (403583201,58,249848,1307967,'P',25,0);

    INSERT INTO [Project].[TestTable1] ([Source_ID], [Edition_fk], [Key1_fk], [Key2_fk], [MatchTypeCode], [MatchScore], [MatchRank])

    VALUES (403583201,58,249848,1307967,'P',25,0);

    INSERT INTO [Project].[TestTable1] ([Source_ID], [Edition_fk], [Key1_fk], [Key2_fk], [MatchTypeCode], [MatchScore], [MatchRank])

    VALUES (403583201,54,249848,1307967,'N',50,0);

    INSERT INTO [Project].[TestTable1] ([Source_ID], [Edition_fk], [Key1_fk], [Key2_fk], [MatchTypeCode], [MatchScore], [MatchRank])

    VALUES (403583201,58,249848,1307967,'N',50,0);

    INSERT INTO [Project].[TestTable1] ([Source_ID], [Edition_fk], [Key1_fk], [Key2_fk], [MatchTypeCode], [MatchScore], [MatchRank])

    VALUES (403583201,58,249848,1753914,'N',50,0);

    INSERT INTO [Project].[TestTable1] ([Source_ID], [Edition_fk], [Key1_fk], [Key2_fk], [MatchTypeCode], [MatchScore], [MatchRank])

    VALUES (403583201,54,249848,1753914,'N',50,0);

    INSERT INTO [Project].[TestTable1] ([Source_ID], [Edition_fk], [Key1_fk], [Key2_fk], [MatchTypeCode], [MatchScore], [MatchRank])

    VALUES (403583201,54,249848,3829330,'I',50,0);

    INSERT INTO [Project].[TestTable1] ([Source_ID], [Edition_fk], [Key1_fk], [Key2_fk], [MatchTypeCode], [MatchScore], [MatchRank])

    VALUES (403583201,58,249848,3829330,'N',50,0);

    INSERT INTO [Project].[TestTable1] ([Source_ID], [Edition_fk], [Key1_fk], [Key2_fk], [MatchTypeCode], [MatchScore], [MatchRank])

    VALUES (403583201,54,249848,3829330,'P',25,0);

    INSERT INTO [Project].[TestTable1] ([Source_ID], [Edition_fk], [Key1_fk], [Key2_fk], [MatchTypeCode], [MatchScore], [MatchRank])

    VALUES (403583201,54,249848,3829330,'P',25,0);

    INSERT INTO [Project].[TestTable1] ([Source_ID], [Edition_fk], [Key1_fk], [Key2_fk], [MatchTypeCode], [MatchScore], [MatchRank])

    VALUES (403583201,58,249848,3829330,'I',50,0);

    INSERT INTO [Project].[TestTable1] ([Source_ID], [Edition_fk], [Key1_fk], [Key2_fk], [MatchTypeCode], [MatchScore], [MatchRank])

    VALUES (403583201,54,249848,3829330,'N',50,0);

    INSERT INTO [Project].[TestTable1] ([Source_ID], [Edition_fk], [Key1_fk], [Key2_fk], [MatchTypeCode], [MatchScore], [MatchRank])

    VALUES (403583201,58,249848,3854067,'N',50,0);

    INSERT INTO [Project].[TestTable1] ([Source_ID], [Edition_fk], [Key1_fk], [Key2_fk], [MatchTypeCode], [MatchScore], [MatchRank])

    VALUES (403583201,54,249848,3854067,'N',50,0);

    INSERT INTO [Project].[TestTable1] ([Source_ID], [Edition_fk], [Key1_fk], [Key2_fk], [MatchTypeCode], [MatchScore], [MatchRank])

    VALUES (403583201,54,249848,133966931,'N',50,0);

    INSERT INTO [Project].[TestTable1] ([Source_ID], [Edition_fk], [Key1_fk], [Key2_fk], [MatchTypeCode], [MatchScore], [MatchRank])

    VALUES (403583201,54,249848,133966931,'P',25,0);

    INSERT INTO [Project].[TestTable1] ([Source_ID], [Edition_fk], [Key1_fk], [Key2_fk], [MatchTypeCode], [MatchScore], [MatchRank])

    VALUES (403583201,54,249848,133966931,'I',50,0);

    INSERT INTO [Project].[TestTable1] ([Source_ID], [Edition_fk], [Key1_fk], [Key2_fk], [MatchTypeCode], [MatchScore], [MatchRank])

    VALUES (403583201,58,249848,133966931,'N',50,0);

    INSERT INTO [Project].[TestTable1] ([Source_ID], [Edition_fk], [Key1_fk], [Key2_fk], [MatchTypeCode], [MatchScore], [MatchRank])

    VALUES (403583201,58,249848,133966931,'I',50,0);

    INSERT INTO [Project].[TestTable1] ([Source_ID], [Edition_fk], [Key1_fk], [Key2_fk], [MatchTypeCode], [MatchScore], [MatchRank])

    VALUES (403583201,58,249848,134402313,'N',50,0);

    INSERT INTO [Project].[TestTable1] ([Source_ID], [Edition_fk], [Key1_fk], [Key2_fk], [MatchTypeCode], [MatchScore], [MatchRank])

    VALUES (403583201,54,249848,134402313,'N',50,0);

    [/code]

    So this is a small sample of the data that I randomly pulled that had the 403583201 Source_ID and I noticed that some have an Edition_fk of 58 but other than that they are the same.

    Group by fails me because I only want the groups where the Edition_fk don't match

    My head is spinning trying to figure these out!

  • Nice job posting DDL and sample data! How do you determine which rows are/are not the Current Edition? Once we have that, the rest of this should be trivial... So could you explain what the Edition column means? Without it I can't get from your question to your data to an answer, because I don't understand what I'm looking at.

    Thanks!

    Pieter

  • Hi and welcome to the forum, complements on the fine DDL and sample data!

    Not too clear on what you are after but please bear with me as I'm only on the first morning coffee:-D

    I put together a quick query bringing back some of the things I think you'll need using Window functions. Using the Window functions for this is quite neat as it allows for aggregation while keeping the details in the set.

    Note: I changed the schema to dbo.

    😎

    ;WITH BASE_DATA AS

    (

    SELECT

    TT.TestTable1_pk

    ,TT.Source_ID

    ,TT.Edition_fk

    ,TT.Key1_fk

    ,TT.Key2_fk

    ,TT.MatchTypeCode

    ,TT.MatchScore

    ,TT.MatchRank

    ,CASE

    WHEN DENSE_RANK() OVER (ORDER BY TT.Edition_fk DESC) = 1 THEN 0

    ELSE 1

    END AS OLD_EDITION

    ,ROW_NUMBER() OVER

    (

    PARTITION BY TT.Edition_fk

    ,TT.Source_ID

    ,TT.Key1_fk

    ,TT.Key2_fk

    ,TT.MatchTypeCode

    ,TT.MatchScore

    ,TT.MatchRank

    ORDER BY (SELECT NULL)

    ) AS EDITION_DDRID

    ,ROW_NUMBER() OVER

    (

    ORDER BY TT.Edition_fk

    ,TT.Source_ID

    ,TT.Key1_fk

    ,TT.Key2_fk

    ,TT.MatchTypeCode

    ,TT.MatchScore

    ,TT.MatchRank

    ) AS EDITION_RID

    FROM [dbo].[TestTable1] TT

    )

    SELECT

    BD.TestTable1_pk

    ,BD.Source_ID

    ,BD.Edition_fk

    ,BD.Key1_fk

    ,BD.Key2_fk

    ,BD.MatchTypeCode

    ,BD.MatchScore

    ,BD.MatchRank

    ,BD.OLD_EDITION

    ,SIGN(BD.EDITION_DDRID -1) AS IS_DUPE

    ,BD.EDITION_RID

    ,SUM(BD.OLD_EDITION) OVER

    (

    PARTITION BY BD.Source_ID

    ) AS NUM_OLD_ENTRIES_BY_SRC_ID

    ,SUM(BD.EDITION_DDRID -1) OVER

    (

    PARTITION BY BD.Source_ID,OLD_EDITION

    ) AS NUM_DUP_ENT_BY_SRC_CURR

    FROM BASE_DATA BD

    Results

    TestTable1_pk Source_ID Edition_fk Key1_fk Key2_fk MatchTypeCode MatchScore MatchRank OLD_EDITION IS_DUPE EDITION_RID NUM_OLD_ENTRIES_BY_SRC_ID NUM_DUP_ENT_BY_SRC_CURR

    ------------- ----------- ----------- ----------- ----------- ------------- ----------- ----------- ----------- -------------------- -------------------- ------------------------- -----------------------

    2 403583201 58 249848 1181206 N 50 1 0 0 13 12 1

    6 403583201 58 249848 1307967 N 50 0 0 0 14 12 1

    3 403583201 58 249848 1307967 P 25 0 0 0 15 12 1

    4 403583201 58 249848 1307967 P 25 0 0 1 16 12 1

    7 403583201 58 249848 1753914 N 50 0 0 0 17 12 1

    13 403583201 58 249848 3829330 I 50 0 0 0 18 12 1

    10 403583201 58 249848 3829330 N 50 0 0 0 19 12 1

    15 403583201 58 249848 3854067 N 50 0 0 0 20 12 1

    21 403583201 58 249848 133966931 I 50 0 0 0 21 12 1

    20 403583201 58 249848 133966931 N 50 0 0 0 22 12 1

    22 403583201 58 249848 134402313 N 50 0 0 0 23 12 1

    1 403583201 54 249848 1181206 N 50 1 1 0 1 12 1

    5 403583201 54 249848 1307967 N 50 0 1 0 2 12 1

    8 403583201 54 249848 1753914 N 50 0 1 0 3 12 1

    9 403583201 54 249848 3829330 I 50 0 1 0 4 12 1

    14 403583201 54 249848 3829330 N 50 0 1 0 5 12 1

    11 403583201 54 249848 3829330 P 25 0 1 0 6 12 1

    12 403583201 54 249848 3829330 P 25 0 1 1 7 12 1

    16 403583201 54 249848 3854067 N 50 0 1 0 8 12 1

    19 403583201 54 249848 133966931 I 50 0 1 0 9 12 1

    17 403583201 54 249848 133966931 N 50 0 1 0 10 12 1

    18 403583201 54 249848 133966931 P 25 0 1 0 11 12 1

    23 403583201 54 249848 134402313 N 50 0 1 0 12 12 1

  • pietlinden (8/27/2014)


    Nice job posting DDL and sample data! How do you determine which rows are/are not the Current Edition? Once we have that, the rest of this should be trivial... So could you explain what the Edition column means? Without it I can't get from your question to your data to an answer, because I don't understand what I'm looking at.

    Thanks!

    Pieter

    Realized that I didn't put what current Edition_fk should be until after I crawled into bed last night. The current Edition_fk is 58. It basically tells me with what data set the matches were created.

    Jeff

  • Eirikur Eiriksson (8/27/2014)


    Hi and welcome to the forum, complements on the fine DDL and sample data!

    Not too clear on what you are after but please bear with me as I'm only on the first morning coffee:-D

    I put together a quick query bringing back some of the things I think you'll need using Window functions. Using the Window functions for this is quite neat as it allows for aggregation while keeping the details in the set.

    Note: I changed the schema to dbo.

    I'm also still in the process of waking up and am still trying to figure out what cool tricks are doing, so I'm not sure if this accomplishes what I need, but I intend to understand it because I like learning! After I test with some more data I'll reply back if this is what I need.

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply