"Duplicate" records - How to coalesce?

  • I'm not sure how/if I can solve this issue. Any help would be much appreciated!

    I ran into an old solution that was poorly designed and now I have to fix it. There are 2 rows in the table for each "record" and I'm not sure how I would go about generating output that would only show one unique row. Here's the data:

    AFFILIATIONIDBACKAFFILIDENTITYID

    QMTNAA0004LFQMTNAA0004LGCMTNAA000WNQ

    QMTNAA0004LGQMTNAA0004LFAMTNAA000K7Y

    The AFFILIATIONID is the unique id for the table and the BACKAFFILID is the reference to it's sibling. The ENTITYID is an Account or Contact in the database. What I need is something like this for output:

    NEWRECORDIDFROMRECORDIDTORECORDID

    0000000001CMTNAA000WNQAMTNAA000K7Y

    Thoughts?

  • Use a "self" join. In other words, join the table to itself.

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Here's the test setup if someone else wants to pursue this.

    CREATE TABLE #a (AFFILIATIONID varchar(20),

    BACKAFFILID varchar(20),

    ENTITYID varchar(20))

    INSERT #a

    VALUES ('QMTNAA0004LF', 'QMTNAA0004LG', 'CMTNAA000WNQ')

    INSERT #a

    VALUES ('QMTNAA0004LG', 'QMTNAA0004LF', 'AMTNAA000K7Y')

    HOWEVER, there is a flaw because there is nothing (yet) that will declare that the LF record is supposed to be the FROM and the LG record is supposed to be the TO.

    Sturev, once you define that I have a solution for you. Without that you have an impossible request.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • When I join the table to itself, I end up with two rows since each row has it's sibling as the BACKAFFILID.

    AFFILIATION INNER JOIN AFFILIATION AS AFFILIATION_1 ON AFFILIATION.BACKAFFILID = AFFILIATION_1.AFFILIATIONID

    AFFILIATIONIDBACKAFFILIDENTITYID

    QMTNAA0004LFQMTNAA0004LGCMTNAA000WNQ

    QMTNAA0004LGQMTNAA0004LFAMTNAA000K7Y

    Unless I'm not using the right join?

  • Kevin,

    It doesn't matter which one of the ENTITYID's is the To or From. Maybe just use the 1st row as From if that works?

  • I can work with "it doesn't matter". Give me a few minutes to slap together some code.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Kludgy as heck, but it seems to give the desired results. :hehe:

    CREATE TABLE #a (AFFILIATIONID varchar(20),

    BACKAFFILID varchar(20),

    ENTITYID varchar(20))

    INSERT #a

    VALUES ('QMTNAA0004LF', 'QMTNAA0004LG', 'CMTNAA000WNQ')

    INSERT #a

    VALUES ('QMTNAA0004LG', 'QMTNAA0004LF', 'AMTNAA000K7Y')

    INSERT #a

    VALUES ('QMTNAA0004XX', 'QMTNAA0004YY', 'XXTNAA000WNQ')

    INSERT #a

    VALUES ('QMTNAA0004YY', 'QMTNAA0004XX', 'YYTNAA000K7Y')

    WITH t AS (

    SELECT ROW_NUMBER() OVER (ORDER BY a1.AFFILIATIONID, a1.BACKAFFILID) AS rownum, a1.AFFILIATIONID, a1.BACKAFFILID, a1.ENTITYID AS FROMRECORDID, a2.ENTITYID AS TORECORDID

    FROM #a a1 INNER JOIN #a a2 ON a1.AFFILIATIONID = a2.BACKAFFILID)

    SELECT rownum / 2 AS NEWRECORDID, AFFILIATIONID, BACKAFFILID, FROMRECORDID, TORECORDID

    FROM t

    WHERE rownum %2 = 0

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Quick suggestion

    😎

    Relies on the sum of the checksum of the identifiers so be aware that there is a change of getting a collision error

    USE tempdb;

    GO

    SET NOCOUNT ON;

    ;WITH SAMPLE_DATA(AFFILIATIONID,BACKAFFILID,ENTITYID) AS

    (

    SELECT AFFILIATIONID ,BACKAFFILID ,ENTITYID FROM

    (VALUES ('QMTNAA0004LF','QMTNAA0004LG','CMTNAA000WNQ')

    ,('QMTNAA0004LG','QMTNAA0004LF','AMTNAA000K7Y')

    ) AS X(AFFILIATIONID,BACKAFFILID,ENTITYID)

    )

    ,GROUPED_SET AS

    (

    SELECT

    DENSE_RANK() OVER

    (

    ORDER BY (

    CONVERT(BIGINT,CHECKSUM(SD.BACKAFFILID ),0)

    + CONVERT(BIGINT,CHECKSUM(SD.AFFILIATIONID),0)

    )

    ) AS GRP_ID

    ,SD.ENTITYID

    FROM SAMPLE_DATA SD

    )

    ,ORDERED_GROUP_SET AS

    (

    SELECT

    GS.GRP_ID

    ,ROW_NUMBER() OVER

    (

    PARTITION BY GS.GRP_ID

    ORDER BY (SELECT NULL)

    ) AS ORDER_OF_APPEARANCE

    ,GS.ENTITYID

    FROM GROUPED_SET GS

    )

    SELECT

    OGS.GRP_ID

    ,MAX(CASE WHEN OGS.ORDER_OF_APPEARANCE = 1 THEN OGS.ENTITYID END) AS FROMRECORDID

    ,MAX(CASE WHEN OGS.ORDER_OF_APPEARANCE = 2 THEN OGS.ENTITYID END) AS TORECORDID

    FROM ORDERED_GROUP_SET OGS

    GROUP BY OGS.GRP_ID;

    Output

    GRP_ID FROMRECORDID TORECORDID

    ------- ------------ ------------

    1 AMTNAA000K7Y CMTNAA000WNQ

  • Wow Kevin, that's like light years beyond my sql skills, thanks a ton!

    So I have 4 other fields in mix that I need to bring along and your solution is so advanced, I'm not sure how/if it possible to incorporate them?

    Each row has a field named RELATION. So row 1 would have a value of "Board" and row 2 would have a value of "Board Member". Then each row has a Notes & Status field but those are duplicate values so I just need them from one of the two rows.

  • Sturev (5/25/2016)


    Wow Kevin, that's like light years beyond my sql skills, thanks a ton!

    So I have 4 other fields in mix that I need to bring along and your solution is so advanced, I'm not sure how/if it possible to incorporate them?

    Each row has a field named RELATION. So row 1 would have a value of "Board" and row 2 would have a value of "Board Member". Then each row has a Notes & Status field but those are duplicate values so I just need them from one of the two rows.

    You see how I set up the sample data? You do the work to set up the next set of sample data and what your expected output is. Be sure to include at least two full sets of output worth of sample data and also cover all possible permutations/exceptions/etc.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Sturev (5/25/2016)


    Wow Kevin, that's like light years beyond my sql skills, thanks a ton!

    You are very welcome. When you have 45000 hours working with SQL Server your brain will probably think like mine too. 😎

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Here is another approach. This has the advantage that it accounts for orphaned records.

    SELECT

    ROW_NUMBER() OVER(ORDER BY a1.AFFILIATIONID) NewRecordID,

    a1.ENTITYID AS FromRecordID,

    a2.ENTITYID ToRecordID

    FROM #a a1

    LEFT OUTER JOIN #a a2

    ON a1.AFFILIATIONID = a2.BACKAFFILID

    WHERE a1.AFFILIATIONID < a2.AFFILIATIONID

    OR a2.AFFILIATIONID IS NULL

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • TheSQLGuru (5/25/2016)

    You see how I set up the sample data? You do the work to set up the next set of sample data and what your expected output is. Be sure to include at least two full sets of output worth of sample data and also cover all possible permutations/exceptions/etc.

    My pleasure Kevin...

    CREATE TABLE #a (AFFILIATIONID varchar(20),

    BACKAFFILID varchar(20),

    ENTITYID varchar(20),

    RELATION varchar(64),

    NOTES varchar(128),

    STATUS varchar(32))

    INSERT #a

    VALUES ('QMTNAA000KMD', 'QMTNAA000KME', 'CMTNAA0015DQ', 'Mother-in-Law', 'In, Law', 'Active')

    INSERT #a

    VALUES ('QMTNAA000KME', 'QMTNAA000KMD', 'CMTNAA0012QT', 'Son-in-Law', 'In, Law', 'Active')

    INSERT #a

    VALUES ('QMTNAA000OH1', 'QMTNAA000OH2', 'AMTNAA000QGC', 'Client', 'Sunglasses', 'Active')

    INSERT #a

    VALUES ('QMTNAA000OH2', 'QMTNAA000OH1', 'AMTNAA000K3S', 'Vendor', 'Sunglasses', 'Active')

  • drew.allen (5/25/2016)


    Here is another approach. This has the advantage that it accounts for orphaned records.

    Drew

    Thanks Drew! In this case we don't want the orphans; I know, not ideal but it makes sense in this case.

  • TheSQLGuru (5/25/2016)You are very welcome. When you have 45000 hours working with SQL Server your brain will probably think like mine too. 😎

    I've been off in force.com land and haven't touched SQL in about 5 years, so needless to say the rust is showing! I'm going to send you a connection request on LinkedIn as there might be some work I can toss your way!

    Thanks again!

Viewing 15 posts - 1 through 15 (of 22 total)

You must be logged in to reply to this topic. Login to reply