Home Forums SQL Server 2008 T-SQL (SS2K8) De-duplicating Customer Detail and assigning parent child structure to multiple customer records RE: De-duplicating Customer Detail and assigning parent child structure to multiple customer records

  • This will be potentially a nightly run that will update the customer details. This table will then have the various passes ran against it to identify changes in the relationships between the master and its children. This could mean customerID's being added, them being removed or changes in which is the primary.

    It is not the identification of the duplicates, or even the identification of the most recent it is when you try and identify all the children below the primary.

    If you run the below piece of code then you can see a simplified example (using a horrible correlated sub query) that you are going to be left with two pairs of interchanged ID's.

    this will leave us once a distinct is applied with this situation :

    Master Child

    7320 25861

    7321 7320

    how would you go about linking these customers and allowing for all different variations of this problem that will arise. Some customers have used over 19 different credit cards... at the moment we are looking at updates in a big loop but this is going to have to loop through EVERY customer for EVERY credit card which is going to be a massive hit everyday. I was interested to see if anyone had any better suggestions or solutions to a similar problem.

    Thanks

    Matt

    with rawdata as

    (

    select '20090503' as dateid

    ,7320 as CustID

    ,'370090510' as CCNumber

    ,'Jane' as First_Name

    ,'Smith' as Last_Name

    union

    select '20090420' as dateid

    ,25861 as CustID

    ,'370090510' as CCNumber

    ,'Jane' as First_Name

    ,'Smith' as Last_Name

    union

    select '20100404' as dateid

    ,7320 as CustID

    ,'370080987' as CCNumber

    ,'Jane' as First_Name

    ,'Smith' as Last_Name

    union

    select '20090703' as dateid

    ,25861 as CustID

    ,'370080987' as CCNumber

    ,'Jane' as First_Name

    ,'Smith' as Last_Name

    union

    select '20100603' as dateid

    ,7320 as CustID

    ,'594220191' as CCNumber

    ,'Jane' as First_Name

    ,'Smith' as Last_Name

    union

    select '20100705' as dateid

    ,25861 as CustID

    ,'594220191' as CCNumber

    ,'Jane' as First_Name

    ,'Smith' as Last_Name

    union

    select '20100719' as dateid

    ,7320 as CustID

    ,'594220654' as CCNumber

    ,'Jane' as First_Name

    ,'Smith' as Last_Name

    union

    select '20100720' as dateid

    ,7321 as CustID

    ,'594220654' as CCNumber

    ,'Jane' as First_Name

    ,'Smith' as Last_Name

    order by custid

    )

    ,

    rawdata2 as

    (

    select custid,MAX(dateid) as maxdateid

    from rawdata

    group by custid

    )

    SELECT distinct

    CASE WHEN CustDate1 > CustDate2 THEN CustID1 ELSE CustID2 END AS [GrandMasterCustomer]

    ,CASE WHEN CustDate1 > CustDate2 THEN CustDate1 ELSE CustDate2 END AS [GrandMasterDate]

    ,CASE WHEN CustDate1 > CustDate2 THEN CustID2 ELSE CustID1 END AS [SubordinateCustomer]

    FROM

    (

    select

    r1.CustID as CustID1

    , (SELECT MAX(d1.dateid) FROM rawdata AS d1 WHERE d1.CustID = r1.CustID) AS CustDate1

    ,r2.CustID as CustID2

    , (SELECT MAX(d2.dateid) FROM rawdata AS d2 WHERE d2.CustID = r2.CustID) AS CustDate2

    from rawdata r1

    join rawdata r2

    on r1.CCNumber = r2.CCNumber

    where r1.CustID<>r2.CustID

    ) AS a