Home Forums SQL Server 2008 T-SQL (SS2K8) De-duplicating Customer Detail and assigning parent child structure to multiple customer records RE: De-duplicating Customer Detail and assigning parent child structure to multiple customer records

  • Hi Chris,

    Thanks for taking the time to respond.

    I am indeed trying to dedupe a customer list so that i can have a master customer that is linked to all the other historical instances of that customer on our system. i know, seems like such a regular and simple concept...very difficult!! :crazy:

    The customer detail can vary through all the usual detail that is held. email, home address detail, first name and last name but you cannot guarantee that any of this information on its own is going to guarantee a match. i.e. just because the first name last name matches another record with a different customer id, doesn't mean its the same person.

    Therefore we are splitting it down into passes with each pass having a degree of confidence assigned to it. The sample data that I have included is credit card detail (not real detail obviously!!!) that has been concatenated together to make a unique identity field. This will be the same process used when comparing other data i.e. telephone number will be concatenated with first_name last_name to create a unique identity field that will then be used to match against all other records.

    Assume that this data all has the same name attached to it as this level of matching is the easy bit. (code below)

    on this pass I am looking initially at a section of around 1million customers and around the same number of transactions to search through to identify who has used which CC detail. We would then potentially be looking to open this out to the entire customer base of which i haven't got a number. If we can find a structure that work then we can worry about runtime, processing and optimization at a later date.

    Again, many thanks for your help. Let me know if there is anything else that you need.

    Thanks

    Matt

    select '20090503' as dateid

    ,7320 as CustID

    ,'370090510' as CCNumber

    ,'Jane' as First_Name

    ,'Smith' as Last_Name

    union

    select '20090420' as dateid

    ,25861 as CustID

    ,'370090510' as CCNumber

    ,'Jane' as First_Name

    ,'Smith' as Last_Name

    union

    select '20100404' as dateid

    ,7320 as CustID

    ,'370080987' as CCNumber

    ,'Jane' as First_Name

    ,'Smith' as Last_Name

    union

    select '20090703' as dateid

    ,25861 as CustID

    ,'370080987' as CCNumber

    ,'Jane' as First_Name

    ,'Smith' as Last_Name

    union

    select '20100603' as dateid

    ,7320 as CustID

    ,'594220191' as CCNumber

    ,'Jane' as First_Name

    ,'Smith' as Last_Name

    union

    select '20100705' as dateid

    ,25861 as CustID

    ,'594220191' as CCNumber

    ,'Jane' as First_Name

    ,'Smith' as Last_Name

    union

    select '20100719' as dateid

    ,7320 as CustID

    ,'594220654' as CCNumber

    ,'Jane' as First_Name

    ,'Smith' as Last_Name

    union

    select '20100720' as dateid

    ,7321 as CustID

    ,'594220654' as CCNumber

    ,'Jane' as First_Name

    ,'Smith' as Last_Name