Determine Primary and Secondary numbers in a table.

  • Query Help Please

    I have included DDL scripts to facilitate the process.

    I have a table called Duplicate_Cust_Numbers_to_DeDupe.

    This table was created as a result of a script to detect duplicate Cust Numbers.

    The following explains each column.

    CustSGID is a system generated id for the customer when that customer is created in the database.

    CUSTein is a taxid for that company, but the numbers you see have been made up and are not real and the column name is made up.

    Cust_Name is the name of the Customer but again these are made up names and any similarity to real names is purely by accident.

    Cust_Number this is a number provided by the outside source that is providing the data that is inserted into the database originally.

    The Cust_Number is supposed to be unique by customer but as you can see that is not always the case. The outside source violates their own rule very often.

    Has_Account means a user has come into the system and created an account for that customer.

    Has_Cert this means a person has certified that the customer is who they say they are.

    PrimaryCustid is the primary CustSGID that will be used as the final CustSGID once the de duplication process is complete.

    SecondaryCustid is the CustID that will be blended into the primary CustSGID.

    Goal

    I need to populate the PrimaryCUSTID and the SecondaryCUSTID based on the criteria below, 1 being top and 3 being lowest criteria.

    1.) Has an account

    2.) Has a CustEIN

    3.) Has a Cert.

    In the data provided

    In the actual table there are over 27,000 records.

    The Customer Bissell inc is the PrimaryCustSGID because it has a CUSTEIN, no account and no cert.

    The customer Bissell International becomes the SecondatyCustID

    The Customer JAMES BLDG PRODUC is the PrimaryCustSGID because it has an account and there is no CustEIN or Cert.

    The Customer JAMES BLDG PR becomes the SecondaryCustSGID because there is no account eventhough there is a CustEIN.

    The Customer DOORCRAFT has no account, or CustEIN, or Cert so in this case the lowest CustSGID(399108) will become the PrimaryCustSGID

    and the highest(1059533) will become the secondaryCustSGID

    The Customer AMERDALE with the account becomes the primaryCUSTSGID and the second instance of AMERDALE becomes the secondaryCustID

    even though both have a CUSTein. The instance with the Accounts wins in this case.

    I have struggled over a week on this and have gotten nowhere, and now I think I am too close to the situation.

    I need a different set of eyes or brains to look at this a show me how to do this.

    I really appreciate the help I have received in the past.

    This is a forum and the people who respond are a great resource.

    Thanks in Advance

    Gary

    DDL

    ----CREATE TABLE [dbo].[Duplicate_Cust_Numbers_to_DeDupe](

    ----

    ----[CustSGID] [int] NOT NULL,

    ----[CUSTein] [varchar](25) NULL,

    ----[Cust_Name] [varchar](70) NOT NULL,

    ----[Cust_Number] [nvarchar](50) NOT NULL,

    ----[Has_Account] [varchar](3) NOT NULL,

    ----[Has_Cert] [varchar](3) NOT NULL,

    ----

    ----

    ----[PrimaryCUSTID] [int] NULL,

    ----[SecondaryCUSTID] [int] NULL,

    ----

    ----

    ----) ON [PRIMARY]

    --

    --insert into [dbo].[Duplicate_Cust_Numbers_to_DeDupe](CustSGID,CUSTein,Cust_Name,Cust_Number,Has_Account,Has_Cert,PrimaryCUSTID,SecondaryCUSTID)

    --values(399011,'123456789','BISSELL INC','100140','No','No',0,0)

    --

    --insert into [dbo].[Duplicate_Cust_Numbers_to_DeDupe](CustSGID,CUSTein,Cust_Name,Cust_Number,Has_Account,Has_Cert,PrimaryCUSTID,SecondaryCUSTID)

    --values(1060533,'','BISSELL Inernational','100140','No','No',0,0)

    --

    --

    --insert into [dbo].[Duplicate_Cust_Numbers_to_DeDupe](CustSGID,CUSTein,Cust_Name,Cust_Number,Has_Account,Has_Cert,PrimaryCUSTID,SecondaryCUSTID)

    --values(395524,'','JAMES BLDG PRODUC','12092','Yes','No',0,0)

    --

    --insert into [dbo].[Duplicate_Cust_Numbers_to_DeDupe](CustSGID,CUSTein,Cust_Name,Cust_Number,Has_Account,Has_Cert,PrimaryCUSTID,SecondaryCUSTID)

    --values(1059069,'999999990','JAMES BLDG PR','12092','No','No',0,0)

    --

    --

    --insert into [dbo].[Duplicate_Cust_Numbers_to_DeDupe](CustSGID,CUSTein,Cust_Name,Cust_Number,Has_Account,Has_Cert,PrimaryCUSTID,SecondaryCUSTID)

    --values(399108,'','DOORCRAFT','12723','No','No',0,0)

    --

    --

    --insert into [dbo].[Duplicate_Cust_Numbers_to_DeDupe](CustSGID,CUSTein,Cust_Name,Cust_Number,Has_Account,Has_Cert,PrimaryCUSTID,SecondaryCUSTID)

    --values(1059533,'','DOORCRAFT','12723','No','No',0,0)

    --

    --insert into [dbo].[Duplicate_Cust_Numbers_to_DeDupe](CustSGID,CUSTein,Cust_Name,Cust_Number,Has_Account,Has_Cert,PrimaryCUSTID,SecondaryCUSTID)

    --values(401860,'999999909','AMERDALE','14537','Yes','No',0,0)

    --

    --insert into [dbo].[Duplicate_Cust_Numbers_to_DeDupe](CustSGID,CUSTein,Cust_Name,Cust_Number,Has_Account,Has_Cert,PrimaryCUSTID,SecondaryCUSTID)

    --values(1060335,'999999009','AMERDALE','14537','No','No',0,0)

  • And what code have you developed so far in an attempt to solve your problem?

  • While waiting for what you tried I thought I'd give it a shot. Is this basically what you were looking for?

    select

    *,

    rn = row_number() over (partition by Cust_Number order by

    case when Has_Account = 'Yes' then 1

    when CUSTein > '' then 2

    when Has_Cert = 'Yes' then 3

    else 4

    end, CustSGID)-- as SortOrder

    from

    [dbo].[Duplicate_Cust_Numbers_to_DeDupe];

    with dedup_id as (

    select

    *,

    rn = row_number() over (partition by Cust_Number order by

    case when Has_Account = 'Yes' then 1

    when CUSTein > '' then 2

    when Has_Cert = 'Yes' then 3

    else 4

    end, CustSGID)-- as SortOrder

    from

    [dbo].[Duplicate_Cust_Numbers_to_DeDupe]

    )

    select

    Cust_Number,

    max(case rn when 1 then Cust_Name else '' end) as Cust_Name,

    max(Has_Account) as Has_Account,

    max(Has_Cert) as Has_Cert,

    max(case rn when 1 then CustSGID else '' end) as PrimaryCustSGID,

    max(case rn when 2 then CustSGID else '' end) as SecondaryCustSGID

    from

    dedup_id

    where

    rn <= 2 -- Is it possible there may be more than 2 duplicate Cust_Number records?

    group by

    Cust_Number

  • Updated the code above.

  • Hopefully what I posted works for you, unfortunately I just noticed that you had posted in a SQL Server 7/2000 forum and the code I provided will work in SQL Server 2005 and later, not in SQL Server 2000.

  • Thank you Lynn for your response.

    I have been offline for a few days and have not been able to try your solution.

    What I had been trying was using a series of case statements checking for each condition and that was working about as well as tits on a bore hog, hence my post.

    Yes, this is on SQL 2000, but since you suggested CTEs maybe I can use your suggestion as a guide to put something together in SQL 2000.

    Thanks Again.

    Gary

  • Using SQL Server 2000 will make things more difficult. You don't have access to the windowing function row_number() for one.

    I will think if I can come up with another possible answer.

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply