Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase

Determine Primary and Secondary numbers in a table. Expand / Collapse
Author
Message
Posted Saturday, July 5, 2014 3:14 PM
SSC-Addicted

SSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-Addicted

Group: General Forum Members
Last Login: Today @ 1:38 PM
Points: 445, Visits: 847
Query Help Please

I have included DDL scripts to facilitate the process.

I have a table called Duplicate_Cust_Numbers_to_DeDupe.

This table was created as a result of a script to detect duplicate Cust Numbers.


The following explains each column.

CustSGID is a system generated id for the customer when that customer is created in the database.

CUSTein is a taxid for that company, but the numbers you see have been made up and are not real and the column name is made up.

Cust_Name is the name of the Customer but again these are made up names and any similarity to real names is purely by accident.

Cust_Number this is a number provided by the outside source that is providing the data that is inserted into the database originally.
The Cust_Number is supposed to be unique by customer but as you can see that is not always the case. The outside source violates their own rule very often.

Has_Account means a user has come into the system and created an account for that customer.

Has_Cert this means a person has certified that the customer is who they say they are.


PrimaryCustid is the primary CustSGID that will be used as the final CustSGID once the de duplication process is complete.

SecondaryCustid is the CustID that will be blended into the primary CustSGID.


Goal

I need to populate the PrimaryCUSTID and the SecondaryCUSTID based on the criteria below, 1 being top and 3 being lowest criteria.

1.) Has an account

2.) Has a CustEIN

3.) Has a Cert.


In the data provided
In the actual table there are over 27,000 records.

The Customer Bissell inc is the PrimaryCustSGID because it has a CUSTEIN, no account and no cert.
The customer Bissell International becomes the SecondatyCustID


The Customer JAMES BLDG PRODUC is the PrimaryCustSGID because it has an account and there is no CustEIN or Cert.
The Customer JAMES BLDG PR becomes the SecondaryCustSGID because there is no account eventhough there is a CustEIN.

The Customer DOORCRAFT has no account, or CustEIN, or Cert so in this case the lowest CustSGID(399108) will become the PrimaryCustSGID
and the highest(1059533) will become the secondaryCustSGID


The Customer AMERDALE with the account becomes the primaryCUSTSGID and the second instance of AMERDALE becomes the secondaryCustID
even though both have a CUSTein. The instance with the Accounts wins in this case.


I have struggled over a week on this and have gotten nowhere, and now I think I am too close to the situation.
I need a different set of eyes or brains to look at this a show me how to do this.


I really appreciate the help I have received in the past.
This is a forum and the people who respond are a great resource.

Thanks in Advance

Gary

DDL

----CREATE TABLE [dbo].[Duplicate_Cust_Numbers_to_DeDupe](
----
---- [CustSGID] [int] NOT NULL,
---- [CUSTein] [varchar](25) NULL,
---- [Cust_Name] [varchar](70) NOT NULL,
---- [Cust_Number] [nvarchar](50) NOT NULL,
---- [Has_Account] [varchar](3) NOT NULL,
---- [Has_Cert] [varchar](3) NOT NULL,
----
----
---- [PrimaryCUSTID] [int] NULL,
---- [SecondaryCUSTID] [int] NULL,
----
----
----) ON [PRIMARY]
--
--insert into [dbo].[Duplicate_Cust_Numbers_to_DeDupe](CustSGID,CUSTein,Cust_Name,Cust_Number,Has_Account,Has_Cert,PrimaryCUSTID,SecondaryCUSTID)
--values(399011,'123456789','BISSELL INC','100140','No','No',0,0)
--
--insert into [dbo].[Duplicate_Cust_Numbers_to_DeDupe](CustSGID,CUSTein,Cust_Name,Cust_Number,Has_Account,Has_Cert,PrimaryCUSTID,SecondaryCUSTID)
--values(1060533,'','BISSELL Inernational','100140','No','No',0,0)
--
--
--insert into [dbo].[Duplicate_Cust_Numbers_to_DeDupe](CustSGID,CUSTein,Cust_Name,Cust_Number,Has_Account,Has_Cert,PrimaryCUSTID,SecondaryCUSTID)
--values(395524,'','JAMES BLDG PRODUC','12092','Yes','No',0,0)
--
--insert into [dbo].[Duplicate_Cust_Numbers_to_DeDupe](CustSGID,CUSTein,Cust_Name,Cust_Number,Has_Account,Has_Cert,PrimaryCUSTID,SecondaryCUSTID)
--values(1059069,'999999990','JAMES BLDG PR','12092','No','No',0,0)
--
--
--insert into [dbo].[Duplicate_Cust_Numbers_to_DeDupe](CustSGID,CUSTein,Cust_Name,Cust_Number,Has_Account,Has_Cert,PrimaryCUSTID,SecondaryCUSTID)
--values(399108,'','DOORCRAFT','12723','No','No',0,0)
--
--
--insert into [dbo].[Duplicate_Cust_Numbers_to_DeDupe](CustSGID,CUSTein,Cust_Name,Cust_Number,Has_Account,Has_Cert,PrimaryCUSTID,SecondaryCUSTID)
--values(1059533,'','DOORCRAFT','12723','No','No',0,0)
--
--insert into [dbo].[Duplicate_Cust_Numbers_to_DeDupe](CustSGID,CUSTein,Cust_Name,Cust_Number,Has_Account,Has_Cert,PrimaryCUSTID,SecondaryCUSTID)
--values(401860,'999999909','AMERDALE','14537','Yes','No',0,0)
--
--insert into [dbo].[Duplicate_Cust_Numbers_to_DeDupe](CustSGID,CUSTein,Cust_Name,Cust_Number,Has_Account,Has_Cert,PrimaryCUSTID,SecondaryCUSTID)
--values(1060335,'999999009','AMERDALE','14537','No','No',0,0)



Post #1589564
Posted Saturday, July 5, 2014 5:28 PM


SSC-Insane

SSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-Insane

Group: General Forum Members
Last Login: Today @ 3:44 PM
Points: 20,792, Visits: 32,703
And what code have you developed so far in an attempt to solve your problem?


Lynn Pettis

For better assistance in answering your questions, click here
For tips to get better help with Performance Problems, click here
For Running Totals and its variations, click here or when working with partitioned tables
For more about Tally Tables, click here
For more about Cross Tabs and Pivots, click here and here
Managing Transaction Logs

SQL Musings from the Desert Fountain Valley SQL (My Mirror Blog)
Post #1589569
Posted Saturday, July 5, 2014 6:06 PM


SSC-Insane

SSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-Insane

Group: General Forum Members
Last Login: Today @ 3:44 PM
Points: 20,792, Visits: 32,703
While waiting for what you tried I thought I'd give it a shot. Is this basically what you were looking for?


select
*,
rn = row_number() over (partition by Cust_Number order by
case when Has_Account = 'Yes' then 1
when CUSTein > '' then 2
when Has_Cert = 'Yes' then 3
else 4
end, CustSGID)-- as SortOrder
from
[dbo].[Duplicate_Cust_Numbers_to_DeDupe];

with dedup_id as (
select
*,
rn = row_number() over (partition by Cust_Number order by
case when Has_Account = 'Yes' then 1
when CUSTein > '' then 2
when Has_Cert = 'Yes' then 3
else 4
end, CustSGID)-- as SortOrder
from
[dbo].[Duplicate_Cust_Numbers_to_DeDupe]
)
select
Cust_Number,
max(case rn when 1 then Cust_Name else '' end) as Cust_Name,
max(Has_Account) as Has_Account,
max(Has_Cert) as Has_Cert,
max(case rn when 1 then CustSGID else '' end) as PrimaryCustSGID,
max(case rn when 2 then CustSGID else '' end) as SecondaryCustSGID
from
dedup_id
where
rn <= 2 -- Is it possible there may be more than 2 duplicate Cust_Number records?
group by
Cust_Number





Lynn Pettis

For better assistance in answering your questions, click here
For tips to get better help with Performance Problems, click here
For Running Totals and its variations, click here or when working with partitioned tables
For more about Tally Tables, click here
For more about Cross Tabs and Pivots, click here and here
Managing Transaction Logs

SQL Musings from the Desert Fountain Valley SQL (My Mirror Blog)
Post #1589573
Posted Saturday, July 5, 2014 6:15 PM


SSC-Insane

SSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-Insane

Group: General Forum Members
Last Login: Today @ 3:44 PM
Points: 20,792, Visits: 32,703
Updated the code above.



Lynn Pettis

For better assistance in answering your questions, click here
For tips to get better help with Performance Problems, click here
For Running Totals and its variations, click here or when working with partitioned tables
For more about Tally Tables, click here
For more about Cross Tabs and Pivots, click here and here
Managing Transaction Logs

SQL Musings from the Desert Fountain Valley SQL (My Mirror Blog)
Post #1589574
Posted Sunday, July 6, 2014 7:37 AM


SSC-Insane

SSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-Insane

Group: General Forum Members
Last Login: Today @ 3:44 PM
Points: 20,792, Visits: 32,703
Hopefully what I posted works for you, unfortunately I just noticed that you had posted in a SQL Server 7/2000 forum and the code I provided will work in SQL Server 2005 and later, not in SQL Server 2000.



Lynn Pettis

For better assistance in answering your questions, click here
For tips to get better help with Performance Problems, click here
For Running Totals and its variations, click here or when working with partitioned tables
For more about Tally Tables, click here
For more about Cross Tabs and Pivots, click here and here
Managing Transaction Logs

SQL Musings from the Desert Fountain Valley SQL (My Mirror Blog)
Post #1589643
Posted Tuesday, July 8, 2014 5:42 PM
SSC-Addicted

SSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-Addicted

Group: General Forum Members
Last Login: Today @ 1:38 PM
Points: 445, Visits: 847
Thank you Lynn for your response.

I have been offline for a few days and have not been able to try your solution.

What I had been trying was using a series of case statements checking for each condition and that was working about as well as tits on a bore hog, hence my post.

Yes, this is on SQL 2000, but since you suggested CTEs maybe I can use your suggestion as a guide to put something together in SQL 2000.

Thanks Again.

Gary



Post #1590567
Posted Tuesday, July 8, 2014 6:08 PM


SSC-Insane

SSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-Insane

Group: General Forum Members
Last Login: Today @ 3:44 PM
Points: 20,792, Visits: 32,703
Using SQL Server 2000 will make things more difficult. You don't have access to the windowing function row_number() for one.

I will think if I can come up with another possible answer.



Lynn Pettis

For better assistance in answering your questions, click here
For tips to get better help with Performance Problems, click here
For Running Totals and its variations, click here or when working with partitioned tables
For more about Tally Tables, click here
For more about Cross Tabs and Pivots, click here and here
Managing Transaction Logs

SQL Musings from the Desert Fountain Valley SQL (My Mirror Blog)
Post #1590569
« Prev Topic | Next Topic »

Add to briefcase

Permissions Expand / Collapse