November 14, 2014 at 2:45 pm
Hi there
I have a table with 3 ID columns, Name, Address
Create table temp(ID1 varchar(10),ID2 varchar(10),ID3 varchar(10),ID4 varchar(10),Name,Address)
insert into temp(100,200,300,400,'Amy','2 Main St,Salem,MA')
insert into temp(110,203,310,400,'Barry','3 Power St,Salem,MA')
insert into temp(130,205,320,400,'Amy','2 Main St,Salem,MA')
insert into temp(140,206,310,400,'Binny','110 Elm St,Salem, MA')
insert into temp(150,205,360,400,'Dan','230 BEaver Rd, Salem, MA')
I want to insert into another table Test with same Columns but records that do not have same Name,Address Combination. I.e.I do not want record 1 and 3. How can I do it?
Thanks
R
November 14, 2014 at 3:03 pm
So what you're saying is if there are any duplicate records for name and address, those records should be excluded from your select statement.
What you want to do then is de-dupe your data. My first inclination would be to group the results by name and address with a HAVING clause to eliminate any groups with more than one record.
set nocount on
go
if object_id('tempdb.dbo.#temp') is not null drop table #temp
Create table #temp(ID1 varchar(10),ID2 varchar(10),ID3 varchar(10),ID4 varchar(10),Name varchar(30),Address varchar(30))
insert into #temp values(100,200,300,400,'Amy','2 Main St,Salem,MA')
insert into #temp values(110,203,310,400,'Barry','3 Power St,Salem,MA')
insert into #temp values(130,205,320,400,'Amy','2 Main St,Salem,MA')
insert into #temp values(140,206,310,400,'Binny','110 Elm St,Salem, MA')
insert into #temp values(150,205,360,400,'Dan','230 BEaver Rd, Salem, MA')
select *
from #temp a
inner join (select
Name,
Address
from #temp
group by Name, Address
having count(1) = 1) b
on a.Name = b.Name
and a.Address = b.Address
November 14, 2014 at 5:26 pm
The following does it in a single scan rather than 2 and does not require a JOIN. And, no... it doesn't need a GROUP BY either. The COUNT(*) here is a Windowing Function thanks to OVER.
WITH
cteDeDupe AS
(
SELECT ID1,ID2,ID3,ID4,Name,Address,DupeCount = COUNT(*) OVER (PARTITION BY Name, Address)
FROM #temp
)
SELECT ID1,ID2,ID3,ID4,Name,Address
FROM cteDeDupe
WHERE DupeCount = 1
;
--Jeff Moden
Change is inevitable... Change for the better is not.
November 16, 2014 at 5:06 am
Thanks for the reply. I had one more question. I added a field called 'Ignoreflag' to the table with default value= -1.
It is set to one if Name,Address is same. So i decided to put Update statement like this
WITH
cteDeDupe AS
(
SELECT ID1,ID2,ID3,ID4,Name,Address,IgnoreFlag,DupeCount = COUNT(*) OVER (PARTITION BY Name, Address)
FROM #temp
)
Update cteDeDupe set ignoreflag=0
WHERE DupeCount = 1
;
I get a message that 3 rows are updated , but when I select from temp table, I do not see any records with ignoreflag=0. Can someone explain?
November 16, 2014 at 4:46 pm
rash3554 (11/16/2014)
Thanks for the reply. I had one more question. I added a field called 'Ignoreflag' to the table with default value= -1.It is set to one if Name,Address is same. So i decided to put Update statement like this
WITH
cteDeDupe AS
(
SELECT ID1,ID2,ID3,ID4,Name,Address,IgnoreFlag,DupeCount = COUNT(*) OVER (PARTITION BY Name, Address)
FROM #temp
)
Update cteDeDupe set ignoreflag=0
WHERE DupeCount = 1
;
I get a message that 3 rows are updated , but when I select from temp table, I do not see any records with ignoreflag=0. Can someone explain?
Although I've reformatted the code a bit, the code that follows is the same as your code. It would have been handy if you could have provided the updated test table/data but you're new, so I did it for you.
I don't know what you're doing wrong... maybe selecting from the wrong table or ??? The code works as written.
--===== Conditionally drop the test table to make reruns in SSMS easier
IF OBJECT_ID('tempdb..#temp') IS NOT NULL
DROP TABLE #temp
;
--===== Create the test table
CREATE TABLE #temp
(
ID1 VARCHAR(10)
,ID2 VARCHAR(10)
,ID3 VARCHAR(10)
,ID4 VARCHAR(10)
,Name VARCHAR(30)
,Address VARCHAR(30)
,IgnoreFlag SMALLINT DEFAULT (-1)
)
;
--===== Populate the table with the original test data
INSERT INTO #temp
(ID1,ID2,ID3,ID4,Name,Address)
SELECT 100,200,300,400,'Amy' ,'2 Main St,Salem,MA' UNION ALL
SELECT 110,203,310,400,'Barry','3 Power St,Salem,MA' UNION ALL
SELECT 130,205,320,400,'Amy' ,'2 Main St,Salem,MA' UNION ALL
SELECT 140,206,310,400,'Binny','110 Elm St,Salem, MA' UNION ALL
SELECT 150,205,360,400,'Dan' ,'230 BEaver Rd, Salem, MA'
;
--===== Mark non-duplicated rows with a "0"
WITH
cteDeDupe AS
(
SELECT ID1,ID2,ID3,ID4,Name,Address,IgnoreFlag
,DupeCount = COUNT(*) OVER (PARTITION BY Name, Address)
FROM #temp
)
UPDATE cteDeDupe
SET IgnoreFlag = 0
WHERE DupeCount = 1
;
--===== Display the results
SELECT *
FROM #temp
;
Here are the results from the code above.
ID1 ID2 ID3 ID4 Name Address IgnoreFlag
--- --- --- --- ----- ------------------------ ----------
100 200 300 400 Amy 2 Main St,Salem,MA -1
110 203 310 400 Barry 3 Power St,Salem,MA 0
130 205 320 400 Amy 2 Main St,Salem,MA -1
140 206 310 400 Binny 110 Elm St,Salem, MA 0
150 205 360 400 Dan 230 BEaver Rd, Salem, MA 0
Since you're doing a table scan anyway, I'd modify the code so that you don't have to rely on the default IgnoreFlag because new rows that cause dupes could certainly be inserted. Here's how I'd do the code.
--===== Update the IgnoreFlag to identify if there are Name/Address dupes or not
WITH
cteDeDupe AS
(
SELECT ID1,ID2,ID3,ID4,Name,Address,IgnoreFlag
,DupeCount = COUNT(*) OVER (PARTITION BY Name, Address)
FROM #temp
)
UPDATE cteDeDupe
SET IgnoreFlag = CASE WHEN DupeCount = 1 THEN 0 ELSE -1 END
;
--Jeff Moden
Change is inevitable... Change for the better is not.
November 16, 2014 at 4:51 pm
As a bit of a sidebar, ID1 thru 4 should probably be INT rather than VARCHAR according to your test data. It'll save space, perhaps some performance, and save you a world of hurt in the future if they are, in fact, numeric-only IDs.
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 6 posts - 1 through 6 (of 6 total)
You must be logged in to reply to this topic. Login to reply