Group by question

Question

Group by question

rash3554

SSCommitted

Points: 1984
More actions
November 14, 2014 at 2:45 pm

#295104

Hi there
I have a table with 3 ID columns, Name, Address
Create table temp(ID1 varchar(10),ID2 varchar(10),ID3 varchar(10),ID4 varchar(10),Name,Address)
insert into temp(100,200,300,400,'Amy','2 Main St,Salem,MA')
insert into temp(110,203,310,400,'Barry','3 Power St,Salem,MA')
insert into temp(130,205,320,400,'Amy','2 Main St,Salem,MA')
insert into temp(140,206,310,400,'Binny','110 Elm St,Salem, MA')
insert into temp(150,205,360,400,'Dan','230 BEaver Rd, Salem, MA')
I want to insert into another table Test with same Columns but records that do not have same Name,Address Combination. I.e.I do not want record 1 and 3. How can I do it?
Thanks
R

Viewing 6 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply

Xedni SSCertifiable Points: 6951 More actions · Answer 1

So what you're saying is if there are any duplicate records for name and address, those records should be excluded from your select statement.

What you want to do then is de-dupe your data. My first inclination would be to group the results by name and address with a HAVING clause to eliminate any groups with more than one record.

set nocount on

go

if object_id('tempdb.dbo.#temp') is not null drop table #temp

Create table #temp(ID1 varchar(10),ID2 varchar(10),ID3 varchar(10),ID4 varchar(10),Name varchar(30),Address varchar(30))

insert into #temp values(100,200,300,400,'Amy','2 Main St,Salem,MA')

insert into #temp values(110,203,310,400,'Barry','3 Power St,Salem,MA')

insert into #temp values(130,205,320,400,'Amy','2 Main St,Salem,MA')

insert into #temp values(140,206,310,400,'Binny','110 Elm St,Salem, MA')

insert into #temp values(150,205,360,400,'Dan','230 BEaver Rd, Salem, MA')

select *

from #temp a

inner join (select

Name,

Address

from #temp

group by Name, Address

having count(1) = 1) b

on a.Name = b.Name

and a.Address = b.Address

Executive Junior Cowboy Developer, Esq.[/url]

Jeff Moden SSC Guru Points: 1004679 More actions · Answer 2

The following does it in a single scan rather than 2 and does not require a JOIN. And, no... it doesn't need a GROUP BY either. The COUNT(*) here is a Windowing Function thanks to OVER.

WITH

cteDeDupe AS

(

SELECT ID1,ID2,ID3,ID4,Name,Address,DupeCount = COUNT(*) OVER (PARTITION BY Name, Address)

FROM #temp

)

SELECT ID1,ID2,ID3,ID4,Name,Address

FROM cteDeDupe

WHERE DupeCount = 1

;

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

rash3554 SSCommitted Points: 1984 More actions · Answer 3

Thanks for the reply. I had one more question. I added a field called 'Ignoreflag' to the table with default value= -1.

It is set to one if Name,Address is same. So i decided to put Update statement like this

WITH

cteDeDupe AS

(

SELECT ID1,ID2,ID3,ID4,Name,Address,IgnoreFlag,DupeCount = COUNT(*) OVER (PARTITION BY Name, Address)

FROM #temp

)

Update cteDeDupe set ignoreflag=0

WHERE DupeCount = 1

;

I get a message that 3 rows are updated , but when I select from temp table, I do not see any records with ignoreflag=0. Can someone explain?

Jeff Moden SSC Guru Points: 1004679 More actions · Answer 4

rash3554 (11/16/2014)
Thanks for the reply. I had one more question. I added a field called 'Ignoreflag' to the table with default value= -1.
It is set to one if Name,Address is same. So i decided to put Update statement like this
WITH
cteDeDupe AS
(
SELECT ID1,ID2,ID3,ID4,Name,Address,IgnoreFlag,DupeCount = COUNT(*) OVER (PARTITION BY Name, Address)
FROM #temp
)
Update cteDeDupe set ignoreflag=0
WHERE DupeCount = 1
;
I get a message that 3 rows are updated , but when I select from temp table, I do not see any records with ignoreflag=0. Can someone explain?

Although I've reformatted the code a bit, the code that follows is the same as your code. It would have been handy if you could have provided the updated test table/data but you're new, so I did it for you.

I don't know what you're doing wrong... maybe selecting from the wrong table or ??? The code works as written.

--===== Conditionally drop the test table to make reruns in SSMS easier

IF OBJECT_ID('tempdb..#temp') IS NOT NULL

DROP TABLE #temp

;

--===== Create the test table

CREATE TABLE #temp

(

ID1 VARCHAR(10)

,ID2 VARCHAR(10)

,ID3 VARCHAR(10)

,ID4 VARCHAR(10)

,Name VARCHAR(30)

,Address VARCHAR(30)

,IgnoreFlag SMALLINT DEFAULT (-1)

)

;

--===== Populate the table with the original test data

INSERT INTO #temp

(ID1,ID2,ID3,ID4,Name,Address)

SELECT 100,200,300,400,'Amy' ,'2 Main St,Salem,MA' UNION ALL

SELECT 110,203,310,400,'Barry','3 Power St,Salem,MA' UNION ALL

SELECT 130,205,320,400,'Amy' ,'2 Main St,Salem,MA' UNION ALL

SELECT 140,206,310,400,'Binny','110 Elm St,Salem, MA' UNION ALL

SELECT 150,205,360,400,'Dan' ,'230 BEaver Rd, Salem, MA'

;

--===== Mark non-duplicated rows with a "0"

WITH

cteDeDupe AS

(

SELECT ID1,ID2,ID3,ID4,Name,Address,IgnoreFlag

,DupeCount = COUNT(*) OVER (PARTITION BY Name, Address)

FROM #temp

)

UPDATE cteDeDupe

SET IgnoreFlag = 0

WHERE DupeCount = 1

;

--===== Display the results

SELECT *

FROM #temp

;

Here are the results from the code above.

ID1 ID2 ID3 ID4 Name Address IgnoreFlag

--- --- --- --- ----- ------------------------ ----------

100 200 300 400 Amy 2 Main St,Salem,MA -1

110 203 310 400 Barry 3 Power St,Salem,MA 0

130 205 320 400 Amy 2 Main St,Salem,MA -1

140 206 310 400 Binny 110 Elm St,Salem, MA 0

150 205 360 400 Dan 230 BEaver Rd, Salem, MA 0

Since you're doing a table scan anyway, I'd modify the code so that you don't have to rely on the default IgnoreFlag because new rows that cause dupes could certainly be inserted. Here's how I'd do the code.

--===== Update the IgnoreFlag to identify if there are Name/Address dupes or not

WITH

cteDeDupe AS

(

SELECT ID1,ID2,ID3,ID4,Name,Address,IgnoreFlag

,DupeCount = COUNT(*) OVER (PARTITION BY Name, Address)

FROM #temp

)

UPDATE cteDeDupe

SET IgnoreFlag = CASE WHEN DupeCount = 1 THEN 0 ELSE -1 END

;

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004679 More actions · Answer 5

As a bit of a sidebar, ID1 thru 4 should probably be INT rather than VARCHAR according to your test data. It'll save space, perhaps some performance, and save you a world of hurt in the future if they are, in fact, numeric-only IDs.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)