Group by question

  • Hi there

    I have a table with 3 ID columns, Name, Address

    Create table temp(ID1 varchar(10),ID2 varchar(10),ID3 varchar(10),ID4 varchar(10),Name,Address)

    insert into temp(100,200,300,400,'Amy','2 Main St,Salem,MA')

    insert into temp(110,203,310,400,'Barry','3 Power St,Salem,MA')

    insert into temp(130,205,320,400,'Amy','2 Main St,Salem,MA')

    insert into temp(140,206,310,400,'Binny','110 Elm St,Salem, MA')

    insert into temp(150,205,360,400,'Dan','230 BEaver Rd, Salem, MA')

    I want to insert into another table Test with same Columns but records that do not have same Name,Address Combination. I.e.I do not want record 1 and 3. How can I do it?

    Thanks

    R

  • So what you're saying is if there are any duplicate records for name and address, those records should be excluded from your select statement.

    What you want to do then is de-dupe your data. My first inclination would be to group the results by name and address with a HAVING clause to eliminate any groups with more than one record.

    set nocount on

    go

    if object_id('tempdb.dbo.#temp') is not null drop table #temp

    Create table #temp(ID1 varchar(10),ID2 varchar(10),ID3 varchar(10),ID4 varchar(10),Name varchar(30),Address varchar(30))

    insert into #temp values(100,200,300,400,'Amy','2 Main St,Salem,MA')

    insert into #temp values(110,203,310,400,'Barry','3 Power St,Salem,MA')

    insert into #temp values(130,205,320,400,'Amy','2 Main St,Salem,MA')

    insert into #temp values(140,206,310,400,'Binny','110 Elm St,Salem, MA')

    insert into #temp values(150,205,360,400,'Dan','230 BEaver Rd, Salem, MA')

    select *

    from #temp a

    inner join (select

    Name,

    Address

    from #temp

    group by Name, Address

    having count(1) = 1) b

    on a.Name = b.Name

    and a.Address = b.Address

    Executive Junior Cowboy Developer, Esq.[/url]

  • The following does it in a single scan rather than 2 and does not require a JOIN. And, no... it doesn't need a GROUP BY either. The COUNT(*) here is a Windowing Function thanks to OVER.

    WITH

    cteDeDupe AS

    (

    SELECT ID1,ID2,ID3,ID4,Name,Address,DupeCount = COUNT(*) OVER (PARTITION BY Name, Address)

    FROM #temp

    )

    SELECT ID1,ID2,ID3,ID4,Name,Address

    FROM cteDeDupe

    WHERE DupeCount = 1

    ;

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thanks for the reply. I had one more question. I added a field called 'Ignoreflag' to the table with default value= -1.

    It is set to one if Name,Address is same. So i decided to put Update statement like this

    WITH

    cteDeDupe AS

    (

    SELECT ID1,ID2,ID3,ID4,Name,Address,IgnoreFlag,DupeCount = COUNT(*) OVER (PARTITION BY Name, Address)

    FROM #temp

    )

    Update cteDeDupe set ignoreflag=0

    WHERE DupeCount = 1

    ;

    I get a message that 3 rows are updated , but when I select from temp table, I do not see any records with ignoreflag=0. Can someone explain?

  • rash3554 (11/16/2014)


    Thanks for the reply. I had one more question. I added a field called 'Ignoreflag' to the table with default value= -1.

    It is set to one if Name,Address is same. So i decided to put Update statement like this

    WITH

    cteDeDupe AS

    (

    SELECT ID1,ID2,ID3,ID4,Name,Address,IgnoreFlag,DupeCount = COUNT(*) OVER (PARTITION BY Name, Address)

    FROM #temp

    )

    Update cteDeDupe set ignoreflag=0

    WHERE DupeCount = 1

    ;

    I get a message that 3 rows are updated , but when I select from temp table, I do not see any records with ignoreflag=0. Can someone explain?

    Although I've reformatted the code a bit, the code that follows is the same as your code. It would have been handy if you could have provided the updated test table/data but you're new, so I did it for you.

    I don't know what you're doing wrong... maybe selecting from the wrong table or ??? The code works as written.

    --===== Conditionally drop the test table to make reruns in SSMS easier

    IF OBJECT_ID('tempdb..#temp') IS NOT NULL

    DROP TABLE #temp

    ;

    --===== Create the test table

    CREATE TABLE #temp

    (

    ID1 VARCHAR(10)

    ,ID2 VARCHAR(10)

    ,ID3 VARCHAR(10)

    ,ID4 VARCHAR(10)

    ,Name VARCHAR(30)

    ,Address VARCHAR(30)

    ,IgnoreFlag SMALLINT DEFAULT (-1)

    )

    ;

    --===== Populate the table with the original test data

    INSERT INTO #temp

    (ID1,ID2,ID3,ID4,Name,Address)

    SELECT 100,200,300,400,'Amy' ,'2 Main St,Salem,MA' UNION ALL

    SELECT 110,203,310,400,'Barry','3 Power St,Salem,MA' UNION ALL

    SELECT 130,205,320,400,'Amy' ,'2 Main St,Salem,MA' UNION ALL

    SELECT 140,206,310,400,'Binny','110 Elm St,Salem, MA' UNION ALL

    SELECT 150,205,360,400,'Dan' ,'230 BEaver Rd, Salem, MA'

    ;

    --===== Mark non-duplicated rows with a "0"

    WITH

    cteDeDupe AS

    (

    SELECT ID1,ID2,ID3,ID4,Name,Address,IgnoreFlag

    ,DupeCount = COUNT(*) OVER (PARTITION BY Name, Address)

    FROM #temp

    )

    UPDATE cteDeDupe

    SET IgnoreFlag = 0

    WHERE DupeCount = 1

    ;

    --===== Display the results

    SELECT *

    FROM #temp

    ;

    Here are the results from the code above.

    ID1 ID2 ID3 ID4 Name Address IgnoreFlag

    --- --- --- --- ----- ------------------------ ----------

    100 200 300 400 Amy 2 Main St,Salem,MA -1

    110 203 310 400 Barry 3 Power St,Salem,MA 0

    130 205 320 400 Amy 2 Main St,Salem,MA -1

    140 206 310 400 Binny 110 Elm St,Salem, MA 0

    150 205 360 400 Dan 230 BEaver Rd, Salem, MA 0

    Since you're doing a table scan anyway, I'd modify the code so that you don't have to rely on the default IgnoreFlag because new rows that cause dupes could certainly be inserted. Here's how I'd do the code.

    --===== Update the IgnoreFlag to identify if there are Name/Address dupes or not

    WITH

    cteDeDupe AS

    (

    SELECT ID1,ID2,ID3,ID4,Name,Address,IgnoreFlag

    ,DupeCount = COUNT(*) OVER (PARTITION BY Name, Address)

    FROM #temp

    )

    UPDATE cteDeDupe

    SET IgnoreFlag = CASE WHEN DupeCount = 1 THEN 0 ELSE -1 END

    ;

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • As a bit of a sidebar, ID1 thru 4 should probably be INT rather than VARCHAR according to your test data. It'll save space, perhaps some performance, and save you a world of hurt in the future if they are, in fact, numeric-only IDs.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 6 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply