Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase

How to delete duplicates/triplicates etc. except for the latest record based on date? Expand / Collapse
Author
Message
Posted Wednesday, June 12, 2013 6:24 PM
Grasshopper

GrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopper

Group: General Forum Members
Last Login: Thursday, August 21, 2014 9:20 AM
Points: 16, Visits: 47
Hi,

I have a table (t_account_code) with the following three columns - AccountCode, StartDate and EndDate. The table does not have an ID column (I know - it was inherited) and at this point, modifying the table is not an option, so please bear with me. With the combination of three columns, we get a PK.

This table has multiple entries for the same AccountCode, and I want to delete all duplicate/triplicate/multiple entries EXCEPT for the latest one, based on StartDate.

First, I've created a temp table with all account codes (unique) and their count, if greater than 1.

I've tried the following query, first to do a select on the accounts that occur twice:

select * from t_account_code tac1
where AccountCode in (select AccountCode from #Tmp_accountcodes where count_num = 2)
and StartDate = (select MIN(StartDate) from t_account_code tac2 where tac2.AccountCode = tac1.AccountCode)

The distinct values of count_num are 2, 3 and 4.

This works as intended and gives me the results I want (for duplicate values), but when I change it to a delete statement, I get a syntax error in the first row for the table alias - tac1. Of course, if I delete it, then my query won't work as I'm using tac1 in the sub-query.

Any solution to this problem will be appreciated, even if it's a different query than the one above. I want to end up with unique AccountCodes in the table, with the latest StartDate, and delete any multiple ones with the older StartDate.

E.g.
AccountCode ---- StartDate ---- EndDate
1234------------ 12/31/2012 ---- 3/30/2013
1234------------ 3/31/2013 ------4/29/2013
1234------------ 4/30/2013 -----12/31/2014
9876------------12/31/2012------3/30/2013
9876------------3/31/2013 -------5/30/2014

I want to delete the first two rows for AccountCode '1234' and delete the first row for AccountCode '9876.'

If an AccountCode occurs only once, then I want to leave it alone.

Thanks.
Post #1462871
Posted Wednesday, June 12, 2013 7:27 PM


SSC-Insane

SSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-Insane

Group: General Forum Members
Last Login: Today @ 2:53 PM
Points: 20,704, Visits: 32,350
SQL_beginner1 (6/12/2013)
Hi,

I have a table (t_account_code) with the following three columns - AccountCode, StartDate and EndDate. The table does not have an ID column (I know - it was inherited) and at this point, modifying the table is not an option, so please bear with me. With the combination of three columns, we get a PK.

This table has multiple entries for the same AccountCode, and I want to delete all duplicate/triplicate/multiple entries EXCEPT for the latest one, based on StartDate.

First, I've created a temp table with all account codes (unique) and their count, if greater than 1.

I've tried the following query, first to do a select on the accounts that occur twice:

select * from t_account_code tac1
where AccountCode in (select AccountCode from #Tmp_accountcodes where count_num = 2)
and StartDate = (select MIN(StartDate) from t_account_code tac2 where tac2.AccountCode = tac1.AccountCode)

The distinct values of count_num are 2, 3 and 4.

This works as intended and gives me the results I want (for duplicate values), but when I change it to a delete statement, I get a syntax error in the first row for the table alias - tac1. Of course, if I delete it, then my query won't work as I'm using tac1 in the sub-query.

Any solution to this problem will be appreciated, even if it's a different query than the one above. I want to end up with unique AccountCodes in the table, with the latest StartDate, and delete any multiple ones with the older StartDate.

E.g.
AccountCode ---- StartDate ---- EndDate
1234------------ 12/31/2012 ---- 3/30/2013
1234------------ 3/31/2013 ------4/29/2013
1234------------ 4/30/2013 -----12/31/2014
9876------------12/31/2012------3/30/2013
9876------------3/31/2013 -------5/30/2014

I want to delete the first two rows for AccountCode '1234' and delete the first row for AccountCode '9876.'

If an AccountCode occurs only once, then I want to leave it alone.

Thanks.


Give this a try in a test environment:


with DelDups as (
select
AccountCode,
StartDate,
EndDate,
rn = row_number() over (partition by AccountCode order by StartDate desc)
from
t_account_code
)
delete from DelDups
where rn > 1;


Edit: Fixed the code.



Lynn Pettis

For better assistance in answering your questions, click here
For tips to get better help with Performance Problems, click here
For Running Totals and its variations, click here or when working with partitioned tables
For more about Tally Tables, click here
For more about Cross Tabs and Pivots, click here and here
Managing Transaction Logs

SQL Musings from the Desert Fountain Valley SQL (My Mirror Blog)
Post #1462878
Posted Thursday, June 13, 2013 9:00 AM
Grasshopper

GrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopper

Group: General Forum Members
Last Login: Thursday, August 21, 2014 9:20 AM
Points: 16, Visits: 47
Hi Lynn,

I get the following error when I parse your script:

Msg 156, Level 15, State 1, Line 6
Incorrect syntax near the keyword 'over'.


I'm using SQL Server 2008 R2.
Post #1463141
Posted Thursday, June 13, 2013 9:03 AM


SSChampion

SSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampion

Group: General Forum Members
Last Login: Today @ 2:52 PM
Points: 13,007, Visits: 12,423
SQL_beginner1 (6/13/2013)
Hi Lynn,

I get the following error when I parse your script:

Msg 156, Level 15, State 1, Line 6
Incorrect syntax near the keyword 'over'.


I'm using SQL Server 2008 R2.


He missed the () after ROW_NUMBER. Make sure you understand this code before you use it.

with DelDups as (
select
AccountCode,
StartDate,
EndDate,
rn = row_number() over (partition by AccountCode order by StartDate desc)
from
t_account_code
)
delete from DelDups
where rn > 1;




_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Moden's splitter.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs
Understanding and Using APPLY (Part 1)
Understanding and Using APPLY (Part 2)
Post #1463145
Posted Thursday, June 13, 2013 9:48 AM


SSC-Insane

SSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-Insane

Group: General Forum Members
Last Login: Today @ 2:53 PM
Points: 20,704, Visits: 32,350
Sean Lange (6/13/2013)
SQL_beginner1 (6/13/2013)
Hi Lynn,

I get the following error when I parse your script:

Msg 156, Level 15, State 1, Line 6
Incorrect syntax near the keyword 'over'.


I'm using SQL Server 2008 R2.


He missed the () after ROW_NUMBER. Make sure you understand this code before you use it.

with DelDups as (
select
AccountCode,
StartDate,
EndDate,
rn = row_number() over (partition by AccountCode order by StartDate desc)
from
t_account_code
)
delete from DelDups
where rn > 1;




Thanks, Sean. I can't believe that I missed that. Not enough caffeine last night I guess.

I did go back and fix my code.



Lynn Pettis

For better assistance in answering your questions, click here
For tips to get better help with Performance Problems, click here
For Running Totals and its variations, click here or when working with partitioned tables
For more about Tally Tables, click here
For more about Cross Tabs and Pivots, click here and here
Managing Transaction Logs

SQL Musings from the Desert Fountain Valley SQL (My Mirror Blog)
Post #1463172
Posted Thursday, June 13, 2013 10:01 AM
Grasshopper

GrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopper

Group: General Forum Members
Last Login: Thursday, August 21, 2014 9:20 AM
Points: 16, Visits: 47
Thanks, Sean and Lynn! That worked.

Post #1463184
Posted Thursday, June 13, 2013 10:04 AM


SSChampion

SSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampion

Group: General Forum Members
Last Login: Today @ 2:52 PM
Points: 13,007, Visits: 12,423
Lynn Pettis (6/13/2013)
Sean Lange (6/13/2013)
SQL_beginner1 (6/13/2013)
Hi Lynn,

I get the following error when I parse your script:

Msg 156, Level 15, State 1, Line 6
Incorrect syntax near the keyword 'over'.


I'm using SQL Server 2008 R2.


He missed the () after ROW_NUMBER. Make sure you understand this code before you use it.

with DelDups as (
select
AccountCode,
StartDate,
EndDate,
rn = row_number() over (partition by AccountCode order by StartDate desc)
from
t_account_code
)
delete from DelDups
where rn > 1;




Thanks, Sean. I can't believe that I missed that. Not enough caffeine last night I guess.

I did go back and fix my code.


You would have caught it if you had been able to test it.


_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Moden's splitter.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs
Understanding and Using APPLY (Part 1)
Understanding and Using APPLY (Part 2)
Post #1463188
« Prev Topic | Next Topic »

Add to briefcase

Permissions Expand / Collapse