Validating for potential duplicate records

Question

Validating for potential duplicate records

rjjh78

SSCrazy

Points: 2787
More actions
November 1, 2018 at 3:33 pm

#416422

I am writing a script that will import CSV data into a staging table. Before pushing the data to a Production table I am doing a series for validation of the file. In part, U want to make sure we are not processing duplicate records.
Therefore, I am checking for duplicates:
1. From within the new file just imported (Is there a duplicate within the file)
2. Is there already the same record in the staging database?
If a duplicate record already exists in the database and previously processed successfully, I want to UPDATE the newest records to "D"
If the new file contains multiple duplicate records within the file, I want to only process the row with the most current Insert Date (Processed Status to "N") while setting the older records to "D"
Here is my sample data:
-- DROP TABLE #t CREATE TABLE #t (TID int IDENTITY(1,1), EID int, Processed char(1), InsertDt datetime) INSERT INTO #t (EID, Processed, InsertDt) VALUES (10, 'E', '10/15/2018') INSERT INTO #t (EID, Processed, InsertDt) VALUES (10, 'Y', '10/16/2018') INSERT INTO #t (EID, Processed, InsertDt) VALUES (15, 'Y', '10/17/2018 13:03:15') INSERT INTO #t (EID, Processed, InsertDt) VALUES (15, 'N', '11/17/2018 14:01:15') INSERT INTO #t (EID, Processed, InsertDt) VALUES (15, 'N', '11/01/2018 12:01:15') INSERT INTO #t (EID, Processed, InsertDt) VALUES (15, 'N', '11/01/2018 12:02:15') INSERT INTO #t (EID, Processed, InsertDt) VALUES (15, 'N', '11/01/2018 12:03:15') SELECT * FROM #t
My desired results would be:
TID EID Processed InsertDt
1 10 E 2018-10-15 00:00:00.000
2 10 Y 2018-10-16 00:00:00.000
3 15 Y 2018-10-17 13:03:15.000
4 15 D 2018-11-17 14:01:15.000
5 15 D 2018-11-01 12:01:15.000
6 15 D 2018-11-01 12:02:15.000
7 15 Y 2018-11-01 12:03:15.000
MANY THANKS in advance!

Viewing 3 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply

Joe Torre SSChampion Points: 10248 More actions · Answer 1

I think you are over complicating the issue.
I would write a procedure to truncate my staging table, insert into the staging table using select distinct from an OpenRowset from the CSV using the BULK driver with a format file. then insert the target table with a left join where target is null.

rjjh78 SSCrazy Points: 2787 More actions · Answer 2

rjjh78

SSCrazy

Points: 2787

November 2, 2018 at 9:17 am

#2012207

Got it! Thank you Joe! 🙂