Capture rejected rows during bulk inserts.

Question

Post reply

Capture rejected rows during bulk inserts.

Ram

Old Hand

Points: 353
More actions
October 13, 2021 at 5:41 pm

#3940280

We have a weekly process that inserts few million rows into several tables. The insertion happens with BULK INSERT command from flat files. During the inserts the tables are not used. We want to be able to filter out the duplicate rows, if any, going into some of the tables and then capture those duplicates while preventing them from going into the main tables. The idea is to insert the duplicate rows into an error table but let the rows which are not duplicates get inserted into the main table. The main tables have the Primary Keys declared on them which will cause the rejection of duplicate values.
I was researching online and it seems using triggers is not a good idea but the recommendation is to use Stored procedures. Coming from Oracle world, oracle has badfile and discard file options with the SQL loader for the same. Is there a similar option in SQL server? If not, what is the best approach.
Thanks,
Ram.

Viewing 10 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply

ScottPletcher SSC Guru Points: 101192 More actions · Answer 1

A trigger is the perfect approach for this. Some developers just irrationally avoid triggers no matter what.

Be sure to specify option FIRE_TRIGGERS in your BULK INSERT command.

SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

Jeff Moden SSC Guru Points: 1004614 More actions · Answer 2

ScottPletcher wrote:

A trigger is the perfect approach for this. Some developers just irrationally avoid triggers no matter what.
Be sure to specify option FIRE_TRIGGERS in your BULK INSERT command.

[Edit] My apologies... The following is based on a 16 year old memory, which may be incorrect. I have to double check on this with some test code.

The duplicate rows being inserted will cause an error to occur because of the PK the OP has on the receiving table. How is a trigger going to help there?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004614 More actions · Answer 3

My recommendation would be to read the documentation for BULK INSERT, which is located at the following URL:

https://docs.microsoft.com/en-us/sql/t-sql/statements/bulk-insert-transact-sql?view=sql-server-ver15

The two keywords to search for in that article are ERRORFILE and MAXERRORS.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004614 More actions · Answer 4

Just to add to that, I NEVER bulk insert directly into the final table. I ALWAYS load into a "staging" table and then do validations and dupe checks between that table and the final table. It more easily solves your duplicate row problem and a whole bunch more... especially if you also learn how to use HASHBYTES to check for changes, etc.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

ScottPletcher SSC Guru Points: 101192 More actions · Answer 5

Jeff Moden wrote:

ScottPletcher wrote:
A trigger is the perfect approach for this. Some developers just irrationally avoid triggers no matter what.
Be sure to specify option FIRE_TRIGGERS in your BULK INSERT command.
The duplicate rows being inserted will cause an error to occur because of the PK the OP has on the receiving table. How is a trigger going to help there?

Because for this process, you'd use an INSTEAD OF INSERT trigger. No surprise there, right? How else could one do it with a trigger??

SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

Jeff Moden SSC Guru Points: 1004614 More actions · Answer 6

ScottPletcher wrote:

Jeff Moden wrote:
ScottPletcher wrote:
A trigger is the perfect approach for this. Some developers just irrationally avoid triggers no matter what.
Be sure to specify option FIRE_TRIGGERS in your BULK INSERT command.
The duplicate rows being inserted will cause an error to occur because of the PK the OP has on the receiving table. How is a trigger going to help there?
Because for this process, you'd use an INSTEAD OF INSERT trigger. No surprise there, right? How else could one do it with a trigger??

So the code in the trigger would say, "hmmm...I need to compare these two rows. Ah... lookie there! I have a dupe. I'm going to move it to this other place instead!"

Right?

So no different that typical "upsert" code EXCEPT that you wouldn't have to run a stored procedure. It would just happen. Is that correct.

The reason I'm asking is because I thought you had come up with a way for the trigger to do error detection processing like the error functionality built into BULK INSERT.

Or am I missing something? And, no... just to be sure because the written word sometimes sucks when one is trying to be brief, I'm not being snarky here. This actually sound like a pretty cool idea and I'm thinking of "possibilities".

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

ScottPletcher SSC Guru Points: 101192 More actions · Answer 7

Jeff Moden wrote:

ScottPletcher wrote:
Jeff Moden wrote:
ScottPletcher wrote:
A trigger is the perfect approach for this. Some developers just irrationally avoid triggers no matter what.
Be sure to specify option FIRE_TRIGGERS in your BULK INSERT command.
The duplicate rows being inserted will cause an error to occur because of the PK the OP has on the receiving table. How is a trigger going to help there?
Because for this process, you'd use an INSTEAD OF INSERT trigger. No surprise there, right? How else could one do it with a trigger??
So the code in the trigger would say, "hmmm...I need to compare these two rows. Ah... lookie there! I have a dupe. I'm going to move it to this other place instead!"
Right?
So no different that typical "upsert" code EXCEPT that you wouldn't have to run a stored procedure. It would just happen. Is that correct.
The reason I'm asking is because I thought you had come up with a way for the trigger to do error detection processing like the error functionality built into BULK INSERT.
Or am I missing something? And, no... just to be sure because the written word sometimes sucks when one is trying to be brief, I'm not being snarky here. This actually sound like a pretty cool idea and I'm thinking of "possibilities".

Yeah, the code in the trigger would checks for dups and add them to a different table, either before or after it INSERTs all the new (non-existent) rows into the main table.

Maybe this will help everyone visualize what would be done here (if there's a column with an $IDENTITY property, the code would need adjusted for that):

INSERT INTO dbo.main_table

SELECT *

FROM inserted i

WHERE NOT EXISTS (

SELECT 1

FROM dbo.main_table mt

WHERE mt.key_col = i.key_col /*AND mt.key_col2 = i.key_col2 ...*/

)

INSERT INTO dbo.dups_table

SELECT *

FROM inserted i

WHERE EXISTS (

SELECT 1

FROM dbo.main_table mt

WHERE mt.key_col = i.key_col/*AND mt.key_col2 = i.key_col2 ...*/

)

SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

ScottPletcher SSC Guru Points: 101192 More actions · Answer 8

Jeff Moden wrote:

My recommendation would be to read the documentation for BULK INSERT, which is located at the following URL:
https://docs.microsoft.com/en-us/sql/t-sql/statements/bulk-insert-transact-sql?view=sql-server-ver15
The two keywords to search for in that article are ERRORFILE and MAXERRORS.

First, only if you really are on SQL 2017+. Most people aren't yet. Although this specific forum is for SQL 2019, so we should be good there.

Second, I don't believe that option would handle duplicate keys. That is, that's not one of the errors that ERRORFILE is designed to handle. At least as I understand how that option works, it won't avoid the normal error that would error when trying to INSERT a dup key. But I don't have time to test that right now.

SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

Jeff Moden SSC Guru Points: 1004614 More actions · Answer 9

ScottPletcher wrote:

Jeff Moden wrote:
My recommendation would be to read the documentation for BULK INSERT, which is located at the following URL:
https://docs.microsoft.com/en-us/sql/t-sql/statements/bulk-insert-transact-sql?view=sql-server-ver15
The two keywords to search for in that article are ERRORFILE and MAXERRORS.
First, only if you really are on SQL 2017+. Most people aren't yet. Although this specific forum is for SQL 2019, so we should be good there.
Second, I don't believe that option would handle duplicate keys. That is, that's not one of the errors that ERRORFILE is designed to handle. At least as I understand how that option works, it won't avoid the normal error that would error when trying to INSERT a dup key. But I don't have time to test that right now.

I hate what they're doing to the documentation, Scott. It says "2017" because that's the earliest version that Microsoft openly supports since they recently closed out SQL Server 2016 with SP3. ErrorFile and MaxErrors have been around since BULK INSERT first appeared.

As for the other thing, that's based on a nearly 16 year old memory. I might be remembering it incorrectly and so I'll need to setup a test. I'm correcting my previous post since there's a doubt.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Capture rejected rows during bulk inserts.

Cookies on SQLServerCentral