Capture rejected rows during bulk inserts.

  • We have a weekly process that inserts few million rows into several tables. The insertion happens with BULK INSERT command from flat files. During the inserts the tables are not used. We want to be able to filter out the duplicate rows, if any, going into some of the tables and then capture those duplicates while preventing them from going into the main tables. The idea is to insert the duplicate rows into an error table but let the rows which are not duplicates get inserted into the main table. The main tables have the Primary Keys declared on them which will cause the rejection of duplicate values.

    I was researching online and it seems using triggers is not a good idea but the recommendation is to use Stored procedures. Coming from Oracle world, oracle has badfile and discard file options with the SQL loader for the same. Is there a similar option in SQL server? If not, what is the best approach.

    Thanks,

    Ram.

     

  • A trigger is the perfect approach for this.  Some developers just irrationally avoid triggers no matter what.

    Be sure to specify option FIRE_TRIGGERS in your BULK INSERT command.

     

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • ScottPletcher wrote:

    A trigger is the perfect approach for this.  Some developers just irrationally avoid triggers no matter what.

    Be sure to specify option FIRE_TRIGGERS in your BULK INSERT command.

    [Edit] My apologies... The following is based on a 16 year old memory, which may be incorrect.  I have to double check on this with some test code.

    The duplicate rows being inserted will cause an error to occur because of the PK the OP has on the receiving table.  How is a trigger going to help there?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • My recommendation  would be to read the documentation for BULK INSERT, which is located at the following URL:

    https://docs.microsoft.com/en-us/sql/t-sql/statements/bulk-insert-transact-sql?view=sql-server-ver15

    The two keywords to search for in that article are ERRORFILE and MAXERRORS.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Just to add to that, I NEVER bulk insert directly into the final table.  I ALWAYS load into a "staging" table and then do validations and dupe checks between that table and the final table.  It more easily solves your duplicate row problem and a whole bunch more... especially if you also learn how to use HASHBYTES to check for changes, etc.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden wrote:

    ScottPletcher wrote:

    A trigger is the perfect approach for this.  Some developers just irrationally avoid triggers no matter what.

    Be sure to specify option FIRE_TRIGGERS in your BULK INSERT command.

    The duplicate rows being inserted will cause an error to occur because of the PK the OP has on the receiving table.  How is a trigger going to help there?

    Because for this process, you'd use an INSTEAD OF INSERT trigger.  No surprise there, right?  How else could one do it with a trigger??

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • ScottPletcher wrote:

    Jeff Moden wrote:

    ScottPletcher wrote:

    A trigger is the perfect approach for this.  Some developers just irrationally avoid triggers no matter what.

    Be sure to specify option FIRE_TRIGGERS in your BULK INSERT command.

    The duplicate rows being inserted will cause an error to occur because of the PK the OP has on the receiving table.  How is a trigger going to help there?

    Because for this process, you'd use an INSTEAD OF INSERT trigger.  No surprise there, right?  How else could one do it with a trigger??

    So the code in the trigger would say, "hmmm...I need to compare these two rows.  Ah... lookie there!  I have a dupe.  I'm going to move it to this other place instead!"

    Right?

    So no different that typical "upsert" code EXCEPT that you wouldn't have to run a stored procedure.  It would just happen.  Is that correct.

    The reason I'm asking is because I thought you had come up with a way for the trigger to do error detection processing like the error functionality built into BULK INSERT.

    Or am I missing something?  And, no... just to be sure because the written word sometimes sucks when one is trying to be brief, I'm not being snarky here.  This actually sound like a pretty cool idea and I'm thinking of "possibilities".

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden wrote:

    ScottPletcher wrote:

    Jeff Moden wrote:

    ScottPletcher wrote:

    A trigger is the perfect approach for this.  Some developers just irrationally avoid triggers no matter what.

    Be sure to specify option FIRE_TRIGGERS in your BULK INSERT command.

    The duplicate rows being inserted will cause an error to occur because of the PK the OP has on the receiving table.  How is a trigger going to help there?

    Because for this process, you'd use an INSTEAD OF INSERT trigger.  No surprise there, right?  How else could one do it with a trigger??

    So the code in the trigger would say, "hmmm...I need to compare these two rows.  Ah... lookie there!  I have a dupe.  I'm going to move it to this other place instead!"

    Right?

    So no different that typical "upsert" code EXCEPT that you wouldn't have to run a stored procedure.  It would just happen.  Is that correct.

    The reason I'm asking is because I thought you had come up with a way for the trigger to do error detection processing like the error functionality built into BULK INSERT.

    Or am I missing something?  And, no... just to be sure because the written word sometimes sucks when one is trying to be brief, I'm not being snarky here.  This actually sound like a pretty cool idea and I'm thinking of "possibilities".

    Yeah, the code in the trigger would checks for dups and add them to a different table, either before or after it INSERTs all the new (non-existent) rows into the main table.

    Maybe this will help everyone visualize what would be done here (if there's a column with an $IDENTITY property, the code would need adjusted for that):

    INSERT INTO dbo.main_table

    SELECT *

    FROM inserted i

    WHERE NOT EXISTS (

    SELECT 1

    FROM dbo.main_table mt

    WHERE mt.key_col = i.key_col /*AND mt.key_col2 = i.key_col2 ...*/

    )

    INSERT INTO dbo.dups_table

    SELECT *

    FROM inserted i

    WHERE EXISTS (

    SELECT 1

    FROM dbo.main_table mt

    WHERE mt.key_col = i.key_col/*AND mt.key_col2 = i.key_col2 ...*/

    )

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • Jeff Moden wrote:

    My recommendation  would be to read the documentation for BULK INSERT, which is located at the following URL:

    https://docs.microsoft.com/en-us/sql/t-sql/statements/bulk-insert-transact-sql?view=sql-server-ver15

    The two keywords to search for in that article are ERRORFILE and MAXERRORS.

    First, only if you really are on SQL 2017+.  Most people aren't yet.  Although this specific forum is for SQL 2019, so we should be good there.

    Second, I don't believe that option would handle duplicate keys.  That is, that's not one of the errors that ERRORFILE is designed to handle.  At least as I understand how that option works, it won't avoid the normal error that would error when trying to INSERT a dup key.  But I don't have time to test that right now.

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • ScottPletcher wrote:

    Jeff Moden wrote:

    My recommendation  would be to read the documentation for BULK INSERT, which is located at the following URL:

    https://docs.microsoft.com/en-us/sql/t-sql/statements/bulk-insert-transact-sql?view=sql-server-ver15

    The two keywords to search for in that article are ERRORFILE and MAXERRORS.

    First, only if you really are on SQL 2017+.  Most people aren't yet.  Although this specific forum is for SQL 2019, so we should be good there.

    Second, I don't believe that option would handle duplicate keys.  That is, that's not one of the errors that ERRORFILE is designed to handle.  At least as I understand how that option works, it won't avoid the normal error that would error when trying to INSERT a dup key.  But I don't have time to test that right now.

    I hate what they're doing to the documentation, Scott.  It says "2017" because that's the earliest version that Microsoft openly supports since they recently closed out SQL Server 2016 with SP3.   ErrorFile and MaxErrors have been around since BULK INSERT first appeared.

    As for the other thing, that's based on a nearly 16 year old memory.  I might be remembering it incorrectly and so I'll need to setup a test.  I'm correcting my previous post since there's a doubt.

     

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply