Behavior of tsql batches

  • Hi All,

    Today, I got a case related to transaction behavior on SQL Server.

    One of our Informatica ETL developer came back saying there is some problem with the sql database.

    I went on call to try to understand what she was saying.

    Problem description:

    They have one table and try to load some data using Informatica ETL tool.

    Using Informatica ETL Tool, they have written a simple transformation logic to load data. (truncate & load).

    Table structure

    ==============

    create table test_tbl

    (

    id varchar(100) null,

    last_update_date datetime2(7) null

    )

    create unique clustered index pk_test_dbl on test_tbl(id asc);

    go

    For sake of testing, they are loading 1000 recs and trying to load 2 additional records (intentionally to reproduce a duplicate record).

    Something like below

    Truncate table test_tbl;

    Insert into test_tbl

    select top 1000

    c1,c2 from srcdb.dbo.stg_tbl

    union

    select '9999','2019-10-30 00:00:00'

    union

    select '9999','2019-10-30 00:00:00' --- simulate a duplicate record

    Questions :

    scenario 1 : when they run the ETL Tool pointing to dev server ,dev db, they see expected behaviour i.e. 1001 rows loaded and 1 rec is rejected as it is a duplicate 1. (i.e. 9999 )

    scenario 2: when they change the connection, this time pointing prod server, prod db, they seeing different behaviour. They see only 700+ rows are getting inserted in prod table and not 1001.

    they were asking why? it's the same truncate and load , works in dev perfectly and why it is not working the same for prod db?

    To isolate the issue, I want to do some tests in SSMS, to remove the informatica tool out of picture.

    CASE 1:

    Ran below statements against dev and prod, I see the transaction behaviour as same (. i.e. SQL is considering the whole thing as single transaction, even 1 record fails, the entire txn is rollbacked. )

    truncate table test_tbl;

    go

    Insert into test_tbl

    select top 1000

    c1,c2 from srcdb.dbo.stg_tbl

    union

    select '9999','2019-10-30 00:00:00'

    union

    select '9999','2019-10-30 00:00:00' --- simulate a duplicate record

    Msg 2601, Level 14, State 1, Line 11

    Cannot insert duplicate key row in object 'dbo.test_tbl' with unique index 'pk_test_dbl'. The duplicate key value is (9999).

    select * from test_tbl

    --no rows

    CASE 2 : I have trunacted the table and try to insert only 2 rows . here, this time 1 is getting inserted and 1 row rejected. Infact, i was expecting above error as CASE 1.

    truncate table test_tbl

    go

    Insert into test_tbl

    select '9999','2019-10-30 00:00:00'

    union

    select '9999','2019-10-30 00:00:00' --- simulate a duplicate record

    (1 row(s) affected)

    select * from test_tbl

    id last_update_date

    9999 2019-10-30 00:00:00.0000000

    Can anyone explain this transaction behavior of sql server and ssms ? What is the batch size in sql server to commit or rollback? The reason why I am asking this is, If I try to insert 2 rows , I don't see any error. why ????

    Once I get know the exact sql behavior then I go back to the team and tell this is how sql server treats batch's and if there is something needs to be tweaked in terms of batch size or any db setting in inform them accordingly.

    Thanks,

    Sam

  •  

    Couple of things I can say  -

    UNION removes duplicate - you may want to use UNION ALL if you want to see the error for CASE 2

    TOP 1000 rows in theory will return any 1000 rows  - so top 1000 in Dev and Prod will not be the same - you can use TOP with order by if you want to simulate exact same behaviour in Dev and Prod

    Thanks

  • Hi @Taps,

    I can now understand that UNION operator gets rid of duplicate records and makes the insert go through fine( Case 2).

    Any thoughts on why CASE1 is failing? The union should be able to remove the duplicate row, right and allow top 1000 rows to be inserted ?

  • For Case 1 - I am guessing since the error is primary key violation srcdb.dbo.stg_tbl either has more than one record with same id or it has a record with id 9999

    The error is not occurring in DEV  because TOP 100O  is not retuning same set rows (you have to use an order by to get same set of rows ) and /or srcdb.dbo.stg_tbl is not the same in DEV and PROD

    Thanks

  • @Taps - Thank you. What you said might be the case.

    One more thing , Is there a way to control the number of rows in ssms while loading big batches in SSMS?

     

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply