Hi All,
Today, I got a case related to transaction behavior on SQL Server.
One of our Informatica ETL developer came back saying there is some problem with the sql database.
I went on call to try to understand what she was saying.
Problem description:
They have one table and try to load some data using Informatica ETL tool.
Using Informatica ETL Tool, they have written a simple transformation logic to load data. (truncate & load).
Table structure
==============
create table test_tbl
(
id varchar(100) null,
last_update_date datetime2(7) null
)
create unique clustered index pk_test_dbl on test_tbl(id asc);
go
For sake of testing, they are loading 1000 recs and trying to load 2 additional records (intentionally to reproduce a duplicate record).
Something like below
Truncate table test_tbl;
Insert into test_tbl
select top 1000
c1,c2 from srcdb.dbo.stg_tbl
union
select '9999','2019-10-30 00:00:00'
union
select '9999','2019-10-30 00:00:00' --- simulate a duplicate record
Questions :
scenario 1 : when they run the ETL Tool pointing to dev server ,dev db, they see expected behaviour i.e. 1001 rows loaded and 1 rec is rejected as it is a duplicate 1. (i.e. 9999 )
scenario 2: when they change the connection, this time pointing prod server, prod db, they seeing different behaviour. They see only 700+ rows are getting inserted in prod table and not 1001.
they were asking why? it's the same truncate and load , works in dev perfectly and why it is not working the same for prod db?
To isolate the issue, I want to do some tests in SSMS, to remove the informatica tool out of picture.
CASE 1:
Ran below statements against dev and prod, I see the transaction behaviour as same (. i.e. SQL is considering the whole thing as single transaction, even 1 record fails, the entire txn is rollbacked. )
truncate table test_tbl;
go
Insert into test_tbl
select top 1000
c1,c2 from srcdb.dbo.stg_tbl
union
select '9999','2019-10-30 00:00:00'
union
select '9999','2019-10-30 00:00:00' --- simulate a duplicate record
Msg 2601, Level 14, State 1, Line 11
Cannot insert duplicate key row in object 'dbo.test_tbl' with unique index 'pk_test_dbl'. The duplicate key value is (9999).
select * from test_tbl
--no rows
CASE 2 : I have trunacted the table and try to insert only 2 rows . here, this time 1 is getting inserted and 1 row rejected. Infact, i was expecting above error as CASE 1.
truncate table test_tbl
go
Insert into test_tbl
select '9999','2019-10-30 00:00:00'
union
select '9999','2019-10-30 00:00:00' --- simulate a duplicate record
(1 row(s) affected)
select * from test_tbl
id last_update_date
9999 2019-10-30 00:00:00.0000000
Can anyone explain this transaction behavior of sql server and ssms ? What is the batch size in sql server to commit or rollback? The reason why I am asking this is, If I try to insert 2 rows , I don't see any error. why ????
Once I get know the exact sql behavior then I go back to the team and tell this is how sql server treats batch's and if there is something needs to be tweaked in terms of batch size or any db setting in inform them accordingly.
Thanks,
Sam
October 30, 2019 at 9:21 am
Couple of things I can say -
UNION removes duplicate - you may want to use UNION ALL if you want to see the error for CASE 2
TOP 1000 rows in theory will return any 1000 rows - so top 1000 in Dev and Prod will not be the same - you can use TOP with order by if you want to simulate exact same behaviour in Dev and Prod
Thanks
October 30, 2019 at 11:01 am
Hi @Taps,
I can now understand that UNION operator gets rid of duplicate records and makes the insert go through fine( Case 2).
Any thoughts on why CASE1 is failing? The union should be able to remove the duplicate row, right and allow top 1000 rows to be inserted ?
For Case 1 - I am guessing since the error is primary key violation srcdb.dbo.stg_tbl either has more than one record with same id or it has a record with id 9999
The error is not occurring in DEV because TOP 100O is not retuning same set rows (you have to use an order by to get same set of rows ) and /or srcdb.dbo.stg_tbl is not the same in DEV and PROD
Thanks
October 30, 2019 at 2:44 pm
@Taps - Thank you. What you said might be the case.
One more thing , Is there a way to control the number of rows in ssms while loading big batches in SSMS?
Viewing 5 posts - 1 through 4 (of 4 total)
You must be logged in to reply to this topic. Login to reply