What happens if new records are inserted in a source table during a package execution?
Paul Hernández
SSC-Enthusiastic
Group: General Forum Members
Points: 173 Visits: 654
Hi Folks,

I have a package that loads records into a CRM destination using a Web Service. This execution takes, let's say, around 3 hours. I would like to know what would happen if new records were inserted into the source table during the package execution.
Suppose there is no filter in the query that gets the source records.
I know I could program the loading process another way, but what I really want to know is what would happen in this scenario. Are the new records loaded? Maybe it depends on the buffer, i.e. whether all the records were already fetched when the new records come in?
EDIT: "suppose that the OLE DB source I'm using is still yellow when the new records arrive"
Any comment would be appreciated.
Regards,

Paul Hernández
Orlando Colamatteo
SSCrazy Eights
Group: General Forum Members
Points: 8235 Visits: 14368
Paul Hernández (5/3/2013)
Hi Folks,

I have a package that loads records into a CRM destination using a Web Service. This execution takes, let's say, around 3 hours. I would like to know what would happen if new records were inserted into the source table during the package execution.
Suppose there is no filter in the query that gets the source records.
I know I could program the loading process another way, but what I really want to know is what would happen in this scenario. Are the new records loaded? Maybe it depends on the buffer, i.e. whether all the records were already fetched when the new records come in?
EDIT: "suppose that the OLE DB source I'm using is still yellow when the new records arrive"
Any comment would be appreciated.
Regards,

The window is small after you issue the SELECT, but it is possible for more rows to sneak into the result set even in READ COMMITTED mode.

When the source is yellow, what is the wait type for the SPID selecting the rows? Async network I/O, or something else? Or is SSIS wrong, and has the SPID already been cleared out?

This is probably what you meant by "I can program the loading process in another way, I know," but to remove most of the doubt I like to code extracts like this off of an ever-increasing key, i.e. an IDENTITY column set up as a unique index or primary key. This way I will select the maximum ID into a variable, then select all rows that are less than or equal to that ID.
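In other words, a minimal sketch of the pattern (dbo.StagingTable and its columns are illustrative names, not necessarily yours):

DECLARE @MaxID INT;

-- capture the high-water mark once, up front
SELECT @MaxID = MAX(StagingTableID)
FROM dbo.StagingTable;

-- the extract only ever sees rows at or below the mark;
-- anything inserted mid-run is picked up by the next execution
SELECT StagingTableID, Col1
FROM dbo.StagingTable
WHERE StagingTableID <= @MaxID;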

Are you sweeping the rows, i.e. deleting them from the source after you push them through to the web service?

__________________________________________________________________________________________________
There are no special teachers of virtue, because virtue is taught by the whole community. --Plato
Phil Parkin
SSCrazy Eights
Group: General Forum Members
Points: 8320 Visits: 19469
This is probably what you meant by "I can program the loading process in another way, I know," but to remove most of the doubt I like to code extracts like this off of an ever-increasing key, i.e. an IDENTITY column set up as a unique index or primary key. This way I will select the maximum ID into a variable, then select all rows that are less than or equal to that ID.


+1 to this, me too.


Help us to help you. For better, quicker and more-focused answers to your questions, consider following the advice in this link.

If the answer to your question can be found with a brief Google search, please perform the search yourself, rather than expecting one of the SSC members to do it for you.

Please surround any code or links you post with the appropriate IFCode formatting tags. It helps readability a lot.
wolfkillj
Ten Centuries
Group: General Forum Members
Points: 1104 Visits: 2582
Phil Parkin (5/7/2013)
This is probably what you meant by "I can program the loading process in another way, I know," but to remove most of the doubt I like to code extracts like this off of an ever-increasing key, i.e. an IDENTITY column set up as a unique index or primary key. This way I will select the maximum ID into a variable, then select all rows that are less than or equal to that ID.


+1 to this, me too.


Ditto on the +1. If you want to do only an incremental extraction (instead of extracting ALL rows where ID <= MAX(ID)), create a log table with a column called extractedThroughID or something similar. When you begin the extraction, read the most recent extractedThroughID from the log table into a variable (lastExtractEndID, for example) and the MAX(ID) into another variable (e.g., currentExtractEndID), and extract only rows WHERE ID > lastExtractEndID AND ID <= currentExtractEndID. When the extraction process completes successfully, insert the value of currentExtractEndID into the log table as extractedThroughID so the extraction will start from there next time.
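A rough T-SQL sketch of that pattern (dbo.ExtractLog and dbo.SourceTable are placeholder names):

DECLARE @lastExtractEndID INT,
        @currentExtractEndID INT;

-- where the previous successful run stopped
SELECT @lastExtractEndID = ISNULL(MAX(extractedThroughID), 0)
FROM dbo.ExtractLog;

-- the new cutoff for this run
SELECT @currentExtractEndID = MAX(ID)
FROM dbo.SourceTable;

-- extract only the new slice
SELECT ID, Col1
FROM dbo.SourceTable
WHERE ID > @lastExtractEndID
  AND ID <= @currentExtractEndID;

-- once the extraction completes successfully, record the cutoff
INSERT INTO dbo.ExtractLog (extractedThroughID)
VALUES (@currentExtractEndID);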

Jason Wolfkill
Blog: SQLSouth
Twitter: @SQLSouth
Paul Hernández
SSC-Enthusiastic
Group: General Forum Members
Points: 173 Visits: 654
Thanks so much for your answers,

Thanks to opc.three's answer I was able to learn about SPIDs, wait types, packet size, etc.

I have already implemented the IDENTITY column (Idx) in the staging table. After a record is successfully imported into the CRM, I write the CRM uniqueidentifier back to the table, together with the datetime of the update in a column called UpdateDT.

Then I created a second package to execute this one. I pass the number of records to load as a parameter and query the MAX(Idx) of the rows already loaded, so I have something like this in the source query:
WHERE Idx > @MaxLoadedIdx AND Idx < @MaxLoadedIdx + @RowsToLoad



Of course, I’m using some variables and other names.
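Roughly like this, for illustration (dbo.Staging and the variable names here are stand-ins for the real ones):

DECLARE @RowsToLoad INT = 10000,  -- passed in as a package parameter
        @MaxLoadedIdx INT;

-- highest Idx already confirmed in the CRM (UpdateDT is set on success)
SELECT @MaxLoadedIdx = ISNULL(MAX(Idx), 0)
FROM dbo.Staging
WHERE UpdateDT IS NOT NULL;

-- next batch for the data flow
SELECT Idx /* plus the customer columns */
FROM dbo.Staging
WHERE Idx > @MaxLoadedIdx
  AND Idx < @MaxLoadedIdx + @RowsToLoad;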

I created a loop that keeps the loading process active until there are no more records to load. It has been running since Monday (4 days), and 1.3 million customers out of 2.2 million have already been imported.

Kind Regards,

Paul Hernández
Jeff Moden
SSC-Forever
Group: General Forum Members
Points: 45008 Visits: 39881
wolfkillj (5/7/2013)
Phil Parkin (5/7/2013)
This is probably what you meant by "I can program the loading process in another way, I know," but to remove most of the doubt I like to code extracts like this off of an ever-increasing key, i.e. an IDENTITY column set up as a unique index or primary key. This way I will select the maximum ID into a variable, then select all rows that are less than or equal to that ID.


+1 to this, me too.


Ditto on the +1. If you want to do only an incremental extraction (instead of extracting ALL rows where ID <= MAX(ID)), create a log table with a column called extractedThroughID or something similar. When you begin the extraction, read the most recent extractedThroughID from the log table into a variable (lastExtractEndID, for example) and the MAX(ID) into another variable (e.g., currentExtractEndID), and extract only rows WHERE ID > lastExtractEndID AND ID <= currentExtractEndID. When the extraction process completes successfully, insert the value of currentExtractEndID into the log table as extractedThroughID so the extraction will start from there next time.


I guess I'm the odd ball. I've had some situations (high-speed staging tables fed by triggers on multiple other tables) where rows numbered lower than the current MAX(ID) were still in the process of being committed, and it caused some real headaches. I don't rely on MAX(ID). Instead, I read all of the rows that have no locks on them by using READPAST, and I store the IDs in a temp table to control what I work on and delete, etc.
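A sketch of that approach (READPAST is the actual table hint; the table and column names are invented for illustration):

-- capture the IDs of rows that are not locked right now;
-- READPAST skips locked rows instead of blocking on them
SELECT StagingID
INTO #WorkList
FROM dbo.HighSpeedStaging WITH (READPAST);

-- work on, and later sweep, exactly the captured rows
DELETE s
FROM dbo.HighSpeedStaging s
JOIN #WorkList w
  ON w.StagingID = s.StagingID;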

--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
Although they tell us that they want it real bad, our primary goal is to ensure that we don't actually give it to them that way.
Although change is inevitable, change for the better is not.
Just because you can do something in PowerShell doesn't mean you should.

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
wolfkillj
Ten Centuries
Group: General Forum Members
Points: 1104 Visits: 2582
Jeff Moden (5/10/2013)
wolfkillj (5/7/2013)
Phil Parkin (5/7/2013)
This is probably what you meant by "I can program the loading process in another way, I know," but to remove most of the doubt I like to code extracts like this off of an ever-increasing key, i.e. an IDENTITY column set up as a unique index or primary key. This way I will select the maximum ID into a variable, then select all rows that are less than or equal to that ID.


+1 to this, me too.


Ditto on the +1. If you want to do only an incremental extraction (instead of extracting ALL rows where ID <= MAX(ID)), create a log table with a column called extractedThroughID or something similar. When you begin the extraction, read the most recent extractedThroughID from the log table into a variable (lastExtractEndID, for example) and the MAX(ID) into another variable (e.g., currentExtractEndID), and extract only rows WHERE ID > lastExtractEndID AND ID <= currentExtractEndID. When the extraction process completes successfully, insert the value of currentExtractEndID into the log table as extractedThroughID so the extraction will start from there next time.


I guess I'm the odd ball. I've had some situations (high-speed staging tables fed by triggers on multiple other tables) where rows numbered lower than the current MAX(ID) were still in the process of being committed, and it caused some real headaches. I don't rely on MAX(ID). Instead, I read all of the rows that have no locks on them by using READPAST, and I store the IDs in a temp table to control what I work on and delete, etc.


Jeff -

Thanks for mentioning this. It prompted me to think about how INSERT/UPDATE transactions that have been initiated but not committed could cause some rows to fall through the cracks of ETL processes that select rows for extraction by comparing some ever-increasing column with a "cutoff value". Thanks for suggesting your method - I'll be looking into it.

Jason

Jason Wolfkill
Blog: SQLSouth
Twitter: @SQLSouth
Orlando Colamatteo
SSCrazy Eights
Group: General Forum Members
Points: 8235 Visits: 14368
wolfkillj (5/13/2013)
Jeff Moden (5/10/2013)
wolfkillj (5/7/2013)
Phil Parkin (5/7/2013)
This is probably what you meant by "I can program the loading process in another way, I know," but to remove most of the doubt I like to code extracts like this off of an ever-increasing key, i.e. an IDENTITY column set up as a unique index or primary key. This way I will select the maximum ID into a variable, then select all rows that are less than or equal to that ID.


+1 to this, me too.


Ditto on the +1. If you want to do only an incremental extraction (instead of extracting ALL rows where ID <= MAX(ID)), create a log table with a column called extractedThroughID or something similar. When you begin the extraction, read the most recent extractedThroughID from the log table into a variable (lastExtractEndID, for example) and the MAX(ID) into another variable (e.g., currentExtractEndID), and extract only rows WHERE ID > lastExtractEndID AND ID <= currentExtractEndID. When the extraction process completes successfully, insert the value of currentExtractEndID into the log table as extractedThroughID so the extraction will start from there next time.


I guess I'm the odd ball. I've had some situations (high-speed staging tables fed by triggers on multiple other tables) where rows numbered lower than the current MAX(ID) were still in the process of being committed, and it caused some real headaches. I don't rely on MAX(ID). Instead, I read all of the rows that have no locks on them by using READPAST, and I store the IDs in a temp table to control what I work on and delete, etc.


Jeff -

Thanks for mentioning this. It prompted me to think about how INSERT/UPDATE transactions that have been initiated but not committed could cause some rows to fall through the cracks of ETL processes that select rows for extraction by comparing some ever-increasing column with a "cutoff value". Thanks for suggesting your method - I'll be looking into it.

Jason

UPDATEs could be missed but that's an entirely different ballgame and would require a different approach than "less than or equal to IDENTITY". For INSERTs, I do not think they will fall through the cracks due to ACID properties.

I like Jeff's technique for the scenario he mentioned, high speed staging tables fed by triggers on multiple other tables, but it could be deemed over-engineered in some environments. I have added it to my toolkit though, thanks Jeff!

__________________________________________________________________________________________________
There are no special teachers of virtue, because virtue is taught by the whole community. --Plato
wolfkillj
Ten Centuries
Group: General Forum Members
Points: 1104 Visits: 2582
opc.three (5/13/2013)
wolfkillj (5/13/2013)
Jeff Moden (5/10/2013)
wolfkillj (5/7/2013)
Phil Parkin (5/7/2013)
This is probably what you meant by "I can program the loading process in another way, I know," but to remove most of the doubt I like to code extracts like this off of an ever-increasing key, i.e. an IDENTITY column set up as a unique index or primary key. This way I will select the maximum ID into a variable, then select all rows that are less than or equal to that ID.


+1 to this, me too.


Ditto on the +1. If you want to do only an incremental extraction (instead of extracting ALL rows where ID <= MAX(ID)), create a log table with a column called extractedThroughID or something similar. When you begin the extraction, read the most recent extractedThroughID from the log table into a variable (lastExtractEndID, for example) and the MAX(ID) into another variable (e.g., currentExtractEndID), and extract only rows WHERE ID > lastExtractEndID AND ID <= currentExtractEndID. When the extraction process completes successfully, insert the value of currentExtractEndID into the log table as extractedThroughID so the extraction will start from there next time.


I guess I'm the odd ball. I've had some situations (high-speed staging tables fed by triggers on multiple other tables) where rows numbered lower than the current MAX(ID) were still in the process of being committed, and it caused some real headaches. I don't rely on MAX(ID). Instead, I read all of the rows that have no locks on them by using READPAST, and I store the IDs in a temp table to control what I work on and delete, etc.


Jeff -

Thanks for mentioning this. It prompted me to think about how INSERT/UPDATE transactions that have been initiated but not committed could cause some rows to fall through the cracks of ETL processes that select rows for extraction by comparing some ever-increasing column with a "cutoff value". Thanks for suggesting your method - I'll be looking into it.

Jason

UPDATEs could be missed but that's an entirely different ballgame and would require a different approach than "less than or equal to IDENTITY". For INSERTs, I do not think they will fall through the cracks due to ACID properties.

I like Jeff's technique for the scenario he mentioned, high speed staging tables fed by triggers on multiple other tables, but it could be deemed over-engineered in some environments. I have added it to my toolkit though, thanks Jeff!


I haven't examined it fully, but in some brief testing I observed some interesting behavior that leads me to believe there could be issues with the "less than or equal to IDENTITY" approach with inserts.

I created a table named testTable with an identity column named rowNbr (and an irrelevant col1 varchar(10)).

I initiated a transaction to insert a thousand rows, with a WAITFOR DELAY '00:00:10' after the INSERT but before the COMMIT.

While this transaction was open, I opened another connection and executed, in immediate succession, an INSERT of a single row, a SELECT @maxRowNbr = MAX(rowNbr) FROM testTable, and a SELECT * FROM testTable WHERE rowNbr <= @maxRowNbr. The INSERT and the SELECT MAX(rowNbr) completed before the other transaction committed, and the SELECT MAX(rowNbr) returned 1001. The SELECT * FROM testTable appeared to be blocked by the other transaction, but returned 1001 rows once that transaction committed.

I haven't examined the locking/blocking that occurred or researched this further, but I'm curious about whether SQL Server will ALWAYS require a large INSERT transaction that involves an IDENTITY column to complete before executing a SELECT * WHERE [IDENTITY column] <= [some number higher than the highest IDENTITY value in the INSERT batch].
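For anyone who wants to reproduce it, here is a sketch of the two sessions as described (testTable and rowNbr as named above; the comments record the behavior observed in the test):

-- Session 1: insert 1000 rows, then hold the transaction open for 10 seconds
CREATE TABLE dbo.testTable (rowNbr INT IDENTITY(1, 1) NOT NULL, col1 VARCHAR(10));

BEGIN TRAN;
INSERT INTO dbo.testTable (col1)
SELECT TOP (1000) LEFT(c1.name, 10)
FROM sys.columns c1
CROSS JOIN sys.columns c2;
WAITFOR DELAY '00:00:10';
COMMIT;

-- Session 2: run while Session 1 is still inside the WAITFOR
DECLARE @maxRowNbr INT;

INSERT INTO dbo.testTable (col1)
VALUES ('one row');          -- completed immediately

SELECT @maxRowNbr = MAX(rowNbr)
FROM dbo.testTable;          -- returned 1001 before Session 1 committed

SELECT *
FROM dbo.testTable
WHERE rowNbr <= @maxRowNbr;  -- blocked until Session 1 committed, then returned 1001 rows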

Jason Wolfkill
Blog: SQLSouth
Twitter: @SQLSouth
Orlando Colamatteo
SSCrazy Eights
Group: General Forum Members
Points: 8235 Visits: 14368
The issuing of IDENTITY values is part of the transaction, so we are protected from data loss.

In SSMS, Query Window 1, run this:

USE YourDatabaseName;

-- guard the drop so the script also runs on a fresh database
IF OBJECT_ID(N'dbo.StagingTable', N'U') IS NOT NULL
    DROP TABLE dbo.StagingTable;

CREATE TABLE dbo.StagingTable
    (
    StagingTableID INT NOT NULL IDENTITY(1, 1),
    Col1 sysname
    );



In Query Window 2, populate the table with some data:

BEGIN TRAN;

INSERT INTO dbo.StagingTable (Col1)
SELECT TOP (1000)
       c1.name
FROM sys.columns c1
CROSS JOIN sys.columns c2;

-- note, I am leaving the transaction open on purpose



In Query Window 3, insert a single row:

INSERT INTO dbo.StagingTable (Col1)
SELECT TOP (1)
       c1.name
FROM sys.columns c1
CROSS JOIN sys.columns c2;

-- note that the insert succeeds even though 1000 rows
-- are inserted, but not yet committed, on another SPID



Query Window 4, along comes our SSIS Package wanting to get the MAX ID and copy the data:

SELECT  MAX(StagingTableID)
FROM dbo.StagingTable;



Nope, it must wait, due to the open transaction in Window 2. Go back to Window 2 and issue COMMIT or ROLLBACK (your choice), and you'll see that as soon as that happens, Window 4 completes immediately and returns 1001. This is the protection we need, i.e. no lost data.

__________________________________________________________________________________________________
There are no special teachers of virtue, because virtue is taught by the whole community. --Plato