SQLServerCentral is supported by Red Gate Software Ltd.
What happens if new records are inserted in a source table during a package execution?
Posted Friday, May 3, 2013 5:12 AM
SSC-Enthusiastic


Group: General Forum Members
Last Login: Monday, August 18, 2014 5:26 AM
Points: 119, Visits: 483
Hi Folks,

I have a package that loads records into a CRM destination using a web service. The execution takes, let's say, around 3 hours. I would like to know: what happens if new records are inserted into the source table during the package execution?
Suppose there's no filter in the query that gets the source records.
I know I can program the loading process in another way, but what I really want to know is what would happen in this scenario. Are the new records loaded? Maybe it depends on the buffer, i.e. whether all the records had already been fetched when the new ones came in?
EDIT: suppose that the OLE DB source that I'm using is still yellow when the new records arrive.
Any comment would be appreciated.
Regards,


Paul Hernández
http://hernandezpaul.wordpress.com/
https://twitter.com/paul_eng
Post #1449133
Posted Monday, May 6, 2013 12:34 PM


SSCertifiable


Group: General Forum Members
Last Login: Today @ 7:34 AM
Points: 7,098, Visits: 12,606
Paul Hernández (5/3/2013)
I would like to know: what happens if new records are inserted into the source table during the package execution? ... EDIT: suppose that the OLE DB source that I'm using is still yellow when the new records arrive.

The window is small after you issue the SELECT, but it's possible for more rows to sneak into the result set even in READ COMMITTED mode.

When the source is yellow, what is the wait type of the SPID selecting the rows? Async network I/O or something else? Or is SSIS wrong, and has the SPID already been cleared out?

This is probably what you meant by "I can program the loading process in another way, I know", but to remove most of the doubt I like to code extracts like this off an ever-increasing key, i.e. an IDENTITY column set up as a unique index or primary key. This way I select the maximum ID into a variable, then select all rows that are less than or equal to that ID.
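A minimal sketch of that high-water-mark pattern (table and column names here are illustrative, not from the original post):

```sql
-- Capture the high-water mark first, then extract only rows at or below it.
-- Rows inserted after this point get higher IDs and are picked up on the next run.
DECLARE @MaxID INT;

SELECT @MaxID = MAX(SourceID)      -- SourceID: IDENTITY, unique index or PK
FROM dbo.SourceTable;

SELECT SourceID, Col1, Col2
FROM dbo.SourceTable
WHERE SourceID <= @MaxID;
```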

Are you sweeping the rows, i.e. deleting them from the source after you push them through to the web service?


__________________________________________________________________________________________________
There are no special teachers of virtue, because virtue is taught by the whole community. --Plato
Post #1449839
Posted Tuesday, May 7, 2013 5:58 AM


SSCertifiable


Group: General Forum Members
Last Login: Today @ 11:23 AM
Points: 5,047, Visits: 11,799
This is probably what you meant by "I can program the loading process in another way, I know", but to remove most of the doubt I like to code extracts like this off an ever-increasing key, i.e. an IDENTITY column set up as a unique index or primary key. This way I select the maximum ID into a variable, then select all rows that are less than or equal to that ID.


+1 to this, me too.



Help us to help you. For better, quicker and more-focused answers to your questions, consider following the advice in this link.

When you ask a question (and please do ask a question: "My T-SQL does not work" just doesn't cut it), please provide enough information for us to understand its context.
Post #1450087
Posted Tuesday, May 7, 2013 9:19 AM


Ten Centuries


Group: General Forum Members
Last Login: Friday, August 22, 2014 8:05 AM
Points: 1,222, Visits: 2,545
Phil Parkin (5/7/2013)
This is probably what you meant by "I can program the loading process in another way, I know", but to remove most of the doubt I like to code extracts like this off an ever-increasing key, i.e. an IDENTITY column set up as a unique index or primary key. This way I select the maximum ID into a variable, then select all rows that are less than or equal to that ID.


+1 to this, me too.


Ditto on the +1. If you want to do only an incremental extraction (instead of extracting ALL rows where ID <= MAX(ID)), create a log table with a column called extractedThroughID or something similar. When you begin the extraction, read the most recent extractedThroughID from the log table into a variable (lastExtractEndID, for example) and the MAX(ID) into another variable (e.g., currentExtractEndID), and extract only rows WHERE ID > lastExtractEndID AND ID <= currentExtractEndID. When the extraction process completes successfully, insert the value of currentExtractEndID into the log table as extractedThroughID so the extraction will start from there next time.
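A rough sketch of that log-table approach (all object and variable names are illustrative):

```sql
-- One-time setup: a log table holding the high-water mark of each completed extract.
-- CREATE TABLE dbo.ExtractLog (extractedThroughID INT NOT NULL, extractDate DATETIME NOT NULL);

DECLARE @lastExtractEndID INT, @currentExtractEndID INT;

-- Where the previous successful extract stopped (0 on the first run).
SELECT @lastExtractEndID = ISNULL(MAX(extractedThroughID), 0)
FROM dbo.ExtractLog;

-- Where this extract will stop.
SELECT @currentExtractEndID = MAX(ID)
FROM dbo.SourceTable;

-- Extract only the new slice.
SELECT ID, Col1
FROM dbo.SourceTable
WHERE ID > @lastExtractEndID
  AND ID <= @currentExtractEndID;

-- After the load completes successfully, record the new high-water mark.
INSERT INTO dbo.ExtractLog (extractedThroughID, extractDate)
VALUES (@currentExtractEndID, GETDATE());
```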


Jason Wolfkill
Blog: SQLSouth
Twitter: @SQLSouth
Post #1450213
Posted Friday, May 10, 2013 6:16 AM
SSC-Enthusiastic


Group: General Forum Members
Last Login: Monday, August 18, 2014 5:26 AM
Points: 119, Visits: 483
Thanks so much for your answers,

Thanks to opc.three's answer I was able to learn about SPIDs, wait types, packet size, etc.

I have already implemented the IDENTITY column (Idx) in the staging table. After a record is successfully imported into the CRM, I write the CRM uniqueidentifier back to the table, together with the datetime of the update, in a column called UpdateDT.

Then I created a second package to execute this package. I pass the number of records to load as a parameter and query the MAX(Idx); then I have something like this in the source query:
WHERE Idx > MAX(Idx) AND Idx < MAX(Idx) + RowsToLoad

Of course, I'm using some variables and other names.

I created a loop that keeps the loading process active until there are no more records to load. It has been running since Monday (4 days), and 1.3 million customers out of 2.2 million have already been imported.
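In other words, each iteration's source query looks something like this (variable and column names are illustrative; the real package uses different names, per the post above):

```sql
DECLARE @LastIdx INT, @RowsToLoad INT = 10000;

-- Highest Idx already confirmed in CRM (UpdateDT is written back after a successful import).
SELECT @LastIdx = ISNULL(MAX(Idx), 0)
FROM dbo.StagingTable
WHERE UpdateDT IS NOT NULL;

-- Next batch of at most @RowsToLoad rows; the loop repeats until this returns nothing.
SELECT Idx, Col1
FROM dbo.StagingTable
WHERE Idx > @LastIdx
  AND Idx <= @LastIdx + @RowsToLoad;
```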

Kind Regards,



Paul Hernández
http://hernandezpaul.wordpress.com/
https://twitter.com/paul_eng
Post #1451548
Posted Friday, May 10, 2013 4:53 PM


SSC-Dedicated


Group: General Forum Members
Last Login: Yesterday @ 9:58 AM
Points: 36,995, Visits: 31,517
wolfkillj (5/7/2013)
Ditto on the +1. If you want to do only an incremental extraction (instead of extracting ALL rows where ID <= MAX(ID)), create a log table with a column called extractedThroughID or something similar. When you begin the extraction, read the most recent extractedThroughID from the log table into a variable (lastExtractEndID, for example) and the MAX(ID) into another variable (e.g., currentExtractEndID), and extract only rows WHERE ID > lastExtractEndID AND ID <= currentExtractEndID. When the extraction process completes successfully, insert the value of currentExtractEndID into the log table as extractedThroughID so the extraction will start from there next time.


I guess I'm the odd ball. I've had some situations (high-speed staging tables fed by triggers on multiple other tables) where rows with lower numbers than the current MAX(ID) were still in the process of being committed, and it caused some real headaches. I don't rely on MAX(ID). Instead, I read all of the rows that have no locks on them by using the READPAST hint, and I store the IDs in a temp table to control what I work on, delete, etc.
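A rough sketch of that READPAST pattern (object names are illustrative):

```sql
-- Capture only rows that are not locked by in-flight transactions; locked rows
-- are skipped rather than waited on, and will be picked up on a later sweep.
CREATE TABLE #WorkList (ID INT NOT NULL PRIMARY KEY);

INSERT INTO #WorkList (ID)
SELECT ID
FROM dbo.StagingTable WITH (READPAST);

-- Process only the captured IDs, then sweep exactly those rows from staging.
DELETE s
FROM dbo.StagingTable AS s
JOIN #WorkList AS w
  ON w.ID = s.ID;
```

Note that READPAST is only allowed under the READ COMMITTED or REPEATABLE READ isolation levels.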


--Jeff Moden
RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #1451779
Posted Monday, May 13, 2013 8:31 AM


Ten Centuries


Group: General Forum Members
Last Login: Friday, August 22, 2014 8:05 AM
Points: 1,222, Visits: 2,545
Jeff Moden (5/10/2013)
I guess I'm the odd ball. I've had some situations (high-speed staging tables fed by triggers on multiple other tables) where rows with lower numbers than the current MAX(ID) were still in the process of being committed, and it caused some real headaches. I don't rely on MAX(ID). Instead, I read all of the rows that have no locks on them by using the READPAST hint, and I store the IDs in a temp table to control what I work on, delete, etc.


Jeff -

Thanks for mentioning this. It prompted me to think about how INSERT/UPDATE transactions that have been initiated but not committed could cause some rows to fall through the cracks of ETL processes that select rows for extraction by comparing some ever-increasing column with a "cutoff value". Thanks for suggesting your method - I'll be looking into it.

Jason


Jason Wolfkill
Blog: SQLSouth
Twitter: @SQLSouth
Post #1452135
Posted Monday, May 13, 2013 11:33 AM


SSCertifiable


Group: General Forum Members
Last Login: Today @ 7:34 AM
Points: 7,098, Visits: 12,606
wolfkillj (5/13/2013)
Thanks for mentioning this. It prompted me to think about how INSERT/UPDATE transactions that have been initiated but not committed could cause some rows to fall through the cracks of ETL processes that select rows for extraction by comparing some ever-increasing column with a "cutoff value". Thanks for suggesting your method - I'll be looking into it.

Jason

UPDATEs could be missed, but that's an entirely different ballgame and would require a different approach than "less than or equal to IDENTITY". For INSERTs, I do not think they will fall through the cracks, thanks to ACID properties.

I like Jeff's technique for the scenario he mentioned (high-speed staging tables fed by triggers on multiple other tables), but it could be deemed over-engineered in some environments. I have added it to my toolkit though - thanks, Jeff!


__________________________________________________________________________________________________
There are no special teachers of virtue, because virtue is taught by the whole community. --Plato
Post #1452246
Posted Monday, May 13, 2013 1:10 PM


Ten Centuries


Group: General Forum Members
Last Login: Friday, August 22, 2014 8:05 AM
Points: 1,222, Visits: 2,545
opc.three (5/13/2013)
UPDATEs could be missed, but that's an entirely different ballgame and would require a different approach than "less than or equal to IDENTITY". For INSERTs, I do not think they will fall through the cracks, thanks to ACID properties.

I like Jeff's technique for the scenario he mentioned (high-speed staging tables fed by triggers on multiple other tables), but it could be deemed over-engineered in some environments. I have added it to my toolkit though - thanks, Jeff!


I haven't examined it fully, but in some brief testing I did, I observed some interesting behavior that leads me to believe there could be issues with the "less than or equal to IDENTITY" approach with inserts.

I created a table named testTable with an identity column named rowNbr (and an irrelevant col1 varchar(10)).

I initiated a transaction to insert a thousand rows, with a WAITFOR DELAY '00:00:10' after the INSERT but before the COMMIT.

While this transaction was open, I opened another connection and executed in immediate succession an INSERT of a single row, a SELECT @maxRowNbr = MAX(rowNbr) FROM testTable, and a SELECT * FROM testTable WHERE rowNbr <= @maxRowNbr. The INSERT and the SELECT MAX(rowNbr) completed before the other transaction committed, and the SELECT MAX(rowNbr) returned 1001. The SELECT * FROM testTable seemed to be blocked by the other transaction, but returned 1001 rows once the other transaction committed.

I haven't examined the locking/blocking that occurred or researched this further, but I'm curious about whether SQL Server will ALWAYS require a large INSERT transaction that involves an IDENTITY column to complete before executing a SELECT * WHERE [IDENTITY column] <= [some number higher than the highest IDENTITY value in the INSERT batch].


Jason Wolfkill
Blog: SQLSouth
Twitter: @SQLSouth
Post #1452284
Posted Monday, May 13, 2013 1:46 PM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Today @ 7:34 AM
Points: 7,098, Visits: 12,606
The issuance of the IDENTITY value is part of the transaction, so we are protected from loss of data.

In SSMS, Query Window 1, run this:

USE YourDatabaseName;

IF OBJECT_ID('dbo.StagingTable') IS NOT NULL
    DROP TABLE dbo.StagingTable;

CREATE TABLE dbo.StagingTable
(
    StagingTableID INT NOT NULL IDENTITY(1,1),
    Col1 sysname
);

In Query Window 2, populate the table with some data:

BEGIN TRAN;

INSERT INTO dbo.StagingTable (Col1)
SELECT TOP 1000 c1.name
FROM sys.columns c1
CROSS JOIN sys.columns c2;

-- note: I am leaving the transaction open on purpose

In Query Window 3, insert a single row:

INSERT INTO dbo.StagingTable (Col1)
SELECT TOP 1 c1.name
FROM sys.columns c1
CROSS JOIN sys.columns c2;

-- note that the insert succeeds despite the fact
-- that there are 1000 rows inserted, but not yet
-- committed, on another SPID

Query Window 4, along comes our SSIS Package wanting to get the MAX ID and copy the data:

SELECT MAX(StagingTableID)
FROM dbo.StagingTable;

Nope, it must wait, due to the open transaction in Window 2. Go back to Window 2 and call COMMIT or ROLLBACK (your choice), and you'll see that as soon as that happens, Window 4 completes immediately and returns 1001. This is the protection we need, i.e. no lost data.


__________________________________________________________________________________________________
There are no special teachers of virtue, because virtue is taught by the whole community. --Plato
Post #1452299