SSIS Balanced Data Distributor Data Flow Component (Table Locking)

Question


Letron Brantley

Ten Centuries

Points: 1202
January 12, 2016 at 12:58 pm

#303633

I have a question regarding the Microsoft Balanced Data Distributor.
http://blogs.msdn.com/b/sqlperf/archive/2011/05/25/the-balanced-data-distributor-for-ssis.aspx
Does anyone have any insight into how SQL Server handles target table locking if you use the BDD to split a data stream into multiple streams and load them all into the same table? The premise is that the BDD lets you exploit parallelism and potentially load data faster. I'm a little skeptical, though.
For instance:
I want to pull 10 million rows from a source table, so I create a package with a simple data flow. In the data flow I have:
1. One source component to "select * from SourceTableA".
2. One BDD component to split the data into 10 distinct streams, so each stream should carry 1 million rows.
3. An OLE DB destination component on each stream, all pointing to the same TargetTableB.
The BDD is supposed to let you pipe multiple segments of data in parallel to speed up the load. How, then, is locking handled on the target side? My conventional wisdom tells me that the first stream will acquire a table lock and the other 9 streams will be blocked until the first finishes, and so on. To me, that defeats the benefit.
Am I missing something about how the BDD is supposed to work?
Thanks,
Letron
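
The even split described in the question can be sketched roughly as follows. This is only an illustrative model, assuming the BDD hands out whole pipeline buffers to its outputs in round-robin order (as the linked blog post describes it) and assuming a buffer size of 10,000 rows; the function name `distribute_buffers` and its parameters are hypothetical and are not part of SSIS.

```python
def distribute_buffers(total_rows, rows_per_buffer, num_outputs):
    """Hand whole buffers to outputs in round-robin order.

    Returns a list with the number of rows each output receives.
    This models distribution by buffer, not by individual row.
    """
    rows_per_output = [0] * num_outputs
    remaining = total_rows
    next_output = 0
    while remaining > 0:
        # The last buffer may be only partially filled.
        buffer_rows = min(rows_per_buffer, remaining)
        rows_per_output[next_output] += buffer_rows
        remaining -= buffer_rows
        next_output = (next_output + 1) % num_outputs
    return rows_per_output


# The scenario from the question: 10 million rows, 10 outputs,
# with an assumed buffer size of 10,000 rows.
counts = distribute_buffers(
    total_rows=10_000_000, rows_per_buffer=10_000, num_outputs=10
)
print(counts)
```

Because the split happens per buffer, the streams only come out exactly equal when the number of buffers divides evenly across the outputs; otherwise some streams receive one buffer more than others.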


Phil Parkin SSC Guru Points: 246695 · Answer 1


I just read through the link you suggested. It suggests that the parallel loading of a table will work if that table is a heap. Otherwise it suggests a UNION ALL to unparallelify (sorry about that!) the data streams, prior to performing the INSERT.
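
Why a heap might behave differently can be sketched with a toy lock model. It assumes each destination uses fast load with the table lock option, that a bulk load into a heap then requests a bulk update (BU) table lock, and that the same load into a table with a clustered index requests an exclusive (X) table lock. The compatibility entries below are a small subset of SQL Server's lock compatibility matrix; the function `blocked_streams` is hypothetical and only counts who would have to wait.

```python
# Subset of the lock compatibility matrix: (requested, held) -> compatible?
# BU locks are compatible with other BU locks; X is compatible with nothing.
COMPATIBLE = {
    ("BU", "BU"): True,
    ("BU", "X"): False,
    ("X", "BU"): False,
    ("X", "X"): False,
}


def blocked_streams(lock_mode, num_streams):
    """Count streams that must wait when every stream requests the
    same table-level lock mode on the same target table."""
    held = []
    blocked = 0
    for _ in range(num_streams):
        if all(COMPATIBLE[(lock_mode, h)] for h in held):
            held.append(lock_mode)  # lock granted, stream can load
        else:
            blocked += 1  # stream waits for an incompatible lock
    return blocked


print("heap, BU locks:", blocked_streams("BU", 10))
print("clustered index, X locks:", blocked_streams("X", 10))
```

Under these assumptions, ten streams loading a heap can all hold their locks at once, while ten streams requesting exclusive table locks serialize behind the first one, which matches the blocking the original question anticipated.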

If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.

Lynn Pettis SSC Guru Points: 442467 · Answer 2


Make it easier for others to read the blog: http://blogs.msdn.com/b/sqlperf/archive/2011/05/25/the-balanced-data-distributor-for-ssis.aspx

Letron Brantley Ten Centuries Points: 1202 · Answer 3


Ahh Thanks Lynn!! I just hyperlinked it.

Letron Brantley Ten Centuries Points: 1202 · Answer 4


Ok thanks Phil,

I'm still testing but so far I'm not seeing any performance improvement with heap tables. It's like the data is loaded one stream at a time and not in parallel. I'm doing more iterations to verify.

Thanks,

Letron

Phil Parkin SSC Guru Points: 246695 · Answer 5


That's interesting, please post back with your findings. If that's the case, it directly contradicts what the notes in the link say.


prvmine SSCertifiable Points: 7006 · Answer 6

I suggest changing the destination connection's advanced property MultipleActiveResultSets to True.