Parallel Processing

  • Hi,

    I have designed and implemented a classic data warehouse system to load data for our different stores. It processes the data from the staging tables into the core DWH via T-SQL procedures; I mostly used MERGE as well as INSERT ... SELECT to load the data from staging into the core DWH.

    The jobs work fine if I load the data for each store separately. But if I execute the load processes for all stores in parallel, I get a multitude of locks, mostly LCK_M_IX and LCK_M_S, on the target tables, although I have partitioned each of the tables by StoreId and also filter on StoreId in the MERGE and INSERT scripts. Additionally, I've put the command "SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED" in each procedure.
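    For illustration, here is a minimal sketch of the per-store load procedure pattern described above (dbo.LoadStore, STAGE_SALES, CORE_SALES, and the column names are hypothetical stand-ins, not the actual objects):

    CREATE PROCEDURE dbo.LoadStore
        @StoreId INT
    AS
    BEGIN
        -- As described above; note this only changes the locks taken by
        -- reads, not the IX/X locks the writes themselves must take.
        SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;

        -- Load one store's slice from staging into the core table,
        -- filtering on the partitioning column StoreId.
        INSERT INTO dbo.CORE_SALES (StoreId, UniqueId, Amount)
        SELECT s.StoreId, s.UniqueId, s.Amount
        FROM dbo.STAGE_SALES AS s
        WHERE s.StoreId = @StoreId;
    END;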

    So what more can I do? Is there perhaps a guideline on what I have to do to implement parallel processing with SQL Server 2008 R2?

    I look forward to your answers!

    Many thanks

    You can load a data warehouse in parallel, but I would try to avoid loading a single table with multiple processes at once.

    Try loading different tables at the same time.

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • Hi Koen,

    thanks for the fast answer. But is there no other way to load the table in parallel? I thought that, because the tables are partitioned, no locks would be taken on the entire table, only on the relevant partition. The lock escalation of the tables is also set to "AUTO".

    So you are saying that it isn't possible with SQL Server to load data into different partitions of one table at the same time without locks? Can you explain the reason for this?

  • mr_effe (4/17/2014)


    Hi Koen,

    thanks for the fast answer. But is there no other way to load the table in parallel? I thought that, because the tables are partitioned, no locks would be taken on the entire table, only on the relevant partition. The lock escalation of the tables is also set to "AUTO".

    So you are saying that it isn't possible with SQL Server to load data into different partitions of one table at the same time without locks? Can you explain the reason for this?

    I'm not saying that it isn't possible (it is, by the way: Enabling Partition Level Locking in SQL Server 2008). I'm just saying it is much easier to avoid locks and load different tables.
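    For reference, partition-level lock escalation (the feature the linked article describes) is a per-table setting; a minimal example, with dbo.CORE_SALES as a hypothetical table name:

    -- With AUTO, lock escalation on a partitioned table goes to the
    -- partition level instead of the whole table (the default is TABLE).
    ALTER TABLE dbo.CORE_SALES SET (LOCK_ESCALATION = AUTO);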

    Maybe it is MERGE that is giving you trouble?

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

    Yes, you're right. Most of the locks occur with the MERGE statements. Do you have an idea how to prevent that?

    And can you give me some hints on what I have to consider when I want to update/insert into one table (with different partitions) at the same time?

    Many thanks

  • mr_effe (4/17/2014)


    Yes, you're right. Most of the locks occur with the MERGE statements. Do you have an idea how to prevent that?

    And can you give me some hints on what I have to consider when I want to update/insert into one table (with different partitions) at the same time?

    Many thanks

    I'm a bit out of my league here, so I asked some other experts if they can take a look at this.

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • Can you give a few more details of how the tables are partitioned?

    And have you tried single-threading this process?

    We had a mix of parallel and single-threaded loading in ours.

    Performance was still very good, although a lot of that can be very dependent on the overall design.

  • Hi,

    most of the tables are partitioned by our StoreId. In total we have 6 stores (1, 2, ..., 6), so 6 partitions per table. All our stored procedures have an input parameter StoreId and only process the data of that specific store. We have INSERT ... SELECT statements as well as MERGE statements.
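    As a sketch of that layout (the partition function/scheme names and the filegroup are hypothetical placeholders):

    -- RANGE LEFT with boundary values 1..5 yields 6 partitions,
    -- one per StoreId 1-6.
    CREATE PARTITION FUNCTION pfStore (INT)
        AS RANGE LEFT FOR VALUES (1, 2, 3, 4, 5);

    CREATE PARTITION SCHEME psStore
        AS PARTITION pfStore ALL TO ([PRIMARY]);

    -- The core tables are then created ON psStore(StoreId).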

    All 6 stores are processed at the same time, and then we get a lot of locks: the processing of one store has to wait until the other stores have finished. If I start the processing of the stores in sequence, no locks occur.

    Now I want to know what I have to do to minimize the locks and thereby increase the performance.

  • See if this helps.

    Skip down to the section on loading a partitioned table.

    I don't think this solution helps me, because a partition switch takes a schema modification (Sch-M) lock, and if all of my parallel processes want a Sch-M lock, nothing works anymore.

    I already had this situation with my "truncate partition" function, which uses a partition switch to empty the table partition. It works fine if only one process is active; if I start the jobs in parallel, they all wait for each other. So I have already changed the logic from "truncate partition" to a DELETE.
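    For context, a minimal sketch of the two variants described here (dbo.CORE_SALES and dbo.CORE_SALES_EMPTY are hypothetical names; the switch target must match the table's structure and sit on the same filegroup):

    -- "Truncate partition" via switch: needs a Sch-M lock on both tables.
    ALTER TABLE dbo.CORE_SALES SWITCH PARTITION 3 TO dbo.CORE_SALES_EMPTY;
    TRUNCATE TABLE dbo.CORE_SALES_EMPTY;

    -- The DELETE it was replaced with: slower, but it takes ordinary
    -- row/page locks within the partition instead of a Sch-M lock.
    DELETE FROM dbo.CORE_SALES WHERE StoreId = 3;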

    So I think this solution will not help me 🙁

    Are you processing all dimensions first, then the fact tables?

    And what is the performance when processing one store at a time?

    Generally, we would partition by year.

    Clustered index on Business Unit, which would be similar to store.

  • mr_effe (4/17/2014)


    I don't think this solution helps me, because a partition switch takes a schema modification (Sch-M) lock, and if all of my parallel processes want a Sch-M lock, nothing works anymore.

    I already had this situation with my "truncate partition" function, which uses a partition switch to empty the table partition. It works fine if only one process is active; if I start the jobs in parallel, they all wait for each other. So I have already changed the logic from "truncate partition" to a DELETE.

    So I think this solution will not help me 🙁

    If you have N partitions, you can use N+2 jobs: one job that switches the partitions out into new tables and then kicks off the insert job for each partition; N jobs, one per partition, doing the inserts; and one job that polls for finished insert jobs, switches the corresponding partition back in when it finds one, notes that it has dealt with that job so it won't poll for it again, and terminates when there is nothing left to poll for.

    That avoids the clashes on the schema modification lock.

    Obviously you need to handle the case where one of the insert jobs fails, but that's probably easier to deal with than insert jobs clashing on locks.
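    As a rough sketch of the switch-out / insert / switch-back cycle for a single partition (all object names are hypothetical; the orchestration and polling would live in the N+2 jobs described above):

    -- Controller job: switch the store's partition out to a matching work
    -- table, so the inserts never touch the live partitioned table.
    ALTER TABLE dbo.CORE_SALES SWITCH PARTITION 3 TO dbo.CORE_SALES_WORK_3;

    -- Insert job for this partition: load the store's staging data into
    -- the switched-out work table.
    INSERT INTO dbo.CORE_SALES_WORK_3 (StoreId, UniqueId, Amount)
    SELECT StoreId, UniqueId, Amount
    FROM dbo.STAGE_SALES
    WHERE StoreId = 3;

    -- Polling job: once the insert job for this partition has finished,
    -- switch the work table back in. For the switch back to be allowed,
    -- the work table needs a trusted CHECK constraint (StoreId = 3)
    -- matching the partition boundary.
    ALTER TABLE dbo.CORE_SALES_WORK_3 SWITCH TO dbo.CORE_SALES PARTITION 3;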

    Tom

  • Hi,

    thanks a lot for your tips. I've decided to change the sequence of the jobs of each parallel process, and that way I could also minimize the locks.

    But I still have a performance issue due to wrong execution plans.

    I load the data from staging into the core DWH via MERGE statements in stored procedures. The StoreId is passed as an input parameter and assigned to the variable @v_StoreId. The following statement is an example of such a MERGE:

    MERGE CORE_TABLE_AB AS t
    USING
    (
        SELECT a.*
        FROM STAGE_TABLE_A AS a
        INNER JOIN STAGE_TABLE_B AS b
            ON a.StoreId = b.StoreId
        WHERE a.StoreId = @v_StoreId
          AND b.StoreId = @v_StoreId
    ) AS s
    ON t.StoreId = s.StoreId
    AND t.UniqueId = s.UniqueId
    WHEN NOT MATCHED BY TARGET THEN
        INSERT (...)
    WHEN MATCHED THEN
        UPDATE (...)

    This procedure is executed for each of the 6 StoreIds. But for some of the StoreIds there is no data in the staging tables, and I think this leads to my performance problem: the execution plan uses a nested loop join of STAGE_TABLE_A and STAGE_TABLE_B for the StoreIds that have no data, and that plan is then falsely reused for the StoreIds that do have data in the staging tables, which leads to a long runtime.

    I've already analyzed the statistics of the tables; they are correct. If I display the execution plan in SSMS, it shows me the correct plans both for the StoreIds without input data (nested loops) and for the StoreIds with data (hash match); but there I had replaced the variable @v_StoreId with the actual StoreId value.

    So does anybody know what I have to do to get different execution plans for the different StoreIds? Is the variable @v_StoreId the problem?
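    One known way to get a plan compiled for the current value of a local variable like @v_StoreId is a statement-level recompile hint; a minimal sketch against the MERGE above (whether it is the right fix here is exactly the open question):

    MERGE CORE_TABLE_AB AS t
    USING (...) AS s
    ON t.StoreId = s.StoreId
    AND t.UniqueId = s.UniqueId
    WHEN NOT MATCHED BY TARGET THEN
        INSERT (...)
    WHEN MATCHED THEN
        UPDATE (...)
    -- Recompile on every execution so the optimizer can sniff the current
    -- value of @v_StoreId instead of using a generic density estimate.
    OPTION (RECOMPILE);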

    Many thanks
