Process Architecture Optimization

  • I have this scenario where we have som huge set of data in Sybase. Now there is a procedure which contains some heavy processing on that set of data based on certain parameters and finally store that output to a temporary table. Later, in the same session, that whole processed data in the temporary table gets transferred to our SQL Server tables by java code in small chunks say 50k rows per page.

    Now this whole thing takes lot of time. I have been assigned to optimize this whole architecture. I was just wondering if this can be done using SSIS ? Important thing here is that the data processing procedure runs parallely to max 10 connections at a time.

  • Kickoff a proc in Sybase and then copy some data from a table to SQL Server when the proc is done? Sounds like an easy job for SSIS. The hardest part of it may actually be finding good Sybase drivers that work with SSIS and setting up your connections to retrieve the data.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • Thanks. Apart from drivers, how it works parallely ? Say If i create a SSIS package. Will the same package work in parallel if invoked with different set of parameters at the same time ?

  • sqlnaive (1/16/2013)


    Thanks. Apart from drivers, how it works parallely ? Say If i create a SSIS package. Will the same package work in parallel if invoked with different set of parameters at the same time ?

    I don't think that should be an issue, unless you are writing to the same table.

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • An SSIS package is just a set of instructions. If you kick off dtexec.exe three times in three different cmd shell windows and all call the same package then you'll have three instances of the package running at the same time.

    SSIS supports parallel processing by default both internally in some of the Components as well as in separate components not connected by precedence constraints. In the example below Data Flows 1, 2 and 4 will immediately start executing in parallel when the package starts. Data Flow 3 will not start executing until Data Flow 2 completes successfully:

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • Fortunately or unfortunately, it is the same table. That was also in my mind. But I don't understand that how it is not giving any issue currently as it is writing to same table parallely.

  • sqlnaive (1/16/2013)


    Fortunately or unfortunately, it is the same table. That was also in my mind. But I don't understand that how it is not giving any issue currently as it is writing to same table parallely.

    Which option have you specified in the OLE DB Destination.

    There's the option to use table locks, so if this is off, you might have less concurrency.

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • sqlnaive (1/16/2013)


    Fortunately or unfortunately, it is the same table. That was also in my mind. But I don't understand that how it is not giving any issue currently as it is writing to same table parallely.

    It depends on how they are writing rows. Committing 50K at a time, if they are added to a heap or the tail end of the clustered index then having 10 processes doing that concurrently might cause some intermittent blocking but could increase overall throughput significantly.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply