More or Fewer Data Tasks?

  • I'm not sure what sort of patterns SSIS handles best.

    Fewer Data Tasks? The fewest needed to get things grouped properly? Or more Data Tasks that really group certain tasks together and separate others?

    One advantage I see is also that each Data Task is an executable in terms of SSIS and you can execute them individually. Maybe it just makes visual sense to me to be able to micro manage things. Let the smaller tables get onto the next step because their Data Task is done while another huge table is still chugging away.

    How do people handle this? Are there performance considerations either way?

  • Like most questions database related, it depends. There is a blend for every situation, and that blend will depend on the amount of data involved, the resource intensity of your tasks (lookups, scripts, etc.) and contention.

    I had a large ETL package recently that needed to move some 50-60 million records, retrieved from 40 or so flat files, with a ton of transformations including lookups, scripting, derived columns, etc. I started with just a few data flows but quickly realized that performance and ease of editability (you can only get so many controls on a single screen) would require me to break those tasks apart into more data flows. In the end, having more data flow objects that each performed a smaller chunk of worked out much better by reducing contention and disk IO, and was much easier to read since I was able to logically arrange many of the processes.

    hth,

    Tim

    Tim Mitchell, Microsoft Data Platform MVP
    Data Warehouse and ETL Consultant
    TimMitchell.net | @Tim_Mitchell | Tyleris.com
    ETL Best Practices

  • Yes, thats about my story, only minus the huge import.

    Small import - only about 20 tables and not a whole lot of rows, but data that could change in the current application and we need to check for new rows or changes at least daily. DerivedColumns and Lookups abound.

    Originally I was going to have one DataTask to load into Stage with some transformations, then another DataTask for every level of dependency(1 group with no dependency, another group defendant on group 1, etc).

    Turned out to be far easier to organize some of the dependencies that were deeper to have each table have two DataTasks (one to load to stage, the other to load to production and perform last minute transformations from dependencies)

    Just this SSIS thing is new to me. I'd be done already having done it in TSQL, but there are gains to be made in SSIS once I get past the learning curve and really understand what it can and can't do better. So thats why I'm asking so many odd questions. I don't want to just get the job done, but get it done using SSIS's benefits and learning a best practice practical approach to this.

    I will admit having this set up visually in so many data tasks on the screen, I'm able to easily track visually what is loading and what step is holding up some dependent tasks.

Viewing 3 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply