SQL Newbie with some (hopefully) simple SSIS questions.

  • Greetings. I hope someone can point me in the right direction as I'm new to SQL in general and SSIS in particular:

    In a nutshell, I want to know the most straightforward way to gather and store data into a central db for reporting, trending, warehousing, etc. I've been researching this for a while, and it looks like SSIS with an SCD task might be the way to go. However, before I dig too deeply into it, I want to ask some experts about pitfalls.

    We are a manufacturing facility and we have four machines on the production floor that record things like weights, downtime, changeovers, etc from the production line. These machines each run a proprietary software that writes to a local access 2007 database (I have budget to upgrade them to sql express if necessary).

    I would like to compile this data centrally on our SQL server, and I would like it to be updated every 10-30 minutes. Ideally, I would like to be able to clean out the access databases periodically without having the deletes reflected in the sql server db.

    We run SQL server 2008 r2 sp1.

    Thanks for your time.

    Bob

  • Hi Bob,

    The SCD component in SSIS can have significant performance issues, especially where large data volumes are concerned and there are a lot of changes going on as the Type 1 attribute maintenance portion of the task runs a RBAR update.

    I personally wouldnt recomend this for any dimension over about 10K rows, or where more than 5% of the rows change in anyone iteration.

    Having said that there are a number of customised SSIS components out there, especially on Codeplex, some are better than others, the Todd McDermid component is not bad but it can be a bit of a blocking component. See http://toddmcdermid.blogspot.co.uk/2010/01/kimball-method-slowly-changing.html#!/2010/01/kimball-method-slowly-changing.html, the link to the component on codeplex is http://dimensionmergescd.codeplex.com/.

    The down side is that you have to install it on each Server and development PC you aim to use.

    Hope this helps.

    Edit:

    You might want to look at a setting up CDC enabled on the source tables so that you can just go in and get any changes since the last run.

    Also with the 30 minute refresh in mind, I would recommend using TSQL SP's to do the grunt work as you will get better overall performance than with SSIS.

    _________________________________________________________________________
    SSC Guide to Posting and Best Practices

  • It might be worth picking up a copy of 'The Microsoft Data Warehouse Toolkit' from the Kimball group. It will give you a good start on the theory side of things and options as to how best to make use of the SQL server tools available.

  • BarneyL (1/2/2013)


    It might be worth picking up a copy of 'The Microsoft Data Warehouse Toolkit' from the Kimball group. It will give you a good start on the theory side of things and options as to how best to make use of the SQL server tools available.

    +1 to that, I would also suggest the following from the same series.

    ETL toook kit : http://www.amazon.co.uk/Data-Warehouse-ETL-Toolkit-Techniques/dp/0764567578/ref=sr_1_4?s=books&ie=UTF8&qid=1357118602&sr=1-4

    Dimension Modelling : http://www.amazon.co.uk/Data-Warehouse-Toolkit-Complete-Dimensional/dp/0471200247/ref=sr_1_1?s=books&ie=UTF8&qid=1357118822&sr=1-1

    _________________________________________________________________________
    SSC Guide to Posting and Best Practices

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply