SSIS Advice

  • Hi All,

    I would like to know what the best practice would be here.

    My SSIS package is very simple, but I would like to follow best practices.

    My source is a flat file (around 2 million rows; the package runs once a day at 3:00 am).

    The destination is a SQL table.

    First option: retrieve the data from the flat file, do some transformations as needed, and load the data into the SQL table.

    Second option: retrieve the data from the flat file, do some transformations as needed, load the data into a staging SQL table, and then move it from staging to the target SQL table.
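
    For illustration, a minimal sketch of the second option's staging hand-off (all object and column names here are made up):

        TRUNCATE TABLE dbo.Customer_Staging;   -- run before the data flow loads the file

        -- after the data flow has filled the staging table:
        INSERT INTO dbo.Customer_Target (CustomerID, CustomerName, LoadDate)
        SELECT CustomerID, CustomerName, GETDATE()
        FROM dbo.Customer_Staging;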

    My questions are:

    1) Which option is best, and why?

    2) Which option is faster?

    I need advice/thoughts from SSIS gurus.

    Thanks in advance.

  • What sort of transformations do you have in mind?

    Will you be doing just inserts, or inserts and updates, or inserts, updates and deletes?

    If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.

    Oops, I should mention:

    So far I am using two transformations (Derived Column, Data Conversion).

    Plus I am using SCD.

    That leads to my next question: if I pick the second option, would SCD be faster, or a MERGE statement? (My limited experience says MERGE.)

    What do you think?

  • rocky_498 (2/9/2015)


    Oops, I should mention:

    So far I am using two transformations (Derived Column, Data Conversion).

    Plus I am using SCD.

    That leads to my next question: if I pick the second option, would SCD be faster, or a MERGE statement? (My limited experience says MERGE.)

    What do you think?

    SCD is slow; don't use it.

    Can you estimate the percentage of INSERTs vs UPDATEs?

    If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.

  • I would say 70/30 (inserts to updates).

  • Here's how I would do it.

    Add a lookup to your package and use it to determine Insert vs Update.

    If it's an Insert, send it directly to the target table.

    If it's an update, send it to a (truncated) work table and run an update stored proc just after the data flow to process those rows.
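
    A minimal sketch of that update proc, run after the data flow (table and column names here are invented for illustration):

        CREATE PROCEDURE dbo.usp_ApplyCustomerUpdates
        AS
        BEGIN
            SET NOCOUNT ON;

            -- apply the changed rows the data flow captured in the work table
            UPDATE tgt
            SET    tgt.CustomerName = upd.CustomerName,
                   tgt.ModifiedDate = GETDATE()
            FROM   dbo.Customer_Target AS tgt
            JOIN   dbo.Customer_Updates AS upd
                   ON upd.CustomerID = tgt.CustomerID;
        END;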

    Your main problem here is that your lookup cache needs to be in memory for optimum performance. Millions of rows in your target table = lots of RAM needed.

    What sort of column will you be matching on when deciding whether to insert or update ... what does the data look like?

    If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.

  • rocky_498 (2/9/2015)


    Oops, I should mention:

    So far I am using two transformations (Derived Column, Data Conversion).

    Plus I am using SCD.

    That leads to my next question: if I pick the second option, would SCD be faster, or a MERGE statement? (My limited experience says MERGE.)

    What do you think?

    Dump your flat-file data into a staging table, then use MERGE to load the data into your main table from the staging table. This is the most reliable way to achieve it.
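
    A minimal sketch of that staging-to-target MERGE (object and column names are made up for illustration):

        MERGE dbo.Customer_Target AS tgt
        USING dbo.Customer_Staging AS src
            ON tgt.CustomerID = src.CustomerID
        WHEN MATCHED THEN
            UPDATE SET tgt.CustomerName = src.CustomerName
        WHEN NOT MATCHED BY TARGET THEN
            INSERT (CustomerID, CustomerName)
            VALUES (src.CustomerID, src.CustomerName);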

    ____________________________________________________________

    AP
  • anshulparmar (2/20/2015)


    rocky_498 (2/9/2015)


    Oops, I should mention:

    So far I am using two transformations (Derived Column, Data Conversion).

    Plus I am using SCD.

    That leads to my next question: if I pick the second option, would SCD be faster, or a MERGE statement? (My limited experience says MERGE.)

    What do you think?

    Dump your flat-file data into a staging table, then use MERGE to load the data into your main table from the staging table. This is the most reliable way to achieve it.

    There are some issues with MERGE:

    Use Caution with SQL Server's MERGE Statement

    It's also hard to track how many rows you actually inserted and updated.
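
    (For what it's worth, one workaround is MERGE's OUTPUT clause; this sketch, with invented object names, shows the extra plumbing that counting requires:)

        DECLARE @actions TABLE (MergeAction nvarchar(10));

        MERGE dbo.Customer_Target AS tgt
        USING dbo.Customer_Staging AS src
            ON tgt.CustomerID = src.CustomerID
        WHEN MATCHED THEN
            UPDATE SET tgt.CustomerName = src.CustomerName
        WHEN NOT MATCHED BY TARGET THEN
            INSERT (CustomerID, CustomerName)
            VALUES (src.CustomerID, src.CustomerName)
        OUTPUT $action INTO @actions (MergeAction);

        -- how many rows went each way
        SELECT MergeAction, COUNT(*) AS RowsAffected
        FROM @actions
        GROUP BY MergeAction;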

    Personally I use the Lookup pattern described by Phil.

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • Phil Parkin (2/9/2015)


    Your main problem here is that your lookup cache needs to be in memory for optimum performance. Millions of rows in your target table = lots of RAM needed.

    What sort of column will you be matching on when deciding whether to insert or update ... what does the data look like?

    That depends a bit on the lookup columns used.

    I have used lookup transformations on data sets of over 60 million rows without an issue, because I used integer columns.

    Do not try this at home with GUIDs though 😀

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP
