Temp table vs. Data Flow Task on a physical table

  • Here is the scenario:

    I have one staging table for a CSV file, which is my source. I load the CSV into this physical staging table, and later parts of the package do transformations on the staging data. I need the data fresh (as it is from the source).

    Should I do the transformations in a temp table, or should I use a Data Flow Task again to reload the staging table?

    The data isn't much 🙂 just under a million rows.
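
    A minimal sketch of the first option, with hypothetical table and column names (dbo.StagingCsv, SomeColumn): the staging table is loaded once, and each transformation works on a tempdb copy, leaving the staged rows exactly as they came from the file.

        -- Option 1 (hypothetical names): copy the untouched staged rows into a
        -- temp table and transform the copy, so dbo.StagingCsv stays "fresh".
        SELECT *
        INTO   #Work
        FROM   dbo.StagingCsv;

        UPDATE #Work
        SET    SomeColumn = LTRIM(RTRIM(SomeColumn));   -- example transformation

        -- Option 2 would instead run a Data Flow Task again to reload
        -- dbo.StagingCsv from the CSV before each set of transformations.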

    ------------------------------------------------------------------------------------

    Ashish

  • The more work you do in SSIS before you load the data into the staging table, the less transaction-log activity you will generate.
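
    To make that concrete, here is a rough illustration with invented table and column names: a set-based transformation run against a physical staging table is fully logged, while the same conversion done in a Data Flow component happens in SSIS memory buffers, so only the final insert touches the log.

        -- Hypothetical example: transforming in the staging table is fully logged.
        UPDATE dbo.StagingCsv          -- every modified row is written to the log
        SET    Amount = TRY_CONVERT(decimal(18, 2), AmountText);

        -- The equivalent conversion in a Data Flow (Derived Column or Data
        -- Conversion component) runs in SSIS buffers; only the destination
        -- insert generates log activity.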

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • Thanks, opc.

    Actually, my question is this: I will be loading the CSV into the staging table right at the start, since the package needs that data multiple times.

    This package is not just ETL but a business process, so I want to save server space and keep it fast and readable.

    ------------------------------------------------------------------------------------

    Ashish

  • This is stating the obvious, but your question seems a bit unclear, so here goes.

    If you need the same data multiple times, do not reload the file each time you need the data unless there is a chance the file will change during that time. The fewer times you load the CSV file into the initial staging table, the faster your process will be overall.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • Yes, I need the same data, and the file won't change.

    But loading it once and doing the transformations outside the staging table (which means doing them in a temp table) will consume server resources (tempdb and RAM) and make the package unnecessarily complex.

    As I mentioned before, the data isn't much, so are multiple Data Flow Tasks still a bad option considering the whole case?

    ------------------------------------------------------------------------------------

    Ashish

  • I still don't know which two approaches you're comparing. Maybe if you explained your end-to-end process, it would be clearer which two approaches you're having doubts about.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • The two approaches are:

    do the transformations in a temp table,

    or

    use a Data Flow Task again to reload the staging table after doing transformations on its data and loading it into the destination table for the next module.

    I need fresh data from the source CSV file for every module of my package.

    ------------------------------------------------------------------------------------

    Ashish

  • If you need fresh data, then read the file each time in a separate Data Flow Task, do the transformations you need, and send the data downstream. This keeps most of the work in SSIS.

    If you want to load the file into a staging table that you do not modify once it is loaded, you can use that as the starting point for further transformations, but those will all be done in T-SQL.
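
    As a rough sketch of that second pattern (all object names here are invented for illustration): the staging table is loaded once, each module copies it into a temp table, transforms the copy, and loads its own destination.

        -- Hypothetical names throughout. dbo.StagingCsv is loaded once by a
        -- Data Flow Task and never modified afterward.
        SELECT CustomerId,
               UPPER(LTRIM(RTRIM(CustomerName))) AS CustomerName
        INTO   #ModuleAWork                  -- per-module working copy in tempdb
        FROM   dbo.StagingCsv;

        -- Load the module's destination from the transformed copy. TABLOCK can
        -- enable minimal logging under the right recovery model and table
        -- conditions.
        INSERT INTO dbo.ModuleADestination WITH (TABLOCK)
              (CustomerId, CustomerName)
        SELECT CustomerId, CustomerName
        FROM   #ModuleAWork;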

    You will most likely be best off doing data cleansing and data-type transformations in SSIS, because no transaction log is involved there and the work can be parallelized in memory. Once your data is cleansed, you can load it into your downstream systems.

    If you're trying to get the best performance at every step of your process, there is no way around testing the different approaches with your actual data to find the best one.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

