Importing Excel Workbook with multiple tabs using SSIS

  • Hello All,

    I have recently started using SSIS, and have been exploring various transformations trying to make the best use of them. I currently have a requirement where in, the source data is in the form of an excel workbook. The workbook is refreshed on a daily basis. It contains multiple tabs, with different columns in each tab (only a couple of identical columns)

    For Example:

    Tab 1:

    =======

    Date|| CustomerID || CustomerVisits || Referral

    Tab 2:

    =======

    Date|| CustomerID || Entries

    Tab 3:

    =======

    Date || Referral || Unique Count

    .... and so on...

    I need to load them into the same SQL Table, which has the following structure:

    SQL_TABLE1:

    ===========

    Date

    Customer ID

    CustomerVisits

    Referral

    Entries

    UniqueCount

    ...etc...

    So the SQL table contains all the required destination columns. However, I'd like to know how to add another column to the table which will allow me to identify which sheet the data came from. For example, I want to have a 'SourceName' column which will have values like: Tab 1, Tab 2, Tab 3, etc.. which are essentially the excel sheet names.

    I came across a few options to do this, like ForEachLoop containers (does not work for my requirement), Script task at control flow level, Script source at data flow level, Derived column at data flow level.. I'm not well experienced with any ETL software so not quite sure what would be the best way forward. I would appreciate any ideas or directions

    Thanks & Regards

    Dhivya

  • Dhivya Elan (4/20/2016)


    Hello All,

    I have recently started using SSIS, and have been exploring various transformations trying to make the best use of them. I currently have a requirement where in, the source data is in the form of an excel workbook. The workbook is refreshed on a daily basis. It contains multiple tabs, with different columns in each tab (only a couple of identical columns)

    For Example:

    Tab 1:

    =======

    Date|| CustomerID || CustomerVisits || Referral

    Tab 2:

    =======

    Date|| CustomerID || Entries

    Tab 3:

    =======

    Date || Referral || Unique Count

    .... and so on...

    I need to load them into the same SQL Table, which has the following structure:

    SQL_TABLE1:

    ===========

    Date

    Customer ID

    CustomerVisits

    Referral

    Entries

    UniqueCount

    ...etc...

    So the SQL table contains all the required destination columns. However, I'd like to know how to add another column to the table which will allow me to identify which sheet the data came from. For example, I want to have a 'SourceName' column which will have values like: Tab 1, Tab 2, Tab 3, etc.. which are essentially the excel sheet names.

    I came across a few options to do this, like ForEachLoop containers (does not work for my requirement), Script task at control flow level, Script source at data flow level, Derived column at data flow level.. I'm not well experienced with any ETL software so not quite sure what would be the best way forward. I would appreciate any ideas or directions

    Thanks & Regards

    Dhivya

    I would say that the Foreach loop should work. Please explain why you think it will not.

    If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.

  • Dhivya Elan (4/20/2016)


    Hello All,

    I have recently started using SSIS, and have been exploring various transformations trying to make the best use of them. I currently have a requirement where in, the source data is in the form of an excel workbook. The workbook is refreshed on a daily basis. It contains multiple tabs, with different columns in each tab (only a couple of identical columns)

    For Example:

    Tab 1:

    =======

    Date|| CustomerID || CustomerVisits || Referral

    Tab 2:

    =======

    Date|| CustomerID || Entries

    Tab 3:

    =======

    Date || Referral || Unique Count

    .... and so on...

    I need to load them into the same SQL Table, which has the following structure:

    SQL_TABLE1:

    ===========

    Date

    Customer ID

    CustomerVisits

    Referral

    Entries

    UniqueCount

    ...etc...

    So the SQL table contains all the required destination columns. However, I'd like to know how to add another column to the table which will allow me to identify which sheet the data came from. For example, I want to have a 'SourceName' column which will have values like: Tab 1, Tab 2, Tab 3, etc.. which are essentially the excel sheet names.

    I came across a few options to do this, like ForEachLoop containers (does not work for my requirement), Script task at control flow level, Script source at data flow level, Derived column at data flow level.. I'm not well experienced with any ETL software so not quite sure what would be the best way forward. I would appreciate any ideas or directions

    Thanks & Regards

    Dhivya

    Use a Derived Column in your data flow. That's all you need..

    John Rowan

    ======================================================
    ======================================================
    Forum Etiquette: How to post data/code on a forum to get the best help[/url] - by Jeff Moden

  • Hi Phil,

    I tried using the ForEachLoop container. Here's what I did:

    1) Made all my Excel Source tabs identical, by adding missing column headers (although they might not contain any data). This means that, if there are a few columns in my 3rd and 4th tabs, which dont exist in my first tab, then I added just the headers in my First Tab. I left the data empty. By doing this, all my tabs had the same number of columns and same column header names.

    2) I then implemented the ForEach Loop container to import the data in a loop from each of the tabs

    3) Problem I faced: On my first source tab, there are few column headers with no data. So excel samples the first few rows and assigns string datatype to it. But the same column in the 4th or 5th tab is actually Numeric as it has some number values in it.

    So the routine fails when it is unable to match the data types.

    I am unable to change the datatype in the source because the source has an add-in which actually deletes the content of the excel and reloads it on a daily basis.

    Regards

    D

  • Hi John

    I've been trying to use the derived column transformation, unable to find a guide on how-to, that matches my requirement. I would really appreciate any helpful links.

    I understand that I would have to create a variable that would import my source sheet names

    and then in the derived column expression, I need to assign my sheet name. Please correct me if I'm wrong.

    I am not sure how to create a variable to get the names of my sheet, and if I need to store it in some temp table or not.

    Regards

    D

  • Dhivya Elan (4/20/2016)


    Hi John

    I've been trying to use the derived column transformation, unable to find a guide on how-to, that matches my requirement. I would really appreciate any helpful links.

    I understand that I would have to create a variable that would import my source sheet names

    and then in the derived column expression, I need to assign my sheet name. Please correct me if I'm wrong.

    I am not sure how to create a variable to get the names of my sheet, and if I need to store it in some temp table or not.

    Regards

    D

    Create the variable.

    Assign the value of the sheet name being processed to the variable from within the config for the Foreach container.

    If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.

  • Dhivya Elan (4/20/2016)


    Hi Phil,

    I tried using the ForEachLoop container. Here's what I did:

    1) Made all my Excel Source tabs identical, by adding missing column headers (although they might not contain any data). This means that, if there are a few columns in my 3rd and 4th tabs, which dont exist in my first tab, then I added just the headers in my First Tab. I left the data empty. By doing this, all my tabs had the same number of columns and same column header names.

    2) I then implemented the ForEach Loop container to import the data in a loop from each of the tabs

    3) Problem I faced: On my first source tab, there are few column headers with no data. So excel samples the first few rows and assigns string datatype to it. But the same column in the 4th or 5th tab is actually Numeric as it has some number values in it.

    So the routine fails when it is unable to match the data types.

    I am unable to change the datatype in the source because the source has an add-in which actually deletes the content of the excel and reloads it on a daily basis.

    Regards

    D

    Within your foreach loop, you need multiple data flows (probably one for each sheet).

    Add logic to direct the flow to the relevant data flow, depending on the 'current' sheet.

    If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.

  • I'm confused as to why you need a for each loop to read in one file. You said that you have a new file each day. Where the multiples come in is that each file has multiple tabs. This should be represented by multiple data flows, not by looping w/ a For Each container. I recommend the following:

    1. Create a data flow for each tab.

    2. <In Data Flow>Source Component to get columns for the tab you are working

    3. <In Data Flow>Derived Column Transformation to add in a column to signify the tab you are working

    4. <In Data Flow> Destination column for table.

    I think you said you'd need to add in the missing columns for each tab? Is this just to make the tab have the same columns as the detination? Would these all be NULL, or where would you get values? Either way, you can add the columns in via the Derived Column transform in step 3.

    Am I missing something?

    John Rowan

    ======================================================
    ======================================================
    Forum Etiquette: How to post data/code on a forum to get the best help[/url] - by Jeff Moden

  • HI John,

    I only added the missing columns to each tab, to test the ForEachLoop container. But then I encountered some problems. So I dropped that idea.

    Currently, the solution that I have is very similar to what you have suggested:

    1. Create a data flow for each tab.

    2. <In Data Flow>Source Component to get columns for the tab you are working

    3. <In Data Flow>Derived Column Transformation to add in a column to signify the tab you are working

    4. <In Data Flow> Destination column for table.

    This does not have any NULL columns.

    Could you please help me with Point number 3? How should I create a variable which will get me the names of the excel tab please?

    Regards

    D

  • If you have a separate data flow for each tab, you don't need to store them in a variable because they don't change. Just add the value (DT_STR,50,1252)"Your Tab Name Here" as a new column in the Derived Column Transform. Adjust the data type and length to match your tab name column.

    John Rowan

    ======================================================
    ======================================================
    Forum Etiquette: How to post data/code on a forum to get the best help[/url] - by Jeff Moden

  • Hi John,

    This works, thanks for your help.

    Regards

    D

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply