SSIS Help!

  • Hi all,
    I have 2 large (85MB each) excel files need to load in sql server in one table. 
    These files are different in column numbers. one is less than others. But all the rest column name is same.
    The excel sheet name is same. 
    Till now, I was able to dump only 1 table using foreach loop container.
    Please help me.
    Thank you.

  • Lynn Pettis - Monday, December 24, 2018 10:52 AM

    sorry I wanted to rephrase the problem. will delete the previous one. thanks

  • jkabirm - Monday, December 24, 2018 10:55 AM

    Lynn Pettis - Monday, December 24, 2018 10:52 AM

    sorry I wanted to rephrase the problem. will delete the previous one. thanks

    Why rephrase it?

  • Lynn Pettis - Monday, December 24, 2018 10:56 AM

    jkabirm - Monday, December 24, 2018 10:55 AM

    Lynn Pettis - Monday, December 24, 2018 10:52 AM

    sorry I wanted to rephrase the problem. will delete the previous one. thanks

    Why rephrase it?

    I guess my problem was not clear. Could you please help me with that? I will remember next time. I am a new in this area. thanks.

  • jkabirm - Monday, December 24, 2018 11:02 AM

    Lynn Pettis - Monday, December 24, 2018 10:56 AM

    jkabirm - Monday, December 24, 2018 10:55 AM

    Lynn Pettis - Monday, December 24, 2018 10:52 AM

    sorry I wanted to rephrase the problem. will delete the previous one. thanks

    Why rephrase it?

    I guess my problem was not clear. Could you please help me with that? I will remember next time. I am a new in this area. thanks.

    I haven't written an SSIS package in a while.  If I needed to import excel files where they could meet one of two different specifications (number of columns) then I would have two different packages, one for each specification.

  • Your best bet is to create a separate data source and destination in SSIS for each Excel file.  Populate two staging tables, and then combine them in SQL either by creating a view, or by populating a final table.

    It's possible to dynamically change a source and destination in SSIS but frankly it's way more trouble than it's worth, especially for only two files.

  • cphite - Monday, December 24, 2018 12:37 PM

    Your best bet is to create a separate data source and destination in SSIS for each Excel file.  Populate two staging tables, and then combine them in SQL either by creating a view, or by populating a final table.

    It's possible to dynamically change a source and destination in SSIS but frankly it's way more trouble than it's worth, especially for only two files.

    thank you.
    I would also need to archive and Zip files with datestamp. Could you please guide me thoroughly.

  • You can start here:
    https://www.mssqltips.com/sqlservertip/4165/how-to-read-data-from-multiple-excel-files-with-sql-server-integration-services/
    This will show you how to create a loop that will look into a folder and import the file(s) it finds in that folder one at a time and then move the files to another folder...  

    https://www.tutorialgateway.org/rename-file-using-file-system-task-in-ssis/ 
    Will show you how to rename the files

    https://www.timmitchell.net/post/2016/12/15/zip-files-in-ssis/
    Will show you how to ZIP the files.

    Everything you need to know is on those three pages.  If you have specific question, you can ask them here; I'm kinda off and on for the next few weeks but chances are someone will be able to help.

    In your case, since you have multiple formats of files to import, I would suggest creating multiple folders - one for each format - and a separate loop for each folder.  This would be easiest.  I suppose you could also use a single folder, and have multiple loops based on something in the file names - for example one loop would do all the files that start with "format1" and another "format2" etc, it's slightly more work to setup but a bit cleaner in my opinion.

    Start with one loop, get it working, and then do the same thing for the second loop.  Once you do it once it's pretty easy to do it again 🙂

    You can have your loops run one after another (multiple dataflows) or they can run in parallel (one dataflow) - just be aware that if they're running in parallel you'll need to create separate variable names for each loop - otherwise you're going to get some *really* interesting results (interesting is bad)

    Finally, once the files are imported, you can create a step that runs the SQL script that combines the multiple tables you filled into one table via INSERT - or you can have a view on your database that combines the tables.  Whatever makes the most sense for your specific data.

  • If the source layouts are different then it doesn't appear to make sense to load them using a loop.
    I'd class them as different sources personally, loading them separately into the same staging table.

  • SSIS and Excel do not play nicely.  Can you get the data from the original source rather than from Excel.

    Excel will generate errors you can't program for in development.
    differences between 32bit and 64bit drivers
    bad data in cells because Excel does not enforce data types at column level
    Protected data breaks SSIS
    Merged cells break SSIS
    users changing column names or sheet names breaks SSIS

    SSIS have VERY strict data pipeline rules and Excel is prone to drift over time.  At least if you can get hold of the original data from a database or web api there should be less drift.

    CSV files would be better than Excel - I know it is an additional ETL step for the business but Excel should NOT be used as a data source, only as a target (IMHO)

    Aaron

  • aaron.reese - Thursday, January 3, 2019 6:40 AM

    SSIS and Excel do not play nicely.  Can you get the data from the original source rather than from Excel.

    Excel will generate errors you can't program for in development.
    differences between 32bit and 64bit drivers
    bad data in cells because Excel does not enforce data types at column level
    Protected data breaks SSIS
    Merged cells break SSIS
    users changing column names or sheet names breaks SSIS

    SSIS have VERY strict data pipeline rules and Excel is prone to drift over time.  At least if you can get hold of the original data from a database or web api there should be less drift.

    CSV files would be better than Excel - I know it is an additional ETL step for the business but Excel should NOT be used as a data source, only as a target (IMHO)

    Aaron

    That's not a bad point...  I've used Excel as a source and you do have to be pretty careful about it...  have to be strict about making sure the files are consistently formatted, data types are maintained, etc.  It works just fine if you're able to enforce all of those things - but it's almost guaranteed to break if you're not.

    That said, if the files are coming as Excel, you can always convert them via your favorite script language into CSV and make that part of the SSIS package.

  • I've seen alot  of people (even developers) refer to CSV files as Excel files because they have associated csv extensions with Excel.  Some of the questions the OP asked are basic requests (Rename a file/ zip a file) and can be readily googled if unclear.  Hopefully I'm wrong, but it seems to me that this is just throw it out there and give me the answer as opposed to doing a little leg work

    For better, quicker answers, click on the following...
    http://www.sqlservercentral.com/articles/Best+Practices/61537/

    For better answers on performance questions, click on the following...
    http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • If the portion you are loading from both files are the same, you can actually specify a range of columns in your Excel data source.  This would allow you to treat both file layouts the same by excluding the additional columns in the second layout.  

    To do this, you need to create the source based on the file with less columns, then right-click on the source and go to properties.  In the properties, The OpenRowset property will contain the worksheet name you specified in the source.  Add your range to the end of it.  For example, if your worksheet is named SomeData, the OpenRowset property would show SomeData$, which you would change to SomeData$A:F to read in only the first six columns.  Keep in mind that if you edit the source, you will lose your custom range and have to go back into properties to re-add it.  

    If you are using an expression to set the worksheet name, you will have to account for the range there as well.

  • Mike01 - Friday, January 4, 2019 9:59 AM

    I've seen alot  of people (even developers) refer to CSV files as Excel files because they have associated csv extensions with Excel.  Some of the questions the OP asked are basic requests (Rename a file/ zip a file) and can be readily googled if unclear.  Hopefully I'm wrong, but it seems to me that this is just throw it out there and give me the answer as opposed to doing a little leg work

    Could be...  I was giving them the benefit of the doubt since they specifically asked about importing multiple formats...  which is a common enough request that it's reasonable, but uncommon enough that one might ask for help...

    As long as multiple formats means "more than one static and consistent format" it's really not much more difficult than setting up a package to import one specific format.  It's basically just get the first one working and then copy the parts and update them accordingly.

    If multiple formats means "format can change at any time but we want it to just work" - well, that's one of those things that technically possible but far too big a pain in the @$$ to be feasible  😀

Viewing 15 posts - 1 through 15 (of 15 total)

You must be logged in to reply to this topic. Login to reply