SSIS Design Pattern - Staging Delimited Flat Files

  • Comments posted to this topic are about the item SSIS Design Pattern - Staging Delimited Flat Files

  • I've often found it helpful to stage every column as a character type. Invariably something ends up where it shouldn't. For a small number of columns that can be reasonable to troubleshoot, but as the column count grows (or the rows for that matter) can be a real pain.

  • Hi,

    I've always made the data type changes in the connection manager of the flat file...

    This way, it's less work in terms of mapping in the OLE DB Destination Editor (proving column names are the same).

    Do you think there are any advantages/disadvantages of this method or is doing the data conversion better?

    Thanks,

    James.

  • Hi James, There're two reasons I use Data Conversion instead of converting data type in the source.

    First one is more of a preference than anything. Using a Data Conversion I can visaully see what and when data is being transformed. If this is in the source advanced page, I find it hidden and often forget about it when troubleshooting.

    Second reason is related to performance. I hope to talk more about this in my next design pattern when I shown fixed width files, but in general I think parsing inside the data flow is faster than directly in the source component.

  • Hello Sam,

    Thank you for the response, I agree that it is easier for troubleshooting with the data conversion in.

    I've not tested it for performance so I will be looking forward to your next article.

    Thank you,

    James

  • What parts of this process can't be done by just straight T-SQL scripts?

  • What about Text Qualifiers? I've never used SSIS and every flat file I've ever worked with has been comma or tab delimited and they utilize double-quotes as text qualifiers.

    The reason I ask, using the SSMS import wizard and selecting flat file as the source hasn't worked since 2005 because it completely ignores the text qualifier setting. Now, when I see a GUI interface from MS for data imports, I automatically assume it's going to mishandle this basic need. Does SSIS have this setting and does it work?

    EDIT: Actually, reviewing the screenshots, I see it has the setting so now I'm only curious if it works.

  • Robert.Sterbal (4/20/2015)


    What parts of this process can't be done by just straight T-SQL scripts?

    None of them. I suppose you can say the samething for pretty much all ETL operations that SSIS - or any other ETL tool - can do. The key is using what works for you. This pattern is for you, if you prefer using SSIS as I suspect there are a lot of folks out there who use SSIS.

  • thisisfutile - You bring up a good a point that I didn't cover in the article. As you can see in the screenshots, there's a setting to specify text qualifiers. I've used them extensively without any problems.

    I plan to provide an example in the next article when I demonstrate automating this pattern, until then take a look at this article[/url] on MSSQLTIPS that shows just that.

  • Sam Vanga (4/20/2015)


    thisisfutile - You bring up a good a point that I didn't cover in the article. As you can see in the screenshots, there's a setting to specify text qualifiers. I've used them extensively without any problems.

    I plan to provide an example in the next article when I demonstrate automating this pattern, until then take a look at this article[/url] on MSSQLTIPS that shows just that.

    Thanks Sam, that's encouraging. I was very frustrated when I discovered this bug back in 2005 and 10 years later I've just learned to live without it. I'll still use the import wizard from time-to-time but only use "Excel" files as the source.

    Now that I know this works in SSIS, I have some motivation to investigate this tool. Thanks for the reply and the link!

  • I agree with oscar.leeper's comment. I like to bring in the data to a stage table in the native format and then do conversions before inserting to destination. This makes troubleshooting easier in case there are errors with conversion. The stage table can be truncated at the end of the process after successful run, so storage size should not be a concern.

  • I don't disagree. I think these are two different ways to do it. I've used both of them. And I think I said it in the article, this step of data conversion is optional.

    However I don't like to truncate staging after loading to final destination. I want to do it before the next load. Reason: If I find problems even after the "successful" load, I want to have this data in staging for troubleshooting.

  • Great article Sam. Simple, to the point, easy read - well done sir!

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • oscar.leeper (4/20/2015)


    I've often found it helpful to stage every column as a character type. Invariably something ends up where it shouldn't. For a small number of columns that can be reasonable to troubleshoot, but as the column count grows (or the rows for that matter) can be a real pain.

    I agree.

  • I haven't done this in a while. Thanks for the refresher.

Viewing 15 posts - 1 through 15 (of 21 total)

You must be logged in to reply to this topic. Login to reply