ETL Magic with SSIS – Part I

  • Comments posted to this topic are about the item ETL Magic with SSIS – Part I

  • My comment relates to this sentence:

    1) to fail the package if the file is not available, then execute the package again at a later time, or 2) we can wait for the file to be ready. EDIT- I have reconfigured the final sentence of this paragraph, as it was previously very hard to understand. Confirmation needed that my interpretation is correct

    There is no need to fail the package if the file is not present. You have the option of using a Foreach container to process the file if it exists, otherwise complete the package successfully. You may also need to set delay validation to true on the file connection (can't remember for sure - haven't done this for a while).

    It also seems that I am an SSIS purist 🙂

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • I'm still learning SSIS, but this has a lot of good information and tips. Looking forward to Part II.

    Sigerson

    "No pressure, no diamonds." - Thomas Carlyle

  • The File Watcher Task is a new one for me; could be useful in future. At present, my ETL tasks run once or twice a day and as Phil says above, tests if files are present, loading them if so.

    Although I do use the Flat File Source component, I often have to replace it with a Script component written to deal with real files which often have bad rows or other oddities which confuse the standard Flat File Source. With Flat File Source, your imported file must always have the number of fields expected, and if comma-separated it can't have embedded quoted commas - e.g. a field like "19, The Strand" or """The Elms"", 4 High Street" will confuse it.

  • Thanks for this end to end example.

    For what it's worth, I did a similar exercise using generated data I modified in Excel. There are a number of generators out there, useful perhaps for creating test data for real test cases.

    http://www.generatedata.com/#generator is one.

  • I really enjoyed your article. I'm very new to the SQL Server scene and so this was very helpful. Much of what I've been reading insofar as SSIS and ETL talks about working from flat files. This includes your article. Is there any reason for this (i.e. best practices, performance, security...)? Reason I ask is because we're considering using SQL server to build mini data mart with transaction data that resides on 2 distinct Oracle databases. Are there reasons why we should load data using a flat file instead of database links if we know that all the servers/databases are sitting behind our firewalls? Database links seem like the easiest, most efficient way to doing this but, in the short time I've been researching this, I haven't read many articles about people using database links for this. Thanks Again!

  • You can definitely use SSIS to connect to different databases. We pull data from SQL, Oracle, DB2, and Informix. You can even utilize SSIS without using SQL as a source or destination.

    The normal reason for using text files in scenarios is the ease of set up and configuration. Everyone is able to create a text file on their system. While creating a table in a different database may be restricted.

  • Good article and I'm looking forward to the sequel ...

    Property "delay validation" is a very important note: before an SSIS package is run, is always a check. Without this feature, it is not possible, to create a table in the package and fill them with data - in one step. The search for validation errors can cost a lot of time... I know this 😉

  • Hey Ron,

    It's interesting that I came across your article when I did, as I'm currently cursing SSIS on a regular basis in my day to day worklife.

    I've found in practice that the SSIS workflow tends to be a lot less fluid than what it seems in any of these small demonstrations. I'm currently involved in a huge data conversion project where we are making use of SSIS for the majority of the conversion efforts, and I'm about ready to jump out the window.

    Even the simplest thing such as reading a flat file seems to fall apart in many real world scenarios. Take reading a large data file where the types cannot be easily determined by the first 10,000 rows, for instance - as that's the MAX that the flat file suggest types function will look at. So you're relegated to not making use of slick auto-magic features that are demonstrated, and rather stuck typing it all in by hand, assuming you know the max lengths and sizes you need for everything - if not you're stuck with trial and error, or using a third party program to do the analysis, and THEN typing it all in by hand.

    How does anyone else get around these kind of setbacks that appear all to common with SSIS? Or am I the only one using SSIS who's not getting paid by the hour?!? Inquiring minds want to know...

  • Thank you for a great article! However, the quality screenshots is absolutely horrible. I wonder if this is something that can be fixed, so the words in the pictures are readable.

  • Hello Kevin,

    when I have to import some unknown flat files, I change the max length of each column (string) to 8000 (see picture). After that, I can transform the data or migrate, as I need it. Now I have the possibility of a simple error handling.

    Unfortunately you need here two tables - one for reading and of course the real aim ...

  • Back in 2004 we looked into MS offerings in the EDI space and figured out that nothing (including BizTalk) was addressing our reliability requirements, so we started our proprietary software development and never looked back.

    Out of curiosity I looked into this article and OMG, a simple case of reliable FTP download (not sure it does SFTP or uploads) is not even a part of a framework? How on Earth in the real world are you going to deal with situations when the file has not fully arrived into the source folder yet or they put it there twice by mistake simply because you already picked it up before they saw it? How are you uploading the files to (S)FTP? Does that have a chance to break your client’s EDI because they fell into the same trap?

    For our software to run you can use Pentium 3 and no SQL server is required! It has agents and APIs that can be deployed to the client because some clients may be just desktop applications with no EDI capabilities. The other way for this sort of clients is sending attachments to by email.

    I understand that MS needs to sell a product that is extremely configurable, but reading this article it seems like even a simple ETL is quite a lot of work and workflows. The advantage of our proprietary software is that if something has to be a part of the framework, we put it there, so over the years we ended up with an extremely robust framework. That is also a reason why we have not productised it. Instead we provide a VAN. The rationale for our clients is that they need no hardware, employees with associated costs (including compliance costs), and time to the market are days, not months!

    Have a read at http://allelectronicmessages.com/B2B_Messages.aspx?r=s07

  • I am trying to follow this and am getting an error regarding my connection string. I have set the environment variable and set the package configuration to use it but I get:

    'The connection string format is not valid'

    It is the same connection string that is in the properties of my connection manager so I am not sure why it doesnt like it.

    Anyone else get this?

    Thanks in advance...

  • Thank you very much for the wonderful article.

    I am able to download and installed the File Watcher Task successfully but when I clicked the ellipsis path button in the file watcher task to take me to where my file is located, then I get the following error:

    Could not load file or assembly 'Microsoft.Data TransformationServices.Design, version=9.0.242.0, Culture=nuetral, PublicKeyToken=89845dcd8080cc91' or one of its dependences.

    The located assemply's manifest defination does not match the assembly reference.

    (Exception from HRESULT:ox8013040)

    Any help how to resolve this problem or is there any other task I can use as a replace?

    Thanks

  • Thanks a lot for the such a wonderful article.

    I am able to download and installed the File Watcher Task successfully but when I clicked the ellipsis path button in the file watcher task to take me to where my file is located then I get the following error:

    Could not load file or assembly 'Microsoft.Data TransformationServices.Design, version=9.0.242.0, Culture=nuetral, PublicKeyToken=89845dcd8080cc91' or one of its dependences.

    The located assemply's manifest defination does not match the assembly reference.

    (Exception from HRESULT:ox8013040)

    Any help how to resolve this problem or is there any other task I can use instead of the File Watcher Task?

    Thanks

Viewing 15 posts - 1 through 14 (of 14 total)

You must be logged in to reply to this topic. Login to reply