The SSIS Data Pump - Level 2 of the Stairway to Integration Services

  • Comments posted to this topic are about the item The SSIS Data Pump - Level 2 of the Stairway to Integration Services

    Andy Leonard, Chief Data Engineer, Enterprise Data & Analytics

  • Andy,

    The most terrific pair of articles I've read in a long time. I was particularly impressed with the way you handled options and errors. Almost all of the other SSIS articles out there seem to follow a single best-case process flow explanation without touching on all the possible options or errors. You covered these quite well without getting bogged down in too much explanation or diversion from the main thrust of the topic. I cannot wait until the next articles in this series are posted.

    This brings up my only confusion: today is February 20, 2011, yet you have dated your Step 2 article 2011/04/06, which I believe may have something to do with its future appearance date in the main SQLServerCentral newsletter issue, even though it is already part of the Stairway series. I sure hope I don't have to wait that long to see the follow-up articles posted here.

    But again, terrific articles on clearing up the mysteries of SSIS for those of us just starting to use it professionally. Also, I am shocked that I am the first to rate and comment on it!

    Ron K.

    "Any fool can write code that a computer can understand. Good programmers write code that humans can understand." -- Martin Fowler

  • Andy, this is brilliant. It deals with "how to" in a way a novice like me can understand, whilst showing us how to do something the real world might actually want us to do.

    Keep it up? Please?

    MIKI

  • Great job!!! Really, really looking forward to the next one. Is it done yet? :Whistling:

  • Great job, Andy... really helpful for newcomers just starting out. πŸ™‚

  • How about something that occurs frequently in the business world - loading external data from a csv, txt or xls file? And please explain the process of creating the staging table. My requirement may involve a fresh staging table each time (DROP/CREATE or TRUNCATE), or I may want to append data to an existing table.

    I'm also interested in learning best practices for the transform stage. Should the initial code load the data "as-is" from the csv and then run a stored procedure to execute the field manipulation and transform logic, or should that logic reside in SSIS? How does SSIS-centric code remain maintainable and in a central repository (other than msdb)?

    Finally, I'm interested in error reporting and correction strategies. If the bulk insert fails on a few rows, what is the recommended procedure for identifying and presenting these rows to the user for correction? How should the input file be archived post-ETL?

    I strongly urge the author to work through a "real world" ETL scenario: externally sourced CSV file(s), FTP'd into a local server folder, through data load, transformation, error correction and reporting, archival, and balancing. This is a fairly standard scenario that all IT professionals confront on a frequent basis.

    Thank you.

  • Great writing... I am loving reading this.

  • Lovely article

    One extra tip about formatting the layout: the first component you select will be used as the reference point for the other components' new placement on the data flow canvas.

    So if you have five components to realign, select the one you know is in the right place, then Ctrl+click (or Ctrl+drag-select) to select the others, then perform the alignment operation.

    This is very useful when you've got quite a few components which need arranging relative to each other but not all at once.

    I do take slight issue with your conclusion that it covered a lot of features... it barely scratched the surface in my opinion, but I hope you'll be covering the rest in later articles!

  • Great articles, and timely (for me!). One question, though: for the OLE DB Destination fast load, are there any recommended values for "Rows per batch" and "Maximum insert commit size", or are they better left at the defaults?

    ____________________________________________________

    Deja View - The strange feeling that somewhere, sometime you've optimised this query before

    How to get the best help on a forum

    http://www.sqlservercentral.com/articles/Best+Practices/61537

  • Thanks everyone for reading the article, the helpful suggestions, and kind words!

    Roland: I have plans (and outlines) to show loading from several sources; Excel and flat files are included. Your question about transform strategy is interesting because the answer varies from use case to use case (a fancy way of saying "it depends"). There are use cases for ELT, for example, where I use SSIS to extract the data from a source and load it to the destination, and then perform the transformations using T-SQL executed by Execute SQL Tasks. I'll cover that, either here or in an upcoming article on Data Warehouse Development (please stay tuned).

    Stage table creation strategy is largely driven by the client, culture, and data environment. I often employ the technique demonstrated in this article: using the New button beside the Name of the Table or View dropdown in the destination adapter. I recommend copying the Data Definition Language (DDL) statement from the Create Table window and storing it in a .sql file - something I failed to cover in the article. The .sql file can then be imported into the SSIS project (under Miscellaneous) and source-controlled right alongside the SSIS packages. At deployment time, the .sql files are executed and the packages deployed.
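
    For what it's worth, here's a minimal sketch of the kind of .sql file I mean - the table name and columns are hypothetical (yours would come from the Create Table window of your own destination adapter):

        -- StageContact.sql: DDL copied from the destination adapter's Create Table
        -- window, kept under Miscellaneous in the SSIS project, and run at deployment.
        IF OBJECT_ID(N'dbo.StageContact', N'U') IS NOT NULL
            DROP TABLE dbo.StageContact;
        GO

        CREATE TABLE dbo.StageContact
        (
            FirstName nvarchar(50)  NULL,
            LastName  nvarchar(50)  NULL,
            Email     nvarchar(128) NULL
        );
        GO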

    Samuel: There is more to come. πŸ™‚

    Mark: There is often a balance to be struck between speed and concurrency. Batch and Commit size settings for fast loads are going to vary based on many factors (another fancy way of saying "it depends"). Fast loads run much faster than non-fast-loads, but they also lock the table differently. You can use the properties to manipulate the load in a way that achieves the balance (between speed and concurrency) you seek.
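
    As a rough illustration only - the fast load issues its own bulk insert under the covers, and the file and table names here are hypothetical - the same two knobs have direct analogues on T-SQL's BULK INSERT:

        -- Commit every 10,000 rows: smaller transactions and earlier lock release,
        -- at the cost of already-committed batches surviving a mid-load failure.
        BULK INSERT dbo.StageContact
        FROM 'C:\Data\Contact.csv'
        WITH (
            FIELDTERMINATOR = ',',
            ROWTERMINATOR   = '\n',
            BATCHSIZE       = 10000,  -- analogous to Maximum insert commit size
            ROWS_PER_BATCH  = 50000,  -- an optimizer hint, analogous to Rows per batch
            TABLOCK                   -- fastest path, but blocks concurrent access
        );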

    :{>

    Andy Leonard, Chief Data Engineer, Enterprise Data & Analytics

  • Andy,

    Why have you referred to it as a Data Pump?

  • I have not actually read the article yet. I'm just asking about these paragraphs that seem to be a little jumbled.

    First, SSIS will never store or save decrypted password fields. If your connection requires a password, and you check the Save My Password (or Save Password, depending on the version of SSIS you’re running) checkbox, and your SSIS package is encrypted.

    SSIS will store your password internally and encrypted if you use Windows Authentication, the Connection Manager will connect to the database in the context of the user who executes the package.

    I think that you really meant to say:

    First, SSIS will never store or save decrypted password fields. If your connection requires a password, and you check the Save My Password (or Save Password, depending on the version of SSIS you’re running) checkbox, and your SSIS package is encrypted, then SSIS will store your password internally and encrypted. If you use Windows Authentication, the Connection Manager will connect to the database in the context of the user who executes the package.
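
    To make the two cases concrete, here is a sketch of the two connection string shapes involved (the server, database, and login names are hypothetical placeholders):

        -- Windows Authentication: no password is stored anywhere; the
        -- Connection Manager connects as whoever executes the package.
        Provider=SQLNCLI10;Data Source=MyServer;Initial Catalog=MyDatabase;Integrated Security=SSPI;

        -- SQL Server Authentication with Save My Password checked: the password
        -- is kept only inside the package, protected by the package's encryption.
        Provider=SQLNCLI10;Data Source=MyServer;Initial Catalog=MyDatabase;User ID=MyLogin;Password=********;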

  • Jamie,

    Uh oh. What's wrong with referring to the Data Flow Task as a data pump?

    John,

    Yep - I missed that. You worded it correctly.

    :{>

    Andy Leonard, Chief Data Engineer, Enterprise Data & Analytics

  • Andy Leonard (4/6/2011)

    Jamie,

    Uh oh. What's wrong with referring to the Data Flow Task as a data pump?

    Other than "that's not what it is called"? Nothing, nothing at all! πŸ™‚

    Sorry, I'm a bit of a terminology zealot - I just think it can be risky to use the wrong terms, because then people start to misuse them, and that's where communication breakdowns begin (I speak as someone who has spent a lot of time on forums trying to understand what people *really* mean). πŸ™‚

  • Thanks Jamie,

    I think you're being prudent and that I was sloppy with my terminology. It's ok to make analogies, but it's best to keep them out of the title of a major article. Good point. I will avoid this going forward.

    Thanks!

    Andy

    Andy Leonard, Chief Data Engineer, Enterprise Data & Analytics
