Data Transfer AS400 --> SQL Server 2005 using SSIS

  • I'm looking for someone with a lot of SSIS knowledge to give me their opinion on the best way to handle a scheduled data transfer from an AS400 file to a SQL Server table. In a nutshell we have "Master" data that is maintained on our AS400, and we would like to be able to use this data in a SQL Server database as well, so we'd like to have nightly updates from the AS400 to our SQL table to synch up the data. I'm trying to accomplish this with an SSIS package.

    I'm new to SSIS packages, and sure I could figure something out on my own, but I'm interested in hearing opinions on best practices so that I might be able to avoid some rookie pitfalls.

    I have been able successfully create an Ole DB Source for the AS400 that I want to pull data from.

    Thanks!

    David

  • Honestly, you covered the hard part - getting an OLEDB driver to work with your AS400 can be a pain.

    As far as copying the data, it will depend on the amount of data, if you can easily determine which records have changed on the AS400 side, and how long you have to process.

    In simplest form, if you have a relatively small number of records (< 1M) you can simply truncate the destination table and re-populate it.

    I like to avoid doing this as it creates down-time, so I usually try to insert new / update existing.

    If you have a very small number of records (< 1000), the SCD wizard will basically compare a data set to your table, insert new records, and update existing records (pretty nifty, but slow on big data sets).

    If you can determine new and changed records on the AS400 end, just get those records and either stage the data and compare and update your table with T-SQL, or go get the TableDifference component from http://www.SQLBI.com (it's free) and do the comparison in SSIS (this works pretty well).

    Finally, if you cannot tell which are new, updated, or the same, and you have a lot of data, I would suggest a staging table - pull all of the data, and use T-SQL to compare it and update / insert / delete as needed. It will perform the best out of the options available.

    A couple of things to remember in SSIS - it is very data type sensitive so you need to make sure your data types are the same in any comparison. It is also case-sensitive and accent-sensitive in all of it's comparisons (although the TableDifference component will do non-case-sensitive compares if specified). It also does not automatically ignore trailing spaces in join operations.

  • Thanks for the response Mr. Earl.

    You mentioned an SDC Wizard. What does SDC stand for? Is that a destination control of some kind?

  • Sorry, I guess it was SCD.

  • Slowly Changing Dimension? I'll look into that.

    Thanks again!

    David

  • Yes, the Slowly Changing Dimension Transformation component.

    Although designed for dimensions, it will work for any "insert or update" situation.

    It is really a wizard and a component that compares a data set to a table. Like I said, it is slow for anything more than about 1000 rows, but it is really easy to use for small tables.

  • Great posting Michael.

    Quick question though, is a column derivation a lot slower than T-SQL when comparing for changes?

    For example, I have a table with about 15 fields that I am using column derivations to compare for changes.

    Sample of one of these column derivations:

    !ISNULL(dest_KeyField) && !ISNULL(source_KeyField) && ((ISNULL(source_CompareField) ? "" : source_CompareField) != (ISNULL(dest_CompareField) ? "" : dest_CompareField)) ? 1 : 0

    If I wrote the TSQL on this, it would look something like the following:

    case when dest_KeyField is not null and source_KeyField is not null and isnull(source_CompareField, '') <> isnull(dest_CompareField, '') then 1 else 0 end

    Either would work, but I have 2 questions on this one:

    a) What would be faster?

    b) What is the best practise in this case?

    Being relatively new to SSIS, I went for column derivations, assuming that these would be faster as they would be handled in memory, and also as I needed to understand the equivalent of isnull and case statements in SSIS. I pretty much figured out that using a SCD wizard may slow things down considerably.

    At the moment, I am using our smallest files to do the SSIS pioneering work before getting to those monster customer files.

    I would like an approach that I can use from the tiniest file to the biggest monster file.

    In my case, I dont trust source at all, as their compare utilities have dropped us once too often, so I am adament to do the compare on my side.

    What would your advice be (TSQL or column derivation)?

    Looking forward to your response

    ~PD

  • We use triggers to capture the keys on new / changed / deleted records on the AS400 side.

    This minimizes the amount of data we need to pull over to sql as we then can do just the revisions.

    Check into Remote Command for iSeries. We have AS400 processing that when finished triggers our package to run on the SQL server.

    Greg E

  • Ok, in your example for the column compare in SSIS vs. T-SQL, your comparisons are a bit different. The T-SQL comparison will be based on your collation settings and they are probably case-insensitive. The SSIS comparison will be case-sensitive.

    As far as the comparison itself, overall performance will depend on a number of factors, but the comparison itself will be FASTER in SSIS than in T-SQL. Since the comparison you are doing will result in a row-by-row scan of your table in T-SQL, the comparison is on an individual level and SSIS is better at these than T-SQL. Now, SSIS may have to move data around a bit more to get to the comparison, so you will probably end up slower there.

    As far as a complete solution, download and try the TableDifference component. It works pretty well and it is really the combination of a MERGE JOIN component and a CONDITIONAL SPLIT. If it does not have exactly what you want, the source is included so you could modify it, or switch to a MERGE JOIN and a CONDITIONAL SPLIT which works pretty well as a solution.

  • Thanks Michael.

    I have downloaded TableDifference and am planning to sample it within the next 2 days.

    For now, I have a working version of 3 really mean SSIS end to end solutions all working with column derivations as opposed to TSQL.

    It was a long and windy road to get these done, so I am going to implement them as a first release with a view to changing in the near future.

    Once I have worked out how tabledifference works, I will do a compare using the same data, but using TSQL, versus derivations versus tabledifference. I will compare across databases with 3 different collation settings, and log the results.

    Once again, thank you very much for the feedback

    ~PD

  • for the schedule it run daily....

    may i know how to make it as a schedule job,

    i m using ssms to create this job, but i face a lot problem while doing.

    i m using SSMS 2005 standard edition

    may anyone help me for this problem??

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply