Blog Post

Convert EBCDIC to ASCII in SSIS

,

    Converting EBCDIC to ASCII in SSIS 2008 can seem like a daunting task. There is a lot of manual work involved in creating a data flow to move data from an EBCDIC system like AS400 or DB2, to an ASCII system like a SQL database. This can be accomplished, and it can be done with no script tasks or script components. Just using the native built in components with the proper set up will convert EBCDIC to ASCII. This is true even if the file has Packed Decimal (Comp-3) Fields.
    The first thing you need is a definition of the EBCDIC file. You will need to know where the columns are at in the file and the column widths. You will also need to know the type of data in the columns such as Packed, Regular, or Zoned.  The File in this example is 1100 characters across. You will only define a few columns to simplify the example.
    Here is the column layout of the EBCDIC file:
 image
    You will need to create a Flat File Connection in the connection manager of your package. Set the Format to Fixed Width and the Code page to the proper code page of your EBCDIC file. The most popular in the US is 1140 IBM EBCDIC US/Canada 37 + Euro.

image

    Click on the Columns node in the Flat File Connection Manager editor. This is the long tedious part of the process. You have to map every field manually. That is why you need the definition of the file to tell you where the columns are and the data types of the columns.
    The First Step is to set the Row Width at the bottom of the Editor. Then you will need to click in the column screen to add the column dividers. This need to be checked and double checked because it cannot be altered later. If you make a mistake here and close the connection manager, you must click reset columns and start all over again. The editor does not let you make changes to the columns after it is saved. Hopefully this is changed in a future release.

 image
    For the Regular fields you will see the data showing normally. In the Packed Decimal Fields you will see funny characters, and the zoned fields will show characters also. If everything is strange characters then you might have the wrong codepage selected. Go back to the general page and try changing the codepage if the data is all unreadable.
 
 image

image
    After setting the columns you will need to set the columns types and name the columns. Click on the advanced node in the Flat File Editor and Select the first column. In this example you are only defining five of the columns. So there will be columns between each column you are ignoring. Each column that is Regular or Zoned data set the data type to Unicode String. For the Packed Decimals set the data type to Byte Stream and the output width to a larger number than the input to handle all of the extra data in the packed field. Double the size should be enough but it does not hurt to increase it a little more to be safe. The packed field width in this example is 5 and the output is 50.
 image
    Click on the Preview node in the editor and your preview should show readable data in all the fields except the package decimal (Comp-3) fields. In this example the Invoice date and the Invoice Amount are packed decimal and therefore have the strange charters showing. Click ok to save the connection manager.
 image
    In a data flow drag in a Flat File source and select the newly created connection manager. In this example you unchecked the undefined columns in the columns node of the flat file source.  This way you are only dealing with the columns you care about in the data flow.
 image
    You need to make one more change to the flat file source. Right click on the Flat File source and select Show Advanced Editor. Click the Input and Output Properties tab on the top right. Click the plus to expand the Flat File Source output and the Output Columns. Select the packed decimal fields and change the UseBinaryFormat property to True. Now click Ok to save the source.
 image
    Below is the data viewer right after the Flat File Source. You can see the hexadecimal fields coming through and they are readable. The Vendor Name is readable and the rate unit is a number with a trailing character.

image
    The next transform will be a data conversion transform.  The fields that are packed decimal (comp-3) can now be converted to a Unicode string. This will create a new column for each field. These new fields will be in text format and can be edited with derived columns transforms.
 image
    Below is a data viewer image showing the data after the data conversion transform. Notice the hexadecimal value in the original columns and the string values in the new converted columns. These new columns can now be altered with derived columns to convert the dates and money amounts before they are written to a SQL database.

image
 
    The next transform will be a Derived Column transform. You will use this to split the data in the Unicode string columns. The packed decimal fields have a string of numbers and a single character at the last digit. This digit indicates the sign of the number.

C = Signed Positive
D =Signed Negative
F = Unsigned

    This first derived column is simply going to split the sign character from the numbers. The date field has this sign character, but it is not used, so all you need from that column is the number. The RateUnit column is a Zoned Decimal so you will need to split the last character from that also. Here are the expressions used to accomplish this task. You also multiple the amount field by the proper number of decimal places that are defined in the field. In this example it has 2 decimal places.

(DT_I4)(SUBSTRING(wstrInvoiceAmount,1,LEN(wstrInvoiceAmount) – 1)) * .01
SUBSTRING(REVERSE( [wstrInvoiceAmount] ),1,1)
(SUBSTRING( [wstrInvoiceDate] ,2,LEN( [wstrInvoiceDate] ) – 1))
SUBSTRING(REVERSE(RateUnit),1,1)
(SUBSTRING(RateUnit,1,LEN(RateUnit) – 1))
 image
    From this derived column you get four new columns, the number from the amount and rate, and the sign character from the amount and rate. The date column has the leading zero and the ending character removed from the date string.
    In the next derived column transform you will set the sign for the amount and rate column and convert the string date to the date data type. Below is the code to accomplish this. In this code you are checking the sign character for the letter “D” on the amount. If it is “D” then you multiply the amount by negative one, else you leave it positive. For the Rate, which is a zoned digit, you check the letter for the letter A-I and the curly bracket {, who indicate a positive number, and I-Z and the other curly bracket } indicate a negative number. There are other possibilities in zoned digits, but in this example you are assuming this is the only possibility. You could save the letter A-I and the bracket in a variable and use the variable instead of hard coding in the letters in the expressions. This would be the best practice. The rate may also need to be multiplied by .01 to set the decimal places. In this example it is an integer.
    The date string is broke up into substrings and the hyphens are added then all of that is type cast to a date. You can go into the configure error output and set it to ignore errors for bad dates.  If you set it to ignore errors any bad dates will be null. You could use and check for nulls in another derived columns afterwards and set it to a default date if you need.

wstrInvoiceSign == “D” ? numInvoiceAmount * -1 : numInvoiceAmount
(DT_DBDATE)(SUBSTRING(wstrInvoiceDate,1,4) + “-” + SUBSTRING(wstrInvoiceDate,5,2) + “-” + SUBSTRING(wstrInvoiceDate,7,2))
FINDSTRING(“{ABCDEFGHIJ”,wstrRateChar,1) > 0 ? [intRate] : [intRate] *-1
wstrInvoiceSign == “D” ? numInvoiceAmount * -1 : numInvoiceAmount

image
 
    Here is the final Data viewer and these columns can be mapped to a SQL table. If you want the columns to be VarChar instead of NVarChar then you will need to type cast the columns with the non Unicode type cast in the last derived column (DT_STR).  You might also need to set some of the derived columns to ignore errors so bad dates or bad numbers can be passed through as nulls.
 image
Let me know if you have any EBCDIC to ASCII issues. I love a challenge. If you are an EBCDIC master and you see anything you have wrong here please let me know.

Rate

You rated this post out of 5. Change rating

Share

Share

Rate

You rated this post out of 5. Change rating