Solving an SSIS Error – Cannot convert between Unicode and non-Unicode

  • Comments posted to this topic are about the item Solving an SSIS Error – Cannot convert between Unicode and non-Unicode

    Thomas LeBlanc, MVP Data Platform Consultant

  • Great article.

    So, do you find this a better option than using a data conversion transformation, or just a different option?

  • If my source is SQL Server, I generally prefer using CAST/CONVERT for the appropriate data-type conversion. For non-relational sources (e.g. an Excel file), I use the Data Conversion transform.
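
    For example, a minimal sketch of the idea, using a made-up table and column (dbo.Orders and CustomerName are illustrative, not from the article):

        -- Cast the Unicode column in the source query so the SSIS source
        -- component outputs DT_STR instead of DT_WSTR.
        SELECT OrderID,
               CAST(CustomerName AS varchar(50)) AS CustomerName -- nvarchar -> varchar
        FROM dbo.Orders;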

    Thanks & Regards,
    Nakul Vachhrajani.
    http://nakulvachhrajani.com

    Follow me on
    Twitter: @sqltwins

  • Thanks for the comments.

    You can use the Data Conversion component and get the same results. My preference is to change the T-SQL rather than use a Data Conversion component, but not for any technical reason.

    What I was really trying to do here, in addition to converting the text data, was to show the Advanced Options of a Source component (the Input and Output Properties) and the effect of changing the data types.

    This was something I never used or looked into in SSIS until I needed it.

    There are many areas of SSIS I never knew existed until I dug deeper, and then I realized how powerful an application SSIS really is.

    Thomas LeBlanc, MVP Data Platform Consultant

  • Thanks, Nakul, for the comments.

    Good point about when to use the Data Conversion component.

    Thomas LeBlanc, MVP Data Platform Consultant

  • Thanks for today's lesson.

  • Another cause of this error is when someone manages to get a tab character into what is supposed to be a text file. In that case, you may need to do some editing and either replace the tab with a space or report it and fail the load. I worked in data warehousing, so I am used to bad data. There are other characters besides the tab that will also cause this error.
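
    If the offending column has already landed in SQL Server, a sketch of the replace-with-a-space approach might look like this (dbo.StagingFeed and TextColumn are hypothetical names):

        -- CHAR(9) is the tab character; swap it for a space before the load.
        UPDATE dbo.StagingFeed
        SET TextColumn = REPLACE(TextColumn, CHAR(9), ' ')
        WHERE TextColumn LIKE '%' + CHAR(9) + '%';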

  • As a training exercise to learn about the Advanced Options, I suppose this is useful. But wouldn't it have been a lot quicker and easier (especially for a much larger project) to simply delete the Data Source Task, and recreate it with the modified T-SQL?

  • While the article doesn't explain the context of the data move, if it's the start of a BI ETL, I think it's better to make the landing table match the source. The first job is to own the data, ideally with no changes made to it. Once you own the data, there are some columns you may want to convert, but there are others for which you might want to keep the nvarchar because of the characters they might contain. Some designers make all their columns nvarchar because of what they might possibly contain. That seems wrong to me, because some columns will only ever hold numbers, and others only basic alphanumerics. But I think your article could have addressed this briefly.

    When I do change it, I also prefer to use the cast in T-SQL rather than the data converter. I've done it both ways, and at least for me, the T-SQL cast seems easier to maintain.
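
    A rough sketch of that pattern, with hypothetical names: land the data as nvarchar to match the source, then cast only the columns that warrant it on the way to the next layer.

        -- Landing table mirrors the nvarchar source exactly
        CREATE TABLE stg.Customer (
            CustomerCode nvarchar(20),
            CustomerName nvarchar(100)
        );

        -- Downstream, convert only the columns known to hold basic characters
        INSERT INTO dw.Customer (CustomerCode, CustomerName)
        SELECT CAST(CustomerCode AS varchar(20)), -- basic alphanumerics only
               CustomerName                       -- may hold extended characters
        FROM stg.Customer;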

  • Non-ASCII (control) characters in a SQL field.

    Normally I wouldn't worry about these; however, if you are running a data warehouse and use these columns as dimension attributes, keys, etc., OLAP processing will have a fit!

    Since a couple of my feeds come from a web-based application server, there is always the potential for CR, LF, tab, and other strange non-compliant characters.

    The other place this gets you is in reporting: sometimes you end up with an FF (form-feed) character that the printer recognizes, and boom, your report now has a blank page or two in the middle of it.

    To reduce this, I created a CLR function that uses regex to remove all non-printing characters from the string value.

    Doing this accomplishes a few things: A) removes the unwanted characters, B) converts from nchar/nvarchar to char/varchar, and C) handles NULLs (I hate NULLs), especially in OLAP data warehousing.
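
    I can't reproduce the CLR code here, but a simplified pure T-SQL stand-in for what it does (no regex, and the function name is made up) would be:

        -- Strips ASCII control characters (1-31 and 127), casts to varchar,
        -- and turns NULL into an empty string. CHAR(0) is omitted because
        -- NUL needs special handling.
        CREATE FUNCTION dbo.CleanString (@in nvarchar(4000))
        RETURNS varchar(4000)
        AS
        BEGIN
            DECLARE @out varchar(4000) = CAST(ISNULL(@in, N'') AS varchar(4000));
            DECLARE @i int = 1;
            WHILE @i <= 31
            BEGIN
                SET @out = REPLACE(@out, CHAR(@i), '');
                SET @i += 1;
            END;
            RETURN REPLACE(@out, CHAR(127), '');
        END;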

    As to the original post on the SSIS component, I default to SQL Server's CAST/CONVERT to deal with most of this. To get the output fields to match, I take the source SQL and replace it with a simple SELECT 1 AS A, then click on the link (between source and destination) and remove the unmapped fields, which removes the now-incompatible types. I then replace my simple select with my true SELECT and remap the fields.
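
    Illustrative queries only (the real SELECT and column names will differ):

        -- Step 1: placeholder query so the old output columns drop away
        SELECT 1 AS A;

        -- Step 2: after removing the unmapped fields on the path,
        -- put the real query back and remap
        SELECT CustomerCode,
               CAST(CustomerName AS varchar(100)) AS CustomerName
        FROM dbo.Customer;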

    This approach works for me as I am in control of both the SOURCE and DESTINATION schema, so the column mapping happens automagically due to the names being identical. This may not be the best approach for all situations.

  • I have a question about this. We have fields with the data type of NVARCHAR because there are special characters in them.

    If you are converting them to VARCHAR, what happens to the special characters? What does the data look like in the output?

  • sql Sarah (12/15/2016)

    I have a question about this. We have fields with the data type of NVARCHAR because there are special characters in them.

    If you are converting them to VARCHAR, what happens to the special characters? What does the data look like in the output?

    SQL Sarah,

    If that is the case, I would keep the destination table columns as nvarchar. The example in the article does not have special characters in the source column.
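
    The short answer on what the output looks like: when a character has no equivalent in the varchar column's code page, the conversion replaces it with a question mark. A quick illustration, assuming the common SQL_Latin1_General_CP1_CI_AS (code page 1252) collation:

        SELECT CAST(N'Ω Σ é' AS varchar(10)); -- returns '? ? é'
        -- é exists in code page 1252 and survives; the Greek letters do not.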

    Thanks for the question!

    Thomas LeBlanc, MVP Data Platform Consultant

  • fsamstag (12/15/2016)

    As a training exercise to learn about the Advanced Options, I suppose this is useful. But wouldn't it have been a lot quicker and easier (especially for a much larger project) to simply delete the Data Source Task, and recreate it with the modified T-SQL?

    fsamstag,

    Yes, that is easier if there are not many columns coming from the source. 🙂

    Thanks for reading and commenting,

    Thomas

    Thomas LeBlanc, MVP Data Platform Consultant

  • Sarah: It depends on the character. If the characters are non-printable, they return blanks. Otherwise, they map to non-extended code-page characters.

    I work with overseas (non-US) data and have to remove some of the accents/extended characters due to the way OLAP is configured to work (US only). So these are either stripped out by the CLR function or converted to the 128-character ASCII set.
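
    For anyone without CLR available, one commonly cited T-SQL trick for the accent folding is to cast through a Greek code-page collation, which best-fit-maps most accented Latin letters to plain ASCII (behavior depends on the code page's best-fit mappings, so test against your data):

        SELECT CAST(N'Crème brûlée' COLLATE SQL_Latin1_General_CP1253_CI_AI
                    AS varchar(50)); -- 'Creme brulee'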

    Since we are US-based, it is not a big deal, as that is what all of the business logic is based on.

    Hope this helps answer your question, but short of making up some examples, I don't have a direct answer. I guess I should have started with this sentence.

  • I work with overseas (non-US) data and have to remove some of the accents/extended characters due to the way OLAP is configured to work (US only).

    I don't understand this. Why can't nvarchar fields be used even if it is US only? Some of my fields are nvarchar, and I haven't run into any inherent issues that I've noticed.
