My destination (naturally) wanted the other type of string, and SSIS's type strictness rightly prevented a straight-up implicit conversion, resulting in "cannot convert between unicode and non-unicode string data type" errors:
Under the covers, I'm trying to push a non-unicode string column called UserName into a database table that wants a unicode string. What's really occurring in the data flow looks something like this, and it doesn't work:The "By The Book" Solution
I've shown the "automatic" mappings with green arrows, and the things you have to manually set with the blue arrows. In the above, the blue arrows represent the work you have to do to set up the Data Conversion component to change the data type (creating the new column) and the work you have to do to manually change the mapping in the OLE DB Destination.Issues With "The Book"
The prescribed solution above works perfectly... for perfect people... but I'm not one of those. I frequently found myself stumbling over a few problems.
I really don't like the "Copy of" name that was given to the new column. Primarily because it wasn't a copy of that column - it was a derived value. To me, that was like using a Derived Column transformation to create a new column from an existing date column called "Today" with the expression DATEADD("day", 1, [Today]) and calling the new column "Copy of Today". It's not. It's tomorrow. Of course, SSIS has absolutely no way of knowing this semantic distinction. I found myself changing the default converted output column name to things like "original column_str"... but that didn't quite sit right because of the occasional mixture of spaces and underscores. (Yeah, I'm anal that way.)
With a little counseling, I could probably see past the naming issues... except that I would find myself inadvertently using the original column later on in the data flow - because it was named appropriately, and appeared in the column list before the converted column I should have been using. Quite frequently, I found my mistake at the end of the data flow after realizing I'd used the wrong column all the way through the flow.
The end result I wanted was to convert the columns to the desired data type, but keep the column name the same - because it was a column name I (and those that will read my package later) would understand. I also wanted to remove the possibility that the original inappropriately typed column couldn't be used. I also wanted to make it perfectly clear what was happening in my workaround - being clever is fine... but don't outsmart yourself (or the next guy).
Using the Pattern - Pros and Cons
The above workaround is foolproof - I know it is because it saves me from making that mistake every day. I use it every time I do a simple data type conversion, whether it's strings to integers or floats to currency. That's the overwhelming positive for me. However, there are drawbacks to using this pattern - the Union All is an asynchronous component, which ends up throwing a few curves our way.
On the good side, it's not a "blocking" component - it doesn't hold up all the incoming rows until it's seen them all before sending rows out (like the Sort component). This is a good thing, because it doesn't consume a lot of memory.
On the bad side, it causes the Data Flow engine to stop using the buffer (block of memory) it has the data in, and causes a memory copy to be made into a new buffer of a different "shape" (columns and data types). It does this because it assumes that I am joining several flows together, as well as renaming and removing columns from the buffer. Joining several flows together (probably) means that they all have different shapes, and as such, you can't simply pass them out the other side - the output has to have a consistent shape.
The end result (regardless of reasons) is that a memory copy takes place in this pattern where it otherwise shouldn't have to - and that can slow down your data flow in two ways. First, the copy simply takes time. Second, copying means that you're breaking the "execution tree" of the data flow into two parts manually - meaning SSIS's optimizer is stuck with working around that decision, which may result in a less optimal worker thread allocation. (In SSIS 2005, it could actually improve performance in long flows, as 2005 didn't optimize those execution trees as well as 2008 does.)
Even with this pattern, I still have one of my original gripes unresolved, and have actually added some more. I still have a ton of clicks to go through to set this up, especially when I have more than one column to convert. I've just published a component on CodePlex that addresses those issues for me - and I'll blog about that next...