Using UNION ALL before a MERGE JOIN has IsSorted Property Error

  • Has anyone else had problems with this?

    I have 2 OLE DB sources. The first Source goes to a Conditional Split, then one branch goes to a MULTICAST before going to a UNION ALL. The other two branches (Default and the second condition) go straight to the UNION ALL. The UNION then goes straight to the MERGE JOIN (I've also tried it with a MERGE).

    Both sources are set to IsSorted = True and have the same SortKey on the same column (ProducerID). But when I try to edit the MERGE JOIN, it complains that both sources must be set to true on the IsSorted property. I don't understand this because I've checked and double-checked and triple-checked the original sources. They're set to True. And the Conditional Split / Multicast / Union don't have an IsSorted property.

    It seems to me that this sort property is being lost somewhere along the way (around the UNION ALL transformation). Does anyone have any thoughts or solutions to this problem?

    Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

  • A UNION ALL can (and almost always will) un-sort your data, so it will mark the data as unsorted.

    Data coming from more than one source into a UNION ALL is essentially added into the output workflow as data comes in. Since the data coming in from two sources can be at different rates, even if both sources are sorted by the same keys they may (and are likely to be) out of order when the come out of the UNION ALL.

    As an example, if you have 10 rows in one source and 5 rows in another source, all sorted by an integer column, all 10 rows from the first source may get processed through the UNION ALL before any rows from the second source are. If both sets of data have key values counting up from 1, your final data set will no longer be sorted.

  • Grrr. That's annoying.

    Do you know of anyway to sort the output of a UNION ALL? I can't find one, but that doesn't necessarily mean anything.

    Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

  • If nothing else - add a sort task after the UNION ALL..... I don't recall a sort option one way or the other within the UNION ALL task....

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • The obvious option is to use the SORT component after the UNION ALL.

    If you only have two inputs, you could use a UNION component rather than a UNION ALL - I believe UNION will remain sorted. I suppose if you had three inputs, you could UNION together two of them and then UNION together the third with the output of the first UNION, but that will probably get pretty slow.

    If you do use the UNION or the SORT component, remember that this operation is done in memory by SSIS. So, the entire data stream needs to get into the component and it will then be sorted.

    If you have a lot of data in your data flow, this may be inefficient. Instead you may want to create a staging table with a clustered index in the appropriate sort order (with a reasonable fillfactor). Use a data flow destination to insert into the table and then create a new data flow that selects this data using an ORDER BY clause in the T-SQL (remember that SQL 2005 does not have to return data in the order of the clustered index if you do not specify a sort).

  • Thanks for the ideas, guys. I appreciate the input.

    Now I just have to test it.

    [edit] BTW, where exactly is the UNION component that is not a UNION ALL? I don't see it on my dataflow transformation toolbox.

    Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

  • Sorry - it is the MERGE component (not the MERGE JOIN).

  • MERGE has the same problem after a UNION ALL. It doesn't think the sources are sorted. And the idea behind using a MERGE JOIN was to compare records in an existing table to the records I wanted to insert into the data destination, which MERGE doesn't do.

    So I guess the test is to see if the SORT transformation (Post UNION ALL) fixes that particular issue.

    @sigh. I'm going to start making a list of things MS needs to add to SSIS. As robust and wonderful as it is, it does seem to be missing some things that would make my life easier.

    Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

  • You misunderstood - use a MERGE rather than the UNION ALL.

    It will union the data sets and leave them sorted. It will, however, only take two inputs.

Viewing 9 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply