Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 

Using the SSIS Merge Join

The Merge Join Transform in SSIS is a great way to load a Data warehouse quickly and an easy way to join two data sources together. There are a few requirements to join these two data sources. The data sources must be sorted and there must be a key that you can join them with. This can all be done in the data flow of the SSIS package.
Here are some examples of a Merge Join with some different situations so you can see the outputs. The two images below show two tables. The first table contains the first names of people and the second table contains there last names. Each table contains an ID column that will be used in the join.
Table one:
1image
Table two:
2image
These are OLEDB sources. The queries in the sources are calling the data and sorting it with an order by clause. The source does not know the data is sorted so you have to manually tell the source it is sorted. You do this by right clicking on the source and selecting Show Advanced Editor.
3image
Then click on the Input Output Properties Tab. Click on the OLE DB Source Output and Set the IsSorted Property to true.
4image
Then click on the column that is the used to sort the data, in this case the ID column, then set the SortKeyPosition to 1. If you are sorting by more than one column, place a 2 on the next and so on.
5image

Then you connect both sources to the Merge Join. When it asks which input you set the First name table as the left input. You can always swap the inputs later by clicking the swap input button in the Merge Join Transform.
6image

In the Merge Join, Map the Id together and select the columns you want to pass through to the output. In this example the Join type is set to Left Outer Join. The Results will look like the below image.
7image
Notice the Null in ID 5 under last name. There was no match for ID 5, but since it was set to Left Outer Join we still get to keep the ID 5. If you set the Merge Join to Inner Join the ID 5 row would be dropped.
One of the other scenarios you will encounter is when the right side of the join has more than one match for some of the ID columns. For example, let’s add another row to the Last name column.
8image

This new table has an extra row with the ID 1 and the last name Davis 2. If you run this through the Merge Join with the Join set to Left Outer Join the results will match the image below.
9image

If you change the Join to Inner Join the Results will be in the below image. Notice the ID 1 is still on the output twice and ID 5 is still in the output.
10image

Now let’s add another row to the Last name table.
11image

When you run this table through the Merge Join with the Join set to Full Outer Join the results will match the figure below.
12image
Here you can see the Null Id. That is because we passed the ID through from the First name table. The Last name table ID was not passed through. If you want to ensure you get an Id then pass through each Id and use a derived column to check for nulls.
I hope this clarifies how to use the Merge Join Transform. This is a very powerful transform and can make joining data very fast.


Comments

Posted by Kevin O'Connor on 12 September 2011

I have a requirement to make the right hand table flexible with a number of different fields each time the package runs. The Joining field would stay the same.

ie in your example

First run - ID, last

Second run - ID, second, third, fourth

third run - ID, last, second

Is this possible?

Leave a Comment

Please register or log in to leave a comment.