Blog Post

PowerShell and T-SQL: Implementing UNION, INTERSECT, and EXCEPT

,

I’ve been thoroughly enjoying my experience this year with the Scripting Games. This is the first year I've competed and I decided to participate in the Advanced category just to push myself a little harder. I’ve been using Powershell for quite some time, however, I’ve never had the pleasure of using the platform while performing the duty of a Windows administrator. Those days were all about .bat and .vbs files. It’s been a real treat to be able to get my hands dirty with some of the cmdlets and functionality of Powershell that I have yet to use.

On a couple of events, I’ve found good use of some of my T-SQL set based thinking. Most database geeks understand that the database is designed to work in sets. However, PowerShell, similar to its scripting roots, is designed to think iteratively. However, there are times when understanding how different sets of data compare with each other could be a handy concept to understand. I’m going to spend some time on some very cool operators in T-SQL and show how this same concept can be implemented in PowerShell using arrays of objects.

The T-SQL operators UNION, INTERSECT, and EXCEPT and the Compare-Object cmdlet all operate on entire sets of data. We just came out of Easter, so I encourage you to forget about data for a moment – I know, it’s hard for me too – and think of two baskets of jelly beans.

In your left basket, you have seven jelly beans, one for each color of the rainbow (ROY G BIV). On your right side, your over-sugared children have eaten ROY, leaving you with G BIV and a half eaten Peep. In the middle, we have at this moment an empty basket.

Before we start moving jelly beans around, here are some initial set ups for the T-SQL and PowerShell discussions:

T-SQL:

CREATE TABLE LeftBasket
(
ID INT NOT NULL,
Candy VARCHAR(10) NOT NULL
)

CREATE TABLE RightBasket
(
ID INT NOT NULL,
Candy VARCHAR(10) NOT NULL
)

GO
INSERT INTO LeftBasket
(ID, Candy)
VALUES
(1, 'Red'),
(2, 'Orange'),
(3, 'Yellow'),
(4, 'Green'),
(5, 'Blue'),
(6, 'Indigo'),
(7, 'Violet')

INSERT INTO RightBasket
(ID, Candy)
VALUES
(4, 'Green'),
(5, 'Blue'),
(6, 'Indigo'),
(7, 'Violet'),
(8, 'Peep')

PowerShell:

$LeftBasket = ("Red", "Orange", "Yellow", "Green", "Blue", "Indigo", "Violet")
$RightBasket = ("Green", "Blue", "Indigo", "Violet", "Peep")

UNION

The UNION operator is like taking the ROY G BIV from the left basket, the peep from the right basket, and tossing the G BIV from the right basket across the room because you can’t look at any more jelly beans. UNION will give you the distinct results of the combination of the two baskets of jelly beans. In reality, it doesn’t really toss those from the right, but it made sense in the analogy. You end up with 7 jelly beans and a peep in your middle basket.

This can be represented in T-SQL with the following query (based upon the set up mentioned earlier:

SELECT ID, Candy FROM LeftBasket
UNION
SELECT ID, Candy FROM RightBasket
ORDER BY ID

In PowerShell, we can also represent this same concept using Compare-Object

Compare-Object -ReferenceObject $LeftBasket -DifferenceObject $RightBasket ` 
-IncludeEqual

The important point of the above command is the –IncludeEqual switch. This switch modifies the behavior of Compare-Object to not just show differences, but to show you all of the values that are the same as well. The results are identical to the UNION operator in T-SQL.

UNION ALL

UNION ALL has a similar function as UNION, but it doesn’t take the time to determine which to throw out and just dumps all the candy in the middle basket. You end up with 11 jelly beans (ROY GG BB II VV) and a peep in your middle basket.

T-SQL:

SELECT ID, Candy FROM LeftBasket
UNION ALL
SELECT ID, Candy FROM RightBasket
ORDER BY ID

PowerShell:

Write-Output ($LeftBasket,$RightBasket)

You’ll notice that I did not use the Compare-Object for the UNION ALL equivalent. This is because no combination of switches for the Compare-Object will show you duplicates. Since all we really want is just the combination of the two string arrays, we can combine them and replicate the results of UNION ALL.

INTERSECT

If you apply the INTERSECT operator to your baskets of jelly beans, you end up with only four jelly beans (G BIV) in the middle. This is because INTERSECT represents the jelly beans that are in common in both baskets. Anything unique on either side is tossed out.

T-SQL:

SELECT ID, Candy FROM LeftBasket
INTERSECT
SELECT ID, Candy FROM RightBasket
ORDER BY ID

PowerShell:

Compare-Object -ReferenceObject $LeftBasket -DifferenceObject $RightBasket `
-IncludeEqual
-ExcludeDifferent

You’ll notice in the PowerShell command that there are two optional switches. We implement the –ExcludeDifferent and the –IncludeEqual switches to make the output not include differences and include any matches. We get only the matched items of the arrays.

EXCEPT

Finally, EXCEPT. This one will take any jelly beans on the left side that do not exist on the right side and put them in your middle basket. In this case, you would have ROY.

T-SQL:

SELECT ID, Candy FROM LeftBasket
EXCEPT
SELECT ID, Candy FROM RightBasket
ORDER BY ID

Powershell:

Compare-Object -ReferenceObject $LeftBasket -DifferenceObject $RightBasket | 
Where-Object{$_.SideIndicator -eq "<="}

In this PowerShell command, I am adding no switches. Compare-Object is working with default behavior. However, in order to get only those items that exist on the left, we have to implement the Where-Object in the pipeline to get only those that exist on the left and not on the right.

Now, the coolest part about EXCEPT is that if you swap the baskets, you end up with different results. Starting from the right basket, you would end up with nothing more than a peep because all of the jelly beans on the right exist in the set of jelly beans on the left.

T-SQL:

SELECT ID, Candy FROM RightBasket
EXCEPT
SELECT ID, Candy FROM LeftBasket
ORDER BY ID

PowerShell:

Compare-Object -ReferenceObject $RightBasket -DifferenceObject $LeftBasket | 
Where-Object{$_.SideIndicator -eq "<="}

Ok, if you’re not ready to throw up from all the sugar yet, there is a small gotcha with Compare-Object . If you did a Get-Help on this cmdlet, you may have noticed the –SyncWindow parameter. This parameter tells PowerShell how many array objects in either direction to look for similarities. In PowerShell 1.0, this was a default of 5. This meant that if your two arrays were not ordered by the same property and you had the same object stored more than five items away, you could end up with inaccurate results. Fortunately, in PowerShell 2.0, they have modified this default to [Int32]::MaxValue – so we should be covered.

About Kyle Neier
Husband of a magnificent woman, father of 5, SQL Server geek, IndyPASS Vice President and Food Guy, DBA automation zealot, amateur Powershell evangelist. Follow Me on Twitter

Rate

You rated this post out of 5. Change rating

Share

Share

Rate

You rated this post out of 5. Change rating