Handle NULLs

Question

Post reply

Handle NULLs

Forum Etiquette: How to post Reporting Services problems

[/url]

Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
[/url]

How to Post Performance Problems - by Gail Shaw
[/url]

Nevyn SSChampion Points: 14210 More actions · Answer 1

Please someone tell me what I did wrong:

You had a typo in your solution (used hobby1 twice from the first table), but that wouldn't effect the performance.

But doesnt this build a test case with no nulls in it? Not sure if the performance will change at all , but if we're going to measure these solutions there should be nulls to be stripped out.

Orlando Colamatteo SSC Guru Points: 182276 More actions · Answer 2

Nevyn (7/7/2011)
Please someone tell me what I did wrong:
You had a typo in your solution (used hobby1 twice from the first table), but that wouldn't effect the performance.

<edit>I think I found it, but I agree, it probably did not affect performance. That said, there is a better effort later in this post :-)</edit>

But doesnt this build a test case with no nulls in it? Not sure if the performance will change at all , but if we're going to measure these solutions there should be nulls to be stripped out.

A fair point...I am not sure how or if it should measurably affect performance but I changed it up in the code sample below.

Regarding these two requirements mikes84:

1) Do you want duplicates shown in the results?

2) Does order matter?

Using our two tables hobby1 and hobby2 here is an example that should answer both questions:

If...

hobby1.hobby1 = tennis

hobby1.hobby2 = cricket

hobby2.hobby1 is null

hobby2.hobby2 = tennis

...what should the result set look like?

Here is another set of code with more varied test data, including nulls, as well as a Union/Crosstab method, a variation of the Union All/Crosstab, which removes duplicates but does not offer ordering based on column name.

> APPLY method - removes dups, can be ordered by column name

> String manipulation - preserves dups, can be ordered by column name

> Unpivot/Pivot - preserves dups, can be ordered by column name

> Union All/Crosstab - preserves dups, can be ordered by column name

> Union/Crosstab - removes dups, cannot be ordered by column name

> Parsename - preserves dups, can be ordered by column name

SET NOCOUNT ON ;

GO

SET STATISTICS TIME OFF ;

PRINT REPLICATE('-', 75) + '

build temp tables'

IF EXISTS ( SELECT *

FROM sys.objects

WHERE object_id = OBJECT_ID(N'hobby1')

AND type IN ( N'U' ) )

DROP TABLE hobby1 ;

IF EXISTS ( SELECT *

FROM sys.objects

WHERE object_id = OBJECT_ID(N'hobby2')

AND type IN ( N'U' ) )

DROP TABLE hobby2 ;

CREATE TABLE hobby1

(

id INT IDENTITY(1, 1),

hobby1 VARCHAR(36),

hobby2 VARCHAR(36)

) ;

CREATE TABLE hobby2

(

id INT,

hobby1 VARCHAR(36),

hobby2 VARCHAR(36)

) ;

INSERT INTO dbo.hobby1

(

hobby1,

hobby2

)

SELECT TOP ( 250 )

CAST(NEWID() AS VARCHAR(36)),

NULL

FROM master.sys.columns c1

CROSS JOIN master.sys.columns c2

UNION ALL

SELECT TOP ( 250 )

NULL,

CAST(NEWID() AS VARCHAR(36))

FROM master.sys.columns c1

CROSS JOIN master.sys.columns c2

UNION ALL

SELECT TOP ( 250 )

CAST(NEWID() AS VARCHAR(36)),

CAST(NEWID() AS VARCHAR(36))

FROM master.sys.columns c1

CROSS JOIN master.sys.columns c2

UNION ALL

SELECT TOP ( 250 )

NULL,

NULL

FROM master.sys.columns c1

CROSS JOIN master.sys.columns c2 ;

INSERT INTO dbo.hobby2

(

id,

hobby1,

hobby2

)

SELECT id,

NULL,

CAST(NEWID() AS VARCHAR(36))

FROM dbo.hobby1

WHERE id <= 150

UNION ALL

SELECT id,

hobby2,

CAST(NEWID() AS VARCHAR(36))

FROM dbo.hobby1

WHERE id BETWEEN 151 AND 550

UNION ALL

SELECT id,

NULL,

CAST(NEWID() AS VARCHAR(36))

FROM dbo.hobby1

WHERE id BETWEEN 551 AND 950

UNION ALL

SELECT id,

hobby1,

CAST(NEWID() AS VARCHAR(36))

FROM dbo.hobby1

WHERE id BETWEEN 951 AND 1000 ;

GO

------------------------------------------------------------------------------------

PRINT REPLICATE('-', 75) + '

APPLY method'

SELECT 'APPLY method'

SET STATISTICS TIME ON ;

-- now transform it by moving all nulls to the end without RBAR

SELECT t.id,

c1toc4 AS newc1,

newc2,

newc3,

newc4

FROM (

SELECT h1.id AS id,

h1.hobby1 AS c1,

h1.hobby2 AS c2,

h2.hobby1 AS c3,

h2.hobby2 AS c4

FROM dbo.hobby1 h1

JOIN dbo.hobby2 h2 ON h1.id = h2.id

) t -- first CROSS APPLY is just to provide shortcuts for the

-- various COALESCE statements

CROSS APPLY (

SELECT c1toc4 = COALESCE(c1, c2, c3, c4),

c2toc4 = COALESCE(c2, c3, c4),

c3toc4 = COALESCE(c3, c4)

) cx

-- second CROSS APPLY figures out what goes in the second column

-- we know the 1st column is always c1toc4 - i.e. the first non-null

-- so here we compare the first non-null in cols 2-4 with c1toc4 and c3toc4

-- if it matches either or those we need to either take column 4 or

-- null (if col4 matches c1toc4 we have already used it for new column 1)

CROSS APPLY (

SELECT newc2 = COALESCE(NULLIF(COALESCE(NULLIF(c2toc4,

c1toc4), c3toc4), c1toc4), NULLIF(c4, c1toc4))

) cx2

-- and this is for the third column

CROSS APPLY (

SELECT newc3 = COALESCE(NULLIF(NULLIF(c3toc4, newc2), c1toc4),

NULLIF(NULLIF(c4, newc2), c1toc4))

) cx3

-- and this is for the fourth column

CROSS APPLY (

SELECT newc4 = NULLIF(NULLIF(NULLIF(c4, newc3), newc2), c1toc4)

) cx4 ;

SET STATISTICS TIME OFF ;

GO

------------------------------------------------------------------------------------

PRINT REPLICATE('-', 75) + '

String manipulation'

SELECT 'String manipulation'

SET STATISTICS TIME ON ;

SELECT t.id,

RTRIM(SUBSTRING(t.s, 1, 36)),

RTRIM(SUBSTRING(t.s, 37, 36)),

RTRIM(SUBSTRING(t.s, 73, 36)),

RTRIM(SUBSTRING(t.s, 109, 36))

FROM (

SELECT h1.id,

ISNULL(h1.hobby1 + SPACE(36 - LEN(h1.hobby1)), '') +

ISNULL(h1.hobby2 + SPACE(36 - LEN(h1.hobby2)), '') + ISNULL(h2.hobby1 + SPACE(36

- LEN(h2.hobby1)),'') + ISNULL(h2.hobby2

+ SPACE(36- LEN(h2.hobby2)),'') s

FROM dbo.hobby1 h1

JOIN dbo.hobby2 h2 ON h1.id = h2.id

) t ;

SET STATISTICS TIME OFF ;

GO

------------------------------------------------------------------------------------

PRINT REPLICATE('-', 75) + '

Unpivot/Pivot'

SELECT 'Unpivot/Pivot'

SET STATISTICS TIME ON ;

WITH OneTableCTE

AS (

SELECT a.id,

a.hobby1 AS 'h1',

a.hobby2 AS 'h2',

b.hobby1 AS 'h3',

b.hobby2 AS 'h4'

FROM hobby1 a

INNER JOIN hobby2 b ON a.id = b.id

),

UnPivotCTE

AS (

SELECT id,

hobby,

ROW_NUMBER() OVER ( PARTITION BY id ORDER BY hobbyorder )

AS fixedorder

FROM OneTableCTE UNPIVOT ( hobby FOR hobbyorder IN ( h1, h2, h3, h4 ) )

AS Normalized

)

SELECT OneTableCTE.id,

[1],

[2],

[3],

[4]

FROM (

SELECT id,

hobby,

fixedorder

FROM UnPivotCTE

) AS Undone PIVOT ( MIN(hobby) FOR fixedorder IN ( [1], [2], [3], [4] ) )

AS PivotTable

RIGHT OUTER JOIN OneTableCTE ON PivotTable.id = OneTableCTE.id ;

SET STATISTICS TIME OFF ;

GO

------------------------------------------------------------------------------------

PRINT REPLICATE('-', 75) + '

Union All/Cross tab'

SELECT 'Union All/Cross tab'

SET STATISTICS TIME ON ;

WITH cte(id, hobby, hobby_num)

AS (

SELECT h.id,

h.hobby1,

1

FROM dbo.hobby1 h

WHERE EXISTS ( SELECT *

FROM dbo.hobby2

WHERE id = h.id )

UNION ALL

SELECT h.id,

h.hobby1,

2

FROM dbo.hobby1 h

WHERE EXISTS ( SELECT *

FROM dbo.hobby2

WHERE id = h.id )

UNION ALL

SELECT h.id,

h.hobby1,

3

FROM dbo.hobby2 h

WHERE EXISTS ( SELECT *

FROM dbo.hobby1

WHERE id = h.id )

UNION ALL

SELECT h.id,

h.hobby2,

4

FROM dbo.hobby2 h

WHERE EXISTS ( SELECT *

FROM dbo.hobby2

WHERE id = h.id )

),

cte2(id, hobby, row_num)

AS (

SELECT id,

hobby,

ROW_NUMBER() OVER (PARTITION BY id ORDER BY hobby_num)

FROM cte

WHERE hobby IS NOT NULL

)

SELECT id,

MAX(CASE WHEN row_num = 1 THEN hobby

END) AS hobby1,

MAX(CASE WHEN row_num = 2 THEN hobby

END) AS hobby2,

MAX(CASE WHEN row_num = 3 THEN hobby

END) AS hobby3,

MAX(CASE WHEN row_num = 4 THEN hobby

END) AS hobby4

FROM cte2

GROUP BY id ;

SET STATISTICS TIME OFF ;

GO

------------------------------------------------------------------------------------

PRINT REPLICATE('-', 75) + '

Union/Cross tab'

SELECT 'Union/Cross tab'

SET STATISTICS TIME ON ;

WITH cte(id, hobby)

AS (

SELECT h.id,

h.hobby1

FROM dbo.hobby1 h

WHERE EXISTS ( SELECT *

FROM dbo.hobby2

WHERE id = h.id )

UNION

SELECT h.id,

h.hobby2

FROM dbo.hobby1 h

WHERE EXISTS ( SELECT *

FROM dbo.hobby2

WHERE id = h.id )

UNION

SELECT h.id,

h.hobby1

FROM dbo.hobby2 h

WHERE EXISTS ( SELECT *

FROM dbo.hobby1

WHERE id = h.id )

UNION

SELECT h.id,

h.hobby2

FROM dbo.hobby2 h

WHERE EXISTS ( SELECT *

FROM dbo.hobby2

WHERE id = h.id )

),

cte2(id, hobby, row_num)

AS (

SELECT id,

hobby,

ROW_NUMBER() OVER (PARTITION BY id ORDER BY id)

FROM cte

WHERE hobby IS NOT NULL

)

SELECT id,

MAX(CASE WHEN row_num = 1 THEN hobby

END) AS hobby1,

MAX(CASE WHEN row_num = 2 THEN hobby

END) AS hobby2,

MAX(CASE WHEN row_num = 3 THEN hobby

END) AS hobby3,

MAX(CASE WHEN row_num = 4 THEN hobby

END) AS hobby4

FROM cte2

GROUP BY id ;

SET STATISTICS TIME OFF ;

GO

------------------------------------------------------------------------------------

PRINT REPLICATE('-', 75) + '

Parsename'

SELECT 'Parsename'

SET STATISTICS TIME ON ;

SELECT h1.id,

REVERSE(PARSENAME(REVERSE(REPLACE(RTRIM(

CASE WHEN h1.hobby1 IS NULL THEN '' ELSE h1.hobby1 + ' ' END +

CASE WHEN h1.hobby2 IS NULL THEN '' ELSE h1.hobby2 + ' ' END +

CASE WHEN h2.hobby1 IS NULL THEN '' ELSE h2.hobby1 + ' ' END +

CASE WHEN h2.hobby2 IS NULL THEN '' ELSE h2.hobby2 END), ' ', '.')), 1)) AS Hobby1,

REVERSE(PARSENAME(REVERSE(REPLACE(RTRIM(

CASE WHEN h1.hobby1 IS NULL THEN '' ELSE h1.hobby1 + ' ' END +

CASE WHEN h1.hobby2 IS NULL THEN '' ELSE h1.hobby2 + ' ' END +

CASE WHEN h2.hobby1 IS NULL THEN '' ELSE h2.hobby1 + ' ' END +

CASE WHEN h2.hobby2 IS NULL THEN '' ELSE h2.hobby2 END), ' ', '.')), 2)) AS Hobby2,

REVERSE(PARSENAME(REVERSE(REPLACE(RTRIM(

CASE WHEN h1.hobby1 IS NULL THEN '' ELSE h1.hobby1 + ' ' END +

CASE WHEN h1.hobby2 IS NULL THEN '' ELSE h1.hobby2 + ' ' END +

CASE WHEN h2.hobby1 IS NULL THEN '' ELSE h2.hobby1 + ' ' END +

CASE WHEN h2.hobby2 IS NULL THEN '' ELSE h2.hobby2 END), ' ', '.')), 3)) AS Hobby3,

REVERSE(PARSENAME(REVERSE(REPLACE(RTRIM(

CASE WHEN h1.hobby1 IS NULL THEN '' ELSE h1.hobby1 + ' ' END +

CASE WHEN h1.hobby2 IS NULL THEN '' ELSE h1.hobby2 + ' ' END +

CASE WHEN h2.hobby1 IS NULL THEN '' ELSE h2.hobby1 + ' ' END +

CASE WHEN h2.hobby2 IS NULL THEN '' ELSE h2.hobby2 END), ' ', '.')), 4)) AS Hobby4

FROM dbo.hobby1 h1

JOIN dbo.hobby2 h2 ON h1.id = h2.id ;

SET STATISTICS TIME OFF ;

GO

Common results on my machine:

---------------------------------------------------------------------------
build temp tables
---------------------------------------------------------------------------
APPLY method
SQL Server Execution Times:
CPU time = 94 ms, elapsed time = 260 ms.
---------------------------------------------------------------------------
String manipulation
SQL Server Execution Times:
CPU time = 15 ms, elapsed time = 191 ms.
---------------------------------------------------------------------------
Unpivot/Pivot
SQL Server Execution Times:
CPU time = 15 ms, elapsed time = 248 ms.
---------------------------------------------------------------------------
Union All/Cross tab
Warning: Null value is eliminated by an aggregate or other SET operation.
SQL Server Execution Times:
CPU time = 16 ms, elapsed time = 127 ms.
---------------------------------------------------------------------------
Union/Cross tab
Warning: Null value is eliminated by an aggregate or other SET operation.
SQL Server Execution Times:
CPU time = 16 ms, elapsed time = 131 ms.
---------------------------------------------------------------------------
Parsename
SQL Server Execution Times:
CPU time = 15 ms, elapsed time = 137 ms.

There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato

Orlando Colamatteo SSC Guru Points: 182276 More actions · Answer 3

mister.magoo (7/7/2011)
mikes84 (7/6/2011)
I have a situation where I'm querying two tables, each that has up to two hobbies listed. The first table has hobby1 and hobby2, and the second table has hobby3 and hobby4. It's possible for there to be values in any of the fields. So, it could look like:
hobby1, hobby2, hobby3, hobby4
basketball, tennis, NULL, NULL
or
hobby1, hobby2, hobby3, hobby4
NULL, NULL, basketball, tennis
However, it could also look like:
hobby1, hobby2, hobby3, hobby4
basketball, NULL, tennis, NULL
Ideally, I want it to look like the first example, where the values are brought to the left. I could use COALESCE in the first and second examples and have it work, but it wouldn't work on the third example. Any ideas how I could get something like the third example to look like the first one?
Mike
Hi Orlando (and other contributors),
I don't understand the test data you are using... perhaps it's me but the question was about shuffling values into the leftmost columns where there were nulls between them.

The fact that the APPLY method was removing dups was lost on me initially. I did not break anything, it was working as you had intended per the initial post all along.

The test data you generate has no NULLS and duplicate entries.
As far as I can see, the CROSS APPLY method I provided is the only one that is correctly processing this data into the two leftmost columns and removing duplicate entries (the removal of duplicates may not be a stated requirement, but it suits me to include it and it makes sense as you wouldn't want duplicates anyway)...
Or am I completely missing the point?

I agree with you. I think what you have is a practical solution...why not de-dup? and why not preserve the order of the columns as they are examined per the written query?

There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato

Jeff Moden SSC Guru Points: 1003875 More actions · Answer 4

I decided to build a large amount of test data that more closely matches the OP's given conditions. I'll be back with a possible solution soon. Not to worry about the 100,000 rows in each table... this runs very, very fast. Also note that the column names of the Hobby Names match the OP's.

--=====================================================================================================================

-- Create and populate the test tables. This is NOT a part of the solution.

--=====================================================================================================================

--===== Declare and preset some obviously named local variables

DECLARE @RowsInFirstTable INT,

@RowsInSecondTable INT

;

SELECT @RowsInFirstTable = 100000,

@RowsInSecondTable = 100000

;

--===== Conditionally drop the test tables to make reruns in SSMS easier

IF OBJECT_ID('tempdb..#HobbyName','U') IS NOT NULL DROP TABLE #HobbyName; --Used to build a lot of test data

IF OBJECT_ID('tempdb..#Hobby1','U') IS NOT NULL DROP TABLE #Hobby1;

IF OBJECT_ID('tempdb..#Hobby2','U') IS NOT NULL DROP TABLE #Hobby2;

--===== Create a list of HobbyNames so we can randomly populate the test tables.

-- The NULL's are simply part of the possible things that could be a hobby.

SELECT HobbyNameID = IDENTITY(INT,1,1),

HobbyName = CAST(d.HobbyName AS VARCHAR(15))

INTO #HobbyName

FROM (

SELECT NULL UNION ALL

SELECT 'BaseBall' UNION ALL

SELECT 'Basketball' UNION ALL

SELECT 'Tennis' UNION ALL

SELECT 'FootBall' UNION ALL

SELECT 'Hockey' UNION ALL

SELECT 'Rugby' UNION ALL

SELECT 'Golf' UNION ALL

SELECT 'Bowling'

) d (HobbyName)

;

ALTER TABLE #HobbyName ADD PRIMARY KEY CLUSTERED (HobbyNameID) WITH FILLFACTOR = 100

;

--===== Create the "first" table and populate it on the fly

SELECT TOP (@RowsInFirstTable)

Hobby1ID = IDENTITY(INT,1,1),

Hobby1 = (SELECT TOP 1 HobbyName FROM #HobbyName WHERE HobbyNameID = ABS(CHECKSUM(NEWID()))%10+1),

Hobby2 = (SELECT TOP 1 HobbyName FROM #HobbyName WHERE HobbyNameID = ABS(CHECKSUM(NEWID()))%10+1)

INTO #Hobby1

FROM sys.all_columns ac1,

sys.all_columns ac2

;

--===== Create the "second" table and populate it on the fly

SELECT TOP (@RowsInSecondTable)

Hobby2ID = IDENTITY(INT,1,1),

Hobby3 = (SELECT TOP 1 HobbyName FROM #HobbyName WHERE HobbyNameID = ABS(CHECKSUM(NEWID()))%10+1),

Hobby4 = (SELECT TOP 1 HobbyName FROM #HobbyName WHERE HobbyNameID = ABS(CHECKSUM(NEWID()))%10+1)

INTO #Hobby2

FROM sys.all_columns ac1,

sys.all_columns ac2

;

--===== Add PK's to both tables

ALTER TABLE #Hobby1 ADD PRIMARY KEY CLUSTERED (Hobby1ID) WITH FILLFACTOR = 100;

ALTER TABLE #Hobby2 ADD PRIMARY KEY CLUSTERED (Hobby2ID) WITH FILLFACTOR = 100;

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1003875 More actions · Answer 5

For what it's worth, here's my rendition of how this problem might be solved. It doesn't preserve dupes and it sorts the HobbyNames on each row alphabetically. As always, details are in the code. As a sidebar, it only "dips" each table once and there is no join. I also did my own form of unpivot simply because I don't care for the UNPIVOT syntax. 😀 I'll let Orlando test it on his machine rather than make any performance claims. (Orlando, the test data above is set for 100,000 rows... change it {just a couple of variable entries} to whatever you were using for the other's code. Thanks)

--=====================================================================================================================

-- Now, solve the problem. The details are, of course, in the comments.

-- This method only "dips" each table once.

--=====================================================================================================================

WITH

cteEnumerate AS

( --=== This CTE creates an EAV stacked Table removing dupes and NULLs in the process.

-- Then it numbers (enumerates) the item's "column numbers" for reassembly in the outer SELECT coming up.

-- As a pleasent side effect, the HobbyNames are sorted in alphabetical order.

-- This step is a form of "pre-aggregation"

SELECT ColNum = ROW_NUMBER() OVER (PARTITION BY eav.HobbyID ORDER BY eav.HobbyName),

eav.HobbyID,

eav.HobbyName

FROM ( --=== This does the EAV "stacking", removal of dupes, removal of NULLs as combined info from both tables

SELECT --===== Unpivot the "first" hobby table removing nulls and dupes

DISTINCT

HobbyID = h1.Hobby1ID,

HobbyName = upvt.HobbyName

FROM #Hobby1 h1

CROSS APPLY ( --=== This does a high speed "Unpivot" of the first table

SELECT Hobby1 UNION ALL

SELECT Hobby2

) upvt (HobbyName)

WHERE upvt.HobbyName > '' --Is NOT NULL and is NOT BLANK

UNION -- We use union to get rid of dupes between tables ----------------------------------------------------

SELECT --===== Unpivot the "second" hobby table removing nulls and dupes

DISTINCT

HobbyID = h2.Hobby2ID,

HobbyName = upvt.HobbyName

FROM #Hobby2 h2

CROSS APPLY ( --=== This does a high speed "Unpivot" of the second table

SELECT Hobby3 UNION ALL

SELECT Hobby4

) upvt (HobbyName)

WHERE upvt.HobbyName > '' --Is NOT NULL and is NOT BLANK

) eav

) --=== Now, we reassemble the data using a high-speed cross tab.

-- For convenience of anticipated queries, we also return the number of hobbies for each "HobbyID"

SELECT HobbyID,

Hobby1 = MAX(CASE WHEN ColNum = 1 THEN HobbyName ELSE '' END),

Hobby2 = MAX(CASE WHEN ColNum = 2 THEN HobbyName ELSE '' END),

Hobby3 = MAX(CASE WHEN ColNum = 3 THEN HobbyName ELSE '' END),

Hobby4 = MAX(CASE WHEN ColNum = 4 THEN HobbyName ELSE '' END),

HobbyCount = COUNT(*)

FROM cteEnumerate

GROUP BY HobbyID

;

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Orlando Colamatteo SSC Guru Points: 182276 More actions · Answer 6

I have some re-thinking to do and some more learning to do from your work Jeff...I had suspicions before but it is abundantly clear to me now just how bad my approach is to conjuring up test data :laugh:

Jeff, I modded your test data build script so we could run the existing solutions against the concrete tables we were using before. It just seemed easier than modding all the solution scripts but that means there is a risk that I munged your code in the process but I think I did OK.

> APPLY method - removes dups, can be ordered by column name

> String manipulation - preserves dups, can be ordered by column name

> Unpivot/Pivot - preserves dups, can be ordered by column name

> Union All/Crosstab - preserves dups, can be ordered by column name, does not return rows where all hobbies are null

> Union/Crosstab - removes dups, cannot be ordered by column name, does not return rows where all hobbies are null

> Parsename - preserves dups, can be ordered by column name

> Jeff's Rendition - removes dups, cannot be ordered by column name, does not return rows where all hobbies are null

Here is the complete set of tests:

SET NOCOUNT ON ;

GO

SET STATISTICS TIME OFF ;

GO

PRINT REPLICATE('-', 75) + '

build temp tables'

IF EXISTS ( SELECT *

FROM sys.objects

WHERE object_id = OBJECT_ID(N'hobby1')

AND type IN ( N'U' ) )

DROP TABLE hobby1 ;

IF EXISTS ( SELECT *

FROM sys.objects

WHERE object_id = OBJECT_ID(N'hobby2')

AND type IN ( N'U' ) )

DROP TABLE hobby2 ;

CREATE TABLE hobby1

(

id INT IDENTITY(1, 1),

hobby1 VARCHAR(36),

hobby2 VARCHAR(36)

) ;

CREATE TABLE hobby2

(

id INT IDENTITY(1, 1),

hobby1 VARCHAR(36),

hobby2 VARCHAR(36)

) ;

--===== Add PK's to both tables

ALTER TABLE dbo.hobby1 ADD PRIMARY KEY CLUSTERED (id) WITH FILLFACTOR = 100 ;

ALTER TABLE dbo.hobby2 ADD PRIMARY KEY CLUSTERED (id) WITH FILLFACTOR = 100 ;

GO

--=====================================================================================================================

-- Create and populate the test tables. This is NOT a part of the solution.

--=====================================================================================================================

--===== Declare and preset some obviously named local variables

DECLARE @RowsInFirstTable INT,

@RowsInSecondTable INT ;

SELECT @RowsInFirstTable = 100000,

@RowsInSecondTable = 100000 ;

IF OBJECT_ID(N'tempdb..#HobbyName') IS NOT NULL

DROP TABLE #HobbyName;

--===== Create a list of HobbyNames so we can randomly populate the test tables.

-- The NULL's are simply part of the possible things that could be a hobby.

SELECT HobbyNameID = IDENTITY( INT,1,1),

HobbyName = CAST(d.HobbyName AS VARCHAR(15))

INTO #HobbyName

FROM (

SELECT NULL

UNION ALL

SELECT NULL

UNION ALL

SELECT 'BaseBall'

UNION ALL

SELECT 'Basketball'

UNION ALL

SELECT 'Tennis'

UNION ALL

SELECT 'FootBall'

UNION ALL

SELECT 'Hockey'

UNION ALL

SELECT 'Rugby'

UNION ALL

SELECT 'Golf'

UNION ALL

SELECT 'Bowling'

) d ( HobbyName ) ;

ALTER TABLE #HobbyName ADD PRIMARY KEY CLUSTERED (HobbyNameID) WITH FILLFACTOR = 100 ;

--===== Create the "first" table and populate it on the fly

INSERT INTO dbo.hobby1

(

hobby1,

hobby2

)

SELECT TOP ( @RowsInFirstTable )

Hobby1 = (

SELECT TOP 1

HobbyName

FROM #HobbyName

WHERE HobbyNameID = ABS(CHECKSUM(NEWID())) % 10 + 1

),

Hobby2 = (

SELECT TOP 1

HobbyName

FROM #HobbyName

WHERE HobbyNameID = ABS(CHECKSUM(NEWID())) % 10 + 1

)

FROM sys.all_columns ac1,

sys.all_columns ac2 ;

--===== Create the "second" table and populate it on the fly

INSERT INTO dbo.hobby2

(

hobby1,

hobby2

)

SELECT TOP ( @RowsInSecondTable )

(

SELECT TOP 1

HobbyName

FROM #HobbyName

WHERE HobbyNameID = ABS(CHECKSUM(NEWID())) % 10 + 1

),

(

SELECT TOP 1

HobbyName

FROM #HobbyName

WHERE HobbyNameID = ABS(CHECKSUM(NEWID())) % 10 + 1

)

FROM sys.all_columns ac1,

sys.all_columns ac2 ;

GO

------------------------------------------------------------------------------------

PRINT REPLICATE('-', 75) + '

APPLY method'

SELECT 'APPLY method'

SET STATISTICS TIME ON ;

-- now transform it by moving all nulls to the end without RBAR

SELECT t.id,

c1toc4 AS newc1,

newc2,

newc3,

newc4

FROM (

SELECT h1.id AS id,

h1.hobby1 AS c1,

h1.hobby2 AS c2,

h2.hobby1 AS c3,

h2.hobby2 AS c4

FROM dbo.hobby1 h1

JOIN dbo.hobby2 h2 ON h1.id = h2.id

) t -- first CROSS APPLY is just to provide shortcuts for the

-- various COALESCE statements

CROSS APPLY (

SELECT c1toc4 = COALESCE(c1, c2, c3, c4),

c2toc4 = COALESCE(c2, c3, c4),

c3toc4 = COALESCE(c3, c4)

) cx

-- second CROSS APPLY figures out what goes in the second column

-- we know the 1st column is always c1toc4 - i.e. the first non-null

-- so here we compare the first non-null in cols 2-4 with c1toc4 and c3toc4

-- if it matches either or those we need to either take column 4 or

-- null (if col4 matches c1toc4 we have already used it for new column 1)

CROSS APPLY (

SELECT newc2 = COALESCE(NULLIF(COALESCE(NULLIF(c2toc4,

c1toc4), c3toc4), c1toc4), NULLIF(c4, c1toc4))

) cx2

-- and this is for the third column

CROSS APPLY (

SELECT newc3 = COALESCE(NULLIF(NULLIF(c3toc4, newc2), c1toc4),

NULLIF(NULLIF(c4, newc2), c1toc4))

) cx3

-- and this is for the fourth column

CROSS APPLY (

SELECT newc4 = NULLIF(NULLIF(NULLIF(c4, newc3), newc2), c1toc4)

) cx4 ;

SET STATISTICS TIME OFF ;

GO

------------------------------------------------------------------------------------

PRINT REPLICATE('-', 75) + '

String manipulation'

SELECT 'String manipulation'

SET STATISTICS TIME ON ;

SELECT t.id,

RTRIM(SUBSTRING(t.s, 1, 36)),

RTRIM(SUBSTRING(t.s, 37, 36)),

RTRIM(SUBSTRING(t.s, 73, 36)),

RTRIM(SUBSTRING(t.s, 109, 36))

FROM (

SELECT h1.id,

ISNULL(h1.hobby1 + SPACE(36 - LEN(h1.hobby1)), '') +

ISNULL(h1.hobby2 + SPACE(36 - LEN(h1.hobby2)), '') + ISNULL(h2.hobby1 + SPACE(36

- LEN(h2.hobby1)),'') + ISNULL(h2.hobby2

+ SPACE(36- LEN(h2.hobby2)),'') s

FROM dbo.hobby1 h1

JOIN dbo.hobby2 h2 ON h1.id = h2.id

) t ;

SET STATISTICS TIME OFF ;

GO

------------------------------------------------------------------------------------

PRINT REPLICATE('-', 75) + '

Unpivot/Pivot'

SELECT 'Unpivot/Pivot'

SET STATISTICS TIME ON ;

WITH OneTableCTE

AS (

SELECT a.id,

a.hobby1 AS 'h1',

a.hobby2 AS 'h2',

b.hobby1 AS 'h3',

b.hobby2 AS 'h4'

FROM hobby1 a

INNER JOIN hobby2 b ON a.id = b.id

),

UnPivotCTE

AS (

SELECT id,

hobby,

ROW_NUMBER() OVER ( PARTITION BY id ORDER BY hobbyorder )

AS fixedorder

FROM OneTableCTE UNPIVOT ( hobby FOR hobbyorder IN ( h1, h2, h3, h4 ) )

AS Normalized

)

SELECT OneTableCTE.id,

[1],

[2],

[3],

[4]

FROM (

SELECT id,

hobby,

fixedorder

FROM UnPivotCTE

) AS Undone PIVOT ( MIN(hobby) FOR fixedorder IN ( [1], [2], [3], [4] ) )

AS PivotTable

RIGHT OUTER JOIN OneTableCTE ON PivotTable.id = OneTableCTE.id ;

SET STATISTICS TIME OFF ;

GO

------------------------------------------------------------------------------------

PRINT REPLICATE('-', 75) + '

Union All/Cross tab'

SELECT 'Union All/Cross tab'

SET STATISTICS TIME ON ;

WITH cte(id, hobby, hobby_num)

AS (

SELECT h.id,

h.hobby1,

1

FROM dbo.hobby1 h

WHERE EXISTS ( SELECT *

FROM dbo.hobby2

WHERE id = h.id )

UNION ALL

SELECT h.id,

h.hobby2,

2

FROM dbo.hobby1 h

WHERE EXISTS ( SELECT *

FROM dbo.hobby2

WHERE id = h.id )

UNION ALL

SELECT h.id,

h.hobby1,

3

FROM dbo.hobby2 h

WHERE EXISTS ( SELECT *

FROM dbo.hobby1

WHERE id = h.id )

UNION ALL

SELECT h.id,

h.hobby2,

4

FROM dbo.hobby2 h

WHERE EXISTS ( SELECT *

FROM dbo.hobby2

WHERE id = h.id )

),

cte2(id, hobby, row_num)

AS (

SELECT id,

hobby,

ROW_NUMBER() OVER (PARTITION BY id ORDER BY hobby_num)

FROM cte

WHERE hobby IS NOT NULL

)

SELECT id,

MAX(CASE WHEN row_num = 1 THEN hobby

END) AS hobby1,

MAX(CASE WHEN row_num = 2 THEN hobby

END) AS hobby2,

MAX(CASE WHEN row_num = 3 THEN hobby

END) AS hobby3,

MAX(CASE WHEN row_num = 4 THEN hobby

END) AS hobby4

FROM cte2

GROUP BY id ;

SET STATISTICS TIME OFF ;

GO

------------------------------------------------------------------------------------

PRINT REPLICATE('-', 75) + '

Union/Cross tab'

SELECT 'Union/Cross tab'

SET STATISTICS TIME ON ;

WITH cte(id, hobby)

AS (

SELECT h.id,

h.hobby1

FROM dbo.hobby1 h

WHERE EXISTS ( SELECT *

FROM dbo.hobby2

WHERE id = h.id )

UNION

SELECT h.id,

h.hobby2

FROM dbo.hobby1 h

WHERE EXISTS ( SELECT *

FROM dbo.hobby2

WHERE id = h.id )

UNION

SELECT h.id,

h.hobby1

FROM dbo.hobby2 h

WHERE EXISTS ( SELECT *

FROM dbo.hobby1

WHERE id = h.id )

UNION

SELECT h.id,

h.hobby2

FROM dbo.hobby2 h

WHERE EXISTS ( SELECT *

FROM dbo.hobby1

WHERE id = h.id )

),

cte2(id, hobby, row_num)

AS (

SELECT id,

hobby,

ROW_NUMBER() OVER (PARTITION BY id ORDER BY id)

FROM cte

WHERE hobby IS NOT NULL

)

SELECT id,

MAX(CASE WHEN row_num = 1 THEN hobby

END) AS hobby1,

MAX(CASE WHEN row_num = 2 THEN hobby

END) AS hobby2,

MAX(CASE WHEN row_num = 3 THEN hobby

END) AS hobby3,

MAX(CASE WHEN row_num = 4 THEN hobby

END) AS hobby4

FROM cte2

GROUP BY id ;

SET STATISTICS TIME OFF ;

GO

------------------------------------------------------------------------------------

PRINT REPLICATE('-', 75) + '

Parsename'

SELECT 'Parsename'

SET STATISTICS TIME ON ;

SELECT h1.id,

REVERSE(PARSENAME(REVERSE(REPLACE(RTRIM(

CASE WHEN h1.hobby1 IS NULL THEN '' ELSE h1.hobby1 + ' ' END +

CASE WHEN h1.hobby2 IS NULL THEN '' ELSE h1.hobby2 + ' ' END +

CASE WHEN h2.hobby1 IS NULL THEN '' ELSE h2.hobby1 + ' ' END +

CASE WHEN h2.hobby2 IS NULL THEN '' ELSE h2.hobby2 END), ' ', '.')), 1)) AS Hobby1,

REVERSE(PARSENAME(REVERSE(REPLACE(RTRIM(

CASE WHEN h1.hobby1 IS NULL THEN '' ELSE h1.hobby1 + ' ' END +

CASE WHEN h1.hobby2 IS NULL THEN '' ELSE h1.hobby2 + ' ' END +

CASE WHEN h2.hobby1 IS NULL THEN '' ELSE h2.hobby1 + ' ' END +

CASE WHEN h2.hobby2 IS NULL THEN '' ELSE h2.hobby2 END), ' ', '.')), 2)) AS Hobby2,

REVERSE(PARSENAME(REVERSE(REPLACE(RTRIM(

CASE WHEN h1.hobby1 IS NULL THEN '' ELSE h1.hobby1 + ' ' END +

CASE WHEN h1.hobby2 IS NULL THEN '' ELSE h1.hobby2 + ' ' END +

CASE WHEN h2.hobby1 IS NULL THEN '' ELSE h2.hobby1 + ' ' END +

CASE WHEN h2.hobby2 IS NULL THEN '' ELSE h2.hobby2 END), ' ', '.')), 3)) AS Hobby3,

REVERSE(PARSENAME(REVERSE(REPLACE(RTRIM(

CASE WHEN h1.hobby1 IS NULL THEN '' ELSE h1.hobby1 + ' ' END +

CASE WHEN h1.hobby2 IS NULL THEN '' ELSE h1.hobby2 + ' ' END +

CASE WHEN h2.hobby1 IS NULL THEN '' ELSE h2.hobby1 + ' ' END +

CASE WHEN h2.hobby2 IS NULL THEN '' ELSE h2.hobby2 END), ' ', '.')), 4)) AS Hobby4

FROM dbo.hobby1 h1

JOIN dbo.hobby2 h2 ON h1.id = h2.id ;

SET STATISTICS TIME OFF ;

GO

------------------------------------------------------------------------------------

PRINT REPLICATE('-', 75) + '

Jeff''s Rendition'

SELECT 'Jeff''s Rendition'

SET STATISTICS TIME ON ;

--=====================================================================================================================

-- Now, solve the problem. The details are, of course, in the comments.

-- This method only "dips" each table once.

--=====================================================================================================================

-- Now, solve the problem. The details are, of course, in the comments.

-- This method only "dips" each table once.

--=====================================================================================================================

WITH

cteEnumerate AS

( --=== This CTE creates an EAV stacked Table removing dupes and NULLs in the process.

-- Then it numbers (enumerates) the item's "column numbers" for reassembly in the outer SELECT coming up.

-- As a pleasent side effect, the HobbyNames are sorted in alphabetical order.

-- This step is a form of "pre-aggregation"

SELECT ColNum = ROW_NUMBER() OVER (PARTITION BY eav.HobbyID ORDER BY eav.HobbyName),

eav.HobbyID,

eav.HobbyName

FROM ( --=== This does the EAV "stacking", removal of dupes, removal of NULLs as combined info from both tables

SELECT --===== Unpivot the "first" hobby table removing nulls and dupes

DISTINCT

HobbyID = h1.id,

HobbyName = upvt.HobbyName

FROM dbo.hobby1 h1

CROSS APPLY ( --=== This does a high speed "Unpivot" of the first table

SELECT hobby1 UNION ALL

SELECT hobby2

) upvt (HobbyName)

WHERE upvt.HobbyName > '' --Is NOT NULL and is NOT BLANK

UNION -- We use union to get rid of dupes between tables ----------------------------------------------------

SELECT --===== Unpivot the "second" hobby table removing nulls and dupes

DISTINCT

HobbyID = h2.id,

HobbyName = upvt.HobbyName

FROM dbo.hobby2 h2

CROSS APPLY ( --=== This does a high speed "Unpivot" of the second table

SELECT hobby1 UNION ALL

SELECT hobby2

) upvt (HobbyName)

WHERE upvt.HobbyName > '' --Is NOT NULL and is NOT BLANK

) eav

) --=== Now, we reassemble the data using a high-speed cross tab.

-- For convenience of anticipated queries, we also return the number of hobbies for each "HobbyID"

SELECT HobbyID,

Hobby1 = MAX(CASE WHEN ColNum = 1 THEN HobbyName ELSE '' END),

Hobby2 = MAX(CASE WHEN ColNum = 2 THEN HobbyName ELSE '' END),

Hobby3 = MAX(CASE WHEN ColNum = 3 THEN HobbyName ELSE '' END),

Hobby4 = MAX(CASE WHEN ColNum = 4 THEN HobbyName ELSE '' END),

HobbyCount = COUNT(*)

FROM cteEnumerate

GROUP BY HobbyID

;

SET STATISTICS TIME OFF ;

GO

And here is a typical set of results on my machine after checking about a dozen or so runs:

---------------------------------------------------------------------------
build temp tables
---------------------------------------------------------------------------
APPLY method
SQL Server Execution Times:
CPU time = 17799 ms, elapsed time = 18257 ms.
---------------------------------------------------------------------------
String manipulation
SQL Server Execution Times:
CPU time = 921 ms, elapsed time = 2009 ms.
---------------------------------------------------------------------------
Unpivot/Pivot
SQL Server Execution Times:
CPU time = 1546 ms, elapsed time = 2138 ms.
---------------------------------------------------------------------------
Union All/Cross tab
Warning: Null value is eliminated by an aggregate or other SET operation.
SQL Server Execution Times:
CPU time = 796 ms, elapsed time = 1804 ms.
---------------------------------------------------------------------------
Union/Cross tab
Warning: Null value is eliminated by an aggregate or other SET operation.
SQL Server Execution Times:
CPU time = 780 ms, elapsed time = 1967 ms.
---------------------------------------------------------------------------
Parsename
SQL Server Execution Times:
CPU time = 1466 ms, elapsed time = 1968 ms.
---------------------------------------------------------------------------
Jeff's Rendition
SQL Server Execution Times:
CPU time = 1435 ms, elapsed time = 2181 ms.

Edit 7/8/2011 7:20 Mtn: corrected property of Jeff's rendition, it "cannot be ordered by column name"

There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato

mister.magoo SSC-Forever Points: 47068 More actions · Answer 7

Wow! that shows how slow my isnull/nullif/apply method is!

MM

select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

tommyh SSCertifiable Points: 6302 More actions · Answer 8

A question which i hope someone might be able to answer about the "Union/Cross tab" solution.

Okay so lets say we have a result looking like 1, "Tennis", Null, "Golf", "Bowling".

This first CTE will convert this to

1 "Tennis"

1 null

1 "Golf"

1 "Bowling"

The second CTE adds a rownumber to each row order by id. So it should create

1, 1, "Tennis"

1, 2, null

1, 3, Golf"

1, 4, "Bowling"

But how can we be sure it actually does that? There is no order by that forces it to come in that order. Is there a feature to CTEs so that they always come in a certain way without the order by?

Have worked like 30min with CTEs (still stuck with 2000) so all very new 🙂

/T

Mike Scalise SSCrazy Eights Points: 8145 More actions · Answer 9

Regarding these two requirements mikes84:
1) Do you want duplicates shown in the results?
2) Does order matter?
Using our two tables hobby1 and hobby2 here is an example that should answer both questions:
If...
hobby1.hobby1 = tennis
hobby1.hobby2 = cricket
hobby2.hobby1 is null
hobby2.hobby2 = tennis
...what should the result set look like?

1) I don't want duplicates in the results

2) Order does matter. The order is: hobby1.hobby1, hobby1.hobby2, hobby2.hobby1, hobby2.hobby2

If any hobbies are null, the next non-null hobby should be in its place.

In the case of the sample you provided, it would be

tennis, cricket, tennis, null

I can't remember if your XML-based solution met both of these requirements.

Jeff, I think your solution addresses requirement #1, but not #2, since it's ordering by hobby name alphabetically? Do you know how it could be modified to order in the aforementioned way?

Mike Scalise, PMP
https://www.michaelscalise.com

tommyh SSCertifiable Points: 6302 More actions · Answer 10

Modding my code so it should do what you now want.

SELECT t.id,

RTRIM(SUBSTRING(t.s, 1, 36)),

RTRIM(SUBSTRING(t.s, 37, 36)),

RTRIM(SUBSTRING(t.s, 73, 36)),

RTRIM(SUBSTRING(t.s, 109, 36))

FROM (

SELECT h1.id,

ISNULL(h1.hobby1 + SPACE(36 - LEN(h1.hobby1)), '') +

case when h1.hobby2 in (h1.hobby1) then '' else ISNULL(h1.hobby2 + SPACE(36 - LEN(h1.hobby2)), '') end +

case when h2.hobby1 in (h1.hobby1, h1.hobby2) then '' else ISNULL(h2.hobby1 + SPACE(36 - LEN(h2.hobby1)),'') end +

case when h2.hobby2 in (h1.hobby1, h1.hobby2, h2.hobby1) then '' else ISNULL(h2.hobby2 + SPACE(36 - LEN(h2.hobby2)),'') end s

FROM dbo.hobby1 h1

JOIN dbo.hobby2 h2 ON h1.id = h2.id

) t ;

Think it works atleast 😀

/T

Nevyn SSChampion Points: 14210 More actions · Answer 11

I think you misunderstood the question, Mike. They are asking if the same hobby (Tennis in the example) appears twice in the four possible hobby records whether you want one of them eliminated.

Orlando Colamatteo SSC Guru Points: 182276 More actions · Answer 12

mikes84 (7/8/2011)
Regarding these two requirements mikes84:
1) Do you want duplicates shown in the results?
2) Does order matter?
Using our two tables hobby1 and hobby2 here is an example that should answer both questions:
If...
hobby1.hobby1 = tennis
hobby1.hobby2 = cricket
hobby2.hobby1 is null
hobby2.hobby2 = tennis
...what should the result set look like?
1) I don't want duplicates in the results
2) Order does matter. The order is: hobby1.hobby1, hobby1.hobby2, hobby2.hobby1, hobby2.hobby2
If any hobbies are null, the next non-null hobby should be in its place.
In the case of the sample you provided, it would be
tennis, cricket, tennis, null
I can't remember if your XML-based solution met both of these requirements.
Jeff, I think your solution addresses requirement #1, but not #2, since it's ordering by hobby name alphabetically? Do you know how it could be modified to order in the aforementioned way?

From your description I would think you wanted:

tennis, cricket, null, null

Please confirm.

There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato

Orlando Colamatteo SSC Guru Points: 182276 More actions · Answer 13

tommyh (7/8/2011)
A question which i hope someone might be able to answer about the "Union/Cross tab" solution.
Okay so lets say we have a result looking like 1, "Tennis", Null, "Golf", "Bowling".
This first CTE will convert this to
1 "Tennis"
1 null
1 "Golf"
1 "Bowling"
The second CTE adds a rownumber to each row order by id. So it should create
1, 1, "Tennis"
1, 2, null
1, 3, Golf"
1, 4, "Bowling"
But how can we be sure it actually does that? There is no order by that forces it to come in that order. Is there a feature to CTEs so that they always come in a certain way without the order by?

We can't 😀 You cannot guarantee the order inside a derived table or CTE <edit>unless you use TOP but that does not apply in this problem case</edit>. Only the outermost query can ask for and be guaranteed ordering. That's why the "Union/Crosstab" method has the property "cannot be ordered by column name". The "Union All/Crosstab" assigns a value to each row based on the column it came from and it uses that for later pivoting so it can guarantee column ordering.

There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato

Nevyn SSChampion Points: 14210 More actions · Answer 14

Jeff Moden (7/7/2011)
For what it's worth, here's my rendition of how this problem might be solved. It doesn't preserve dupes and it sorts the HobbyNames on each row alphabetically. As always, details are in the code. As a sidebar, it only "dips" each table once and there is no join. I also did my own form of unpivot simply because I don't care for the UNPIVOT syntax. 😀 I'll let Orlando test it on his machine rather than make any performance claims. (Orlando, the test data above is set for 100,000 rows... change it {just a couple of variable entries} to whatever you were using for the other's code. Thanks)

Cool stuff, Jeff!

So do you ever use PIVOT and/or UNPIVOT, or are workarounds almost always better/faster?

Orlando Colamatteo SSC Guru Points: 182276 More actions · Answer 15

Here is the code including the latest string manipulation approach from tommyh and a variation of the "Union All/Crosstab" approach I put together, both of which remove dups yet still allow for a column order:

SET STATISTICS TIME OFF ;

GO

SET NOCOUNT ON ;

GO

PRINT REPLICATE('-', 75) + '

build temp tables'

IF EXISTS ( SELECT *

FROM sys.objects

WHERE object_id = OBJECT_ID(N'hobby1')

AND type IN ( N'U' ) )

DROP TABLE hobby1 ;

IF EXISTS ( SELECT *

FROM sys.objects

WHERE object_id = OBJECT_ID(N'hobby2')

AND type IN ( N'U' ) )

DROP TABLE hobby2 ;

CREATE TABLE hobby1

(

id INT IDENTITY(1, 1) NOT NULL,

hobby1 VARCHAR(36),

hobby2 VARCHAR(36)

) ;

CREATE TABLE hobby2

(

id INT IDENTITY(1, 1) NOT NULL,

hobby1 VARCHAR(36),

hobby2 VARCHAR(36)

) ;

GO

--===== Add PK's to both tables

ALTER TABLE dbo.hobby1 ADD PRIMARY KEY CLUSTERED (id) WITH FILLFACTOR = 100 ;

ALTER TABLE dbo.hobby2 ADD PRIMARY KEY CLUSTERED (id) WITH FILLFACTOR = 100 ;

GO

--============================================================================

-- Create and populate the test tables. This is NOT a part of the solution.

--============================================================================

--===== Declare and preset some obviously named local variables

DECLARE @RowsInFirstTable INT,

@RowsInSecondTable INT ;

SELECT @RowsInFirstTable = 200000,

@RowsInSecondTable = 200000 ;

IF OBJECT_ID(N'tempdb..#HobbyName') IS NOT NULL

DROP TABLE #HobbyName;

--===== Create a list of HobbyNames so we can randomly populate the test tables.

-- The NULL's are simply part of the possible things that could be a hobby.

SELECT HobbyNameID = IDENTITY( INT,1,1),

HobbyName = CAST(d.HobbyName AS VARCHAR(15))

INTO #HobbyName

FROM (

SELECT NULL

UNION ALL

SELECT NULL

UNION ALL

SELECT 'BaseBall'

UNION ALL

SELECT 'Basketball'

UNION ALL

SELECT 'Tennis'

UNION ALL

SELECT 'FootBall'

UNION ALL

SELECT 'Hockey'

UNION ALL

SELECT 'Rugby'

UNION ALL

SELECT 'Golf'

UNION ALL

SELECT 'Bowling'

) d ( HobbyName ) ;

ALTER TABLE #HobbyName ADD PRIMARY KEY CLUSTERED (HobbyNameID) WITH FILLFACTOR = 100 ;

--===== Create the "first" table and populate it on the fly

INSERT INTO dbo.hobby1

(

hobby1,

hobby2

)

SELECT TOP ( @RowsInFirstTable )

Hobby1 = (

SELECT TOP 1

HobbyName

FROM #HobbyName

WHERE HobbyNameID = ABS(CHECKSUM(NEWID())) % 10 + 1

),

Hobby2 = (

SELECT TOP 1

HobbyName

FROM #HobbyName

WHERE HobbyNameID = ABS(CHECKSUM(NEWID())) % 10 + 1

)

FROM sys.all_columns ac1,

sys.all_columns ac2 ;

--===== Create the "second" table and populate it on the fly

INSERT INTO dbo.hobby2

(

hobby1,

hobby2

)

SELECT TOP ( @RowsInSecondTable )

(

SELECT TOP 1

HobbyName

FROM #HobbyName

WHERE HobbyNameID = ABS(CHECKSUM(NEWID())) % 10 + 1

),

(

SELECT TOP 1

HobbyName

FROM #HobbyName

WHERE HobbyNameID = ABS(CHECKSUM(NEWID())) % 10 + 1

)

FROM sys.all_columns ac1,

sys.all_columns ac2 ;

GO

------------------------------------------------------------------------------------

/*

PRINT REPLICATE('-', 75) + '

APPLY method (no dups, ordered by column)'

SELECT 'APPLY method (no dups, ordered by column)'

SET STATISTICS TIME ON ;

-- now transform it by moving all nulls to the end without RBAR

SELECT t.id,

c1toc4 AS newc1,

newc2,

newc3,

newc4

FROM (

SELECT h1.id AS id,

h1.hobby1 AS c1,

h1.hobby2 AS c2,

h2.hobby1 AS c3,

h2.hobby2 AS c4

FROM dbo.hobby1 h1

JOIN dbo.hobby2 h2 ON h1.id = h2.id

) t -- first CROSS APPLY is just to provide shortcuts for the

-- various COALESCE statements

CROSS APPLY (

SELECT c1toc4 = COALESCE(c1, c2, c3, c4),

c2toc4 = COALESCE(c2, c3, c4),

c3toc4 = COALESCE(c3, c4)

) cx

-- second CROSS APPLY figures out what goes in the second column

-- we know the 1st column is always c1toc4 - i.e. the first non-null

-- so here we compare the first non-null in cols 2-4 with c1toc4 and c3toc4

-- if it matches either or those we need to either take column 4 or

-- null (if col4 matches c1toc4 we have already used it for new column 1)

CROSS APPLY (

SELECT newc2 = COALESCE(NULLIF(COALESCE(NULLIF(c2toc4,

c1toc4), c3toc4), c1toc4), NULLIF(c4, c1toc4))

) cx2

-- and this is for the third column

CROSS APPLY (

SELECT newc3 = COALESCE(NULLIF(NULLIF(c3toc4, newc2), c1toc4),

NULLIF(NULLIF(c4, newc2), c1toc4))

) cx3

-- and this is for the fourth column

CROSS APPLY (

SELECT newc4 = NULLIF(NULLIF(NULLIF(c4, newc3), newc2), c1toc4)

) cx4 ;

SET STATISTICS TIME OFF ;

GO

*/

------------------------------------------------------------------------------------

PRINT REPLICATE('-', 75) + '

String manipulation (dups, ordered by column)'

SELECT 'String manipulation (dups, ordered by column)'

SET STATISTICS TIME ON ;

SELECT t.id,

RTRIM(SUBSTRING(t.s, 1, 36)),

RTRIM(SUBSTRING(t.s, 37, 36)),

RTRIM(SUBSTRING(t.s, 73, 36)),

RTRIM(SUBSTRING(t.s, 109, 36))

FROM (

SELECT h1.id,

ISNULL(h1.hobby1 + SPACE(36 - LEN(h1.hobby1)), '') +

ISNULL(h1.hobby2 + SPACE(36 - LEN(h1.hobby2)), '') + ISNULL(h2.hobby1 + SPACE(36

- LEN(h2.hobby1)),'') + ISNULL(h2.hobby2

+ SPACE(36- LEN(h2.hobby2)),'') s

FROM dbo.hobby1 h1

JOIN dbo.hobby2 h2 ON h1.id = h2.id

) t ;

SET STATISTICS TIME OFF ;

GO

------------------------------------------------------------------------------------

PRINT REPLICATE('-', 75) + '

String manipulation (no dups, ordered by column)'

SELECT 'String manipulation (no dups, ordered by column)'

SET STATISTICS TIME ON ;

SELECT t.id,

RTRIM(SUBSTRING(t.s, 1, 36)),

RTRIM(SUBSTRING(t.s, 37, 36)),

RTRIM(SUBSTRING(t.s, 73, 36)),

RTRIM(SUBSTRING(t.s, 109, 36))

FROM (

SELECT h1.id,

ISNULL(h1.hobby1 + SPACE(36 - LEN(h1.hobby1)), '') +

case when h1.hobby2 in (h1.hobby1) then '' else ISNULL(h1.hobby2 +

SPACE(36 - LEN(h1.hobby2)), '') end +

case when h2.hobby1 in (h1.hobby1, h1.hobby2) then '' else ISNULL(h2.hobby1 +

SPACE(36 - LEN(h2.hobby1)),'') end +

case when h2.hobby2 in (h1.hobby1, h1.hobby2, h2.hobby1) then '' else

ISNULL(h2.hobby2 + SPACE(36 - LEN(h2.hobby2)),'') end s

FROM dbo.hobby1 h1

JOIN dbo.hobby2 h2 ON h1.id = h2.id

) t ;

SET STATISTICS TIME OFF ;

GO

------------------------------------------------------------------------------------

PRINT REPLICATE('-', 75) + '

Unpivot/Pivot (dups, ordered by column)'

SELECT 'Unpivot/Pivot (dups, ordered by column)'

SET STATISTICS TIME ON ;

WITH OneTableCTE

AS (

SELECT a.id,

a.hobby1 AS 'h1',

a.hobby2 AS 'h2',

b.hobby1 AS 'h3',

b.hobby2 AS 'h4'

FROM hobby1 a

INNER JOIN hobby2 b ON a.id = b.id

),

UnPivotCTE

AS (

SELECT id,

hobby,

ROW_NUMBER() OVER ( PARTITION BY id ORDER BY hobbyorder )

AS fixedorder

FROM OneTableCTE UNPIVOT ( hobby FOR hobbyorder IN ( h1, h2, h3, h4 ) )

AS Normalized

)

SELECT OneTableCTE.id,

[1],

[2],

[3],

[4]

FROM (

SELECT id,

hobby,

fixedorder

FROM UnPivotCTE

) AS Undone PIVOT ( MIN(hobby) FOR fixedorder IN ( [1], [2], [3], [4] ) )

AS PivotTable

RIGHT OUTER JOIN OneTableCTE ON PivotTable.id = OneTableCTE.id ;

SET STATISTICS TIME OFF ;

GO

------------------------------------------------------------------------------------

PRINT REPLICATE('-', 75) + '

Union All/Cross tab (dups, ordered by column)'

SELECT 'Union All/Cross tab (dups, ordered by column)'

SET STATISTICS TIME ON ;

WITH cte(id, hobby, hobby_num)

AS (

SELECT h.id,

h.hobby1,

1

FROM dbo.hobby1 h

WHERE EXISTS ( SELECT *

FROM dbo.hobby2

WHERE id = h.id )

UNION ALL

SELECT h.id,

h.hobby2,

2

FROM dbo.hobby1 h

WHERE EXISTS ( SELECT *

FROM dbo.hobby2

WHERE id = h.id )

UNION ALL

SELECT h.id,

h.hobby1,

3

FROM dbo.hobby2 h

WHERE EXISTS ( SELECT *

FROM dbo.hobby1

WHERE id = h.id )

UNION ALL

SELECT h.id,

h.hobby2,

4

FROM dbo.hobby2 h

WHERE EXISTS ( SELECT *

FROM dbo.hobby2

WHERE id = h.id )

),

cte2(id, hobby, row_num)

AS (

SELECT id,

hobby,

ROW_NUMBER() OVER (PARTITION BY id ORDER BY hobby_num)

FROM cte

WHERE hobby IS NOT NULL

)

SELECT id,

MAX(CASE WHEN row_num = 1 THEN hobby

END) AS hobby1,

MAX(CASE WHEN row_num = 2 THEN hobby

END) AS hobby2,

MAX(CASE WHEN row_num = 3 THEN hobby

END) AS hobby3,

MAX(CASE WHEN row_num = 4 THEN hobby

END) AS hobby4

FROM cte2

GROUP BY id ;

SET STATISTICS TIME OFF ;

GO

------------------------------------------------------------------------------------

PRINT REPLICATE('-', 75) + '

Union All/Cross tab (no dups, ordered by column)'

SELECT 'Union All/Cross tab (no dups, ordered by column)'

SET STATISTICS TIME ON ;

WITH cte ( id, hobby, hobby_num )

AS (

SELECT h.id,

h.hobby1,

1

FROM dbo.hobby1 h

WHERE EXISTS ( SELECT *

FROM dbo.hobby2

WHERE id = h.id )

AND h.hobby1 IS NOT NULL

UNION ALL

SELECT h.id,

h.hobby2,

2

FROM dbo.hobby1 h

WHERE EXISTS ( SELECT *

FROM dbo.hobby2

WHERE id = h.id )

AND h.hobby2 IS NOT NULL

UNION ALL

SELECT h.id,

h.hobby1,

3

FROM dbo.hobby2 h

WHERE EXISTS ( SELECT *

FROM dbo.hobby1

WHERE id = h.id )

AND h.hobby1 IS NOT NULL

UNION ALL

SELECT h.id,

h.hobby2,

4

FROM dbo.hobby2 h

WHERE EXISTS ( SELECT *

FROM dbo.hobby2

WHERE id = h.id )

AND h.hobby2 IS NOT NULL

),

cte2 ( id, hobby, hobby_num, hobby_partition_num )

AS (

-- partition so we can omit dups later

SELECT id,

hobby,

hobby_num,

ROW_NUMBER() OVER ( PARTITION BY id, hobby ORDER BY hobby, hobby_num )

FROM cte

),

cte3 ( id, hobby, row_num )

AS (

-- remove dups

SELECT id,

hobby,

ROW_NUMBER() OVER ( PARTITION BY id ORDER BY hobby_num )

FROM cte2

WHERE hobby_partition_num = 1

)

SELECT id,

MAX(CASE WHEN row_num = 1 THEN hobby

END) AS hobby1,

MAX(CASE WHEN row_num = 2 THEN hobby

END) AS hobby2,

MAX(CASE WHEN row_num = 3 THEN hobby

END) AS hobby3,

MAX(CASE WHEN row_num = 4 THEN hobby

END) AS hobby4

FROM cte3

GROUP BY id ;

SET STATISTICS TIME OFF ;

GO

------------------------------------------------------------------------------------

PRINT REPLICATE('-', 75) + '

Union/Cross tab (no dups, not ordered by column)'

SELECT 'Union/Cross tab (no dups, not ordered by column)'

SET STATISTICS TIME ON ;

WITH cte(id, hobby)

AS (

SELECT h.id,

h.hobby1

FROM dbo.hobby1 h

WHERE EXISTS ( SELECT *

FROM dbo.hobby2

WHERE id = h.id )

UNION

SELECT h.id,

h.hobby2

FROM dbo.hobby1 h

WHERE EXISTS ( SELECT *

FROM dbo.hobby2

WHERE id = h.id )

UNION

SELECT h.id,

h.hobby1

FROM dbo.hobby2 h

WHERE EXISTS ( SELECT *

FROM dbo.hobby1

WHERE id = h.id )

UNION

SELECT h.id,

h.hobby2

FROM dbo.hobby2 h

WHERE EXISTS ( SELECT *

FROM dbo.hobby1

WHERE id = h.id )

),

cte2(id, hobby, row_num)

AS (

SELECT id,

hobby,

ROW_NUMBER() OVER (PARTITION BY id ORDER BY id)

FROM cte

WHERE hobby IS NOT NULL

)

SELECT id,

MAX(CASE WHEN row_num = 1 THEN hobby

END) AS hobby1,

MAX(CASE WHEN row_num = 2 THEN hobby

END) AS hobby2,

MAX(CASE WHEN row_num = 3 THEN hobby

END) AS hobby3,

MAX(CASE WHEN row_num = 4 THEN hobby

END) AS hobby4

FROM cte2

GROUP BY id ;

SET STATISTICS TIME OFF ;

GO

------------------------------------------------------------------------------------

PRINT REPLICATE('-', 75) + '

Parsename (dups, ordered by column)'

SELECT 'Parsename (dups, ordered by column)'

SET STATISTICS TIME ON ;

SELECT h1.id,

REVERSE(PARSENAME(REVERSE(REPLACE(RTRIM(

CASE WHEN h1.hobby1 IS NULL THEN '' ELSE h1.hobby1 + ' ' END +

CASE WHEN h1.hobby2 IS NULL THEN '' ELSE h1.hobby2 + ' ' END +

CASE WHEN h2.hobby1 IS NULL THEN '' ELSE h2.hobby1 + ' ' END +

CASE WHEN h2.hobby2 IS NULL THEN '' ELSE h2.hobby2 END), ' ', '.')), 1)) AS Hobby1,

REVERSE(PARSENAME(REVERSE(REPLACE(RTRIM(

CASE WHEN h1.hobby1 IS NULL THEN '' ELSE h1.hobby1 + ' ' END +

CASE WHEN h1.hobby2 IS NULL THEN '' ELSE h1.hobby2 + ' ' END +

CASE WHEN h2.hobby1 IS NULL THEN '' ELSE h2.hobby1 + ' ' END +

CASE WHEN h2.hobby2 IS NULL THEN '' ELSE h2.hobby2 END), ' ', '.')), 2)) AS Hobby2,

REVERSE(PARSENAME(REVERSE(REPLACE(RTRIM(

CASE WHEN h1.hobby1 IS NULL THEN '' ELSE h1.hobby1 + ' ' END +

CASE WHEN h1.hobby2 IS NULL THEN '' ELSE h1.hobby2 + ' ' END +

CASE WHEN h2.hobby1 IS NULL THEN '' ELSE h2.hobby1 + ' ' END +

CASE WHEN h2.hobby2 IS NULL THEN '' ELSE h2.hobby2 END), ' ', '.')), 3)) AS Hobby3,

REVERSE(PARSENAME(REVERSE(REPLACE(RTRIM(

CASE WHEN h1.hobby1 IS NULL THEN '' ELSE h1.hobby1 + ' ' END +

CASE WHEN h1.hobby2 IS NULL THEN '' ELSE h1.hobby2 + ' ' END +

CASE WHEN h2.hobby1 IS NULL THEN '' ELSE h2.hobby1 + ' ' END +

CASE WHEN h2.hobby2 IS NULL THEN '' ELSE h2.hobby2 END), ' ', '.')), 4)) AS Hobby4

FROM dbo.hobby1 h1

JOIN dbo.hobby2 h2 ON h1.id = h2.id ;

SET STATISTICS TIME OFF ;

GO

------------------------------------------------------------------------------------

PRINT REPLICATE('-', 75) + '

Jeff''s Rendition (no dups, not ordered by column)'

SELECT 'Jeff''s Rendition (no dups, not ordered by column)'

SET STATISTICS TIME ON ;

--======================================================================

-- Now, solve the problem. The details are, of course, in the comments.

-- This method only "dips" each table once.

--======================================================================

-- Now, solve the problem. The details are, of course, in the comments.

-- This method only "dips" each table once.

--======================================================================

WITH

cteEnumerate AS

( --=== This CTE creates an EAV stacked Table removing dupes and NULLs in the process.

-- Then it numbers (enumerates) the item's "column numbers" for reassembly in the

-- outer SELECT coming up.

-- As a pleasent side effect, the HobbyNames are sorted in alphabetical order.

-- This step is a form of "pre-aggregation"

SELECT ColNum = ROW_NUMBER() OVER (PARTITION BY eav.HobbyID ORDER BY eav.HobbyName),

eav.HobbyID,

eav.HobbyName

FROM (

--=== This does the EAV "stacking", removal of dupes, removal of NULLs as combined info

-- from both tables

SELECT --===== Unpivot the "first" hobby table removing nulls and dupes

DISTINCT

HobbyID = h1.id,

HobbyName = upvt.HobbyName

FROM dbo.hobby1 h1

CROSS APPLY ( --=== This does a high speed "Unpivot" of the first table

SELECT hobby1 UNION ALL

SELECT hobby2

) upvt (HobbyName)

WHERE upvt.HobbyName > '' --Is NOT NULL and is NOT BLANK

UNION -- We use union to get rid of dupes between tables -----------------------------

SELECT --===== Unpivot the "second" hobby table removing nulls and dupes

DISTINCT

HobbyID = h2.id,

HobbyName = upvt.HobbyName

FROM dbo.hobby2 h2

CROSS APPLY ( --=== This does a high speed "Unpivot" of the second table

SELECT hobby1 UNION ALL

SELECT hobby2

) upvt (HobbyName)

WHERE upvt.HobbyName > '' --Is NOT NULL and is NOT BLANK

) eav

) --=== Now, we reassemble the data using a high-speed cross tab.

-- For convenience of anticipated queries, we also return the number of hobbies for each "HobbyID"

SELECT HobbyID,

Hobby1 = MAX(CASE WHEN ColNum = 1 THEN HobbyName ELSE '' END),

Hobby2 = MAX(CASE WHEN ColNum = 2 THEN HobbyName ELSE '' END),

Hobby3 = MAX(CASE WHEN ColNum = 3 THEN HobbyName ELSE '' END),

Hobby4 = MAX(CASE WHEN ColNum = 4 THEN HobbyName ELSE '' END),

HobbyCount = COUNT(*)

FROM cteEnumerate

GROUP BY HobbyID

;

SET STATISTICS TIME OFF ;

GO

Typical results when processing 100,000 rows in each table:

---------------------------------------------------------------------------
build temp tables
---------------------------------------------------------------------------
String manipulation (dups, ordered by column)
SQL Server Execution Times:
CPU time = 891 ms, elapsed time = 1875 ms.
---------------------------------------------------------------------------
String manipulation (no dups, ordered by column)
SQL Server Execution Times:
CPU time = 1031 ms, elapsed time = 1506 ms.
---------------------------------------------------------------------------
Unpivot/Pivot (dups, ordered by column)
SQL Server Execution Times:
CPU time = 1218 ms, elapsed time = 1641 ms.
---------------------------------------------------------------------------
Union All/Cross tab (dups, ordered by column)
Warning: Null value is eliminated by an aggregate or other SET operation.
SQL Server Execution Times:
CPU time = 625 ms, elapsed time = 1618 ms.
---------------------------------------------------------------------------
Union All/Cross tab (no dups, ordered by column)
Warning: Null value is eliminated by an aggregate or other SET operation.
SQL Server Execution Times:
CPU time = 1000 ms, elapsed time = 2384 ms.
---------------------------------------------------------------------------
Union/Cross tab (no dups, not ordered by column)
Warning: Null value is eliminated by an aggregate or other SET operation.
SQL Server Execution Times:
CPU time = 610 ms, elapsed time = 1582 ms.
---------------------------------------------------------------------------
Parsename (dups, ordered by column)
SQL Server Execution Times:
CPU time = 1062 ms, elapsed time = 1513 ms.
---------------------------------------------------------------------------
Jeff's Rendition (no dups, not ordered by column)
SQL Server Execution Times:
CPU time = 1156 ms, elapsed time = 1785 ms.

Typical results when processing 200,000 rows in each table:

---------------------------------------------------------------------------
build temp tables
---------------------------------------------------------------------------
String manipulation (dups, ordered by column)
SQL Server Execution Times:
CPU time = 1485 ms, elapsed time = 2860 ms.
---------------------------------------------------------------------------
String manipulation (no dups, ordered by column)
SQL Server Execution Times:
CPU time = 2125 ms, elapsed time = 3133 ms.
---------------------------------------------------------------------------
Unpivot/Pivot (dups, ordered by column)
SQL Server Execution Times:
CPU time = 2157 ms, elapsed time = 3166 ms.
---------------------------------------------------------------------------
Union All/Cross tab (dups, ordered by column)
Warning: Null value is eliminated by an aggregate or other SET operation.
SQL Server Execution Times:
CPU time = 1281 ms, elapsed time = 2864 ms.
---------------------------------------------------------------------------
Union All/Cross tab (no dups, ordered by column)
Warning: Null value is eliminated by an aggregate or other SET operation.
SQL Server Execution Times:
CPU time = 2000 ms, elapsed time = 4827 ms.
---------------------------------------------------------------------------
Union/Cross tab (no dups, not ordered by column)
Warning: Null value is eliminated by an aggregate or other SET operation.
SQL Server Execution Times:
CPU time = 1250 ms, elapsed time = 2863 ms.
---------------------------------------------------------------------------
Parsename (dups, ordered by column)
SQL Server Execution Times:
CPU time = 2141 ms, elapsed time = 2931 ms.
---------------------------------------------------------------------------
Jeff's Rendition (no dups, not ordered by column)
SQL Server Execution Times:
CPU time = 2297 ms, elapsed time = 3598 ms.

There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato