N_Muller (11/30/2016)
Alan.B (11/29/2016)
My solution can be totally optimized and simplified but here's a solution that uses DelimitedSplit8K (see the link in my signature):You'll notice three UNION ALL's. The first query after the CTE gets all combinations when there's 3 values. The second gets me the un-split three-value values. The third one gets me the un-split two-value values and the final query is the single values split out from all the delimited and un-delimited strings.
edit: added some explanation.
I actually ended up finding a different solution. Not sure it's faster, but it works nicely. First, I added an identity column to my table variable, but that's not required. The query is as follows:
declare @table table ( IndexID int identity(1,1), Col varchar(100) )
insert @table ( Col )
values ( '173' ), ( '97' ), ( '54,115' ), ( '1,32,173' ), ( '21,32' ), ( '5,32' ), ( '32,173' )
-- solution
;with Z as
(
selectRowID = Y.IndexID,
ColID = V.ID,
Col = V.ListValues
from@table Y cross apply
FN_ListToTable_8K ( Y.Col, ',' ) V
), W as
(
selectRowID,
NoCols = max(ColID)
fromZ
group by RowID
), R as
(
selectID = 1,
RowID,
Col,
LastCol = convert(int,Col)
fromZ
union all
selectID = R.ID + 1,
RowID = R.RowID,
Col = R.Col + ',' + Z.Col,
LastCol = convert(int,Z.Col)
fromW inner join
R on
R.RowID = W.RowID and
R.ID < W.NoCols inner join
Z on
Z.RowID = R.RowID and
convert(int,Z.Col) > R.LastCol
)
select distinct Col from R order by Col
-- The first CTE transforms the strings into rows while keeping the reference to the original RowID (IndexID in this case).
-- The second CTE gets the number of elements in each original RowID.
-- The third is a recursive CTE where I keep track of the last element added to the comma delimited list (largest value) and loops for the largest number of elements in each original RowID.
First, I noticed an issue with my solution where I was getting duplicates, here's an updated version:
declare @table table (IndexID int identity(1,1), Col varchar(100));
insert @table (Col)
values ('173'), ('97'), ('54,115' ), ('1,32,173'), ('21,32'), ('5,32'), ('32,173');
WITH trips AS
(
SELECT col, item=item+0
FROM @table
CROSS APPLY dbo.DelimitedSplit8K(col,',') s1
WHERE LEN(col)-2 = LEN(REPLACE(col,',',''))
)
-- All combinations when there's 3
SELECT item = CAST(a.item AS varchar(10))+ ',' +CAST(b.item AS varchar(10))
FROM trips a, trips b
WHERE a.col = b.col AND a.item < b.item
UNION ALL
SELECT col
FROM @table
WHERE LEN(col)-2 = LEN(REPLACE(col,',',''))
UNION ALL
SELECT col
FROM @table
WHERE LEN(col)-1 = LEN(REPLACE(col,',',''))
UNION
SELECT item
FROM @table
CROSS APPLY dbo.DelimitedSplit8K(col,',')
ORDER BY item; -- not required.
Next, if FN_ListToTable_8K is not an Inline Table Valued function then you need to change that. A multi-line Table Valued splitter function will KILL your performance. The solution you are using includes a recursive CTE - those can be costly too. Here's a performance test to compare techniques:
SET NOCOUNT ON;
-- Sample data:
declare @table table (IndexID int identity(1,1), Col varchar(100));
WITH parts(n,p) AS
(
SELECT TOP (10000)
'1'+REPLICATE('0',(abs(checksum(newid())%3)+1)), abs(checksum(newid())%3)+1
FROM sys.all_columns
)
INSERT @table
SELECT
CASE p
WHEN 1 THEN cast((abs(checksum(newid())%n)+1) AS varchar(4))
WHEN 2 THEN cast((abs(checksum(newid())%n)+1) AS varchar(4))+','+
cast((abs(checksum(newid())%n)+1) AS varchar(4))
ELSE cast((abs(checksum(newid())%n)+1) AS varchar(4))+','+
cast((abs(checksum(newid())%n)+1) AS varchar(4))+','+
cast((abs(checksum(newid())%n)+1) AS varchar(4))
END
FROM parts;
--SELECT * FROM @table
DBCC TRACEON(2453); -- for better cardinality estimates on temp variables
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
PRINT 'MY SOLUTION: '+char(13)+char(10)+REPLICATE('-',50);
WITH trips AS
(
SELECT col, item=item+0
FROM @table
CROSS APPLY dbo.DelimitedSplit8K(col,',') s1
WHERE LEN(col)-2 = LEN(REPLACE(col,',',''))
)
-- All combinations when there's 3
SELECT item = CAST(a.item AS varchar(10))+ ',' +CAST(b.item AS varchar(10))
FROM trips a, trips b
WHERE a.col = b.col AND a.item < b.item
UNION ALL
SELECT col
FROM @table
WHERE LEN(col)-2 = LEN(REPLACE(col,',',''))
UNION ALL
SELECT col
FROM @table
WHERE LEN(col)-1 = LEN(REPLACE(col,',',''))
UNION
SELECT item
FROM @table
CROSS APPLY dbo.DelimitedSplit8K(col,',')
--ORDER BY item
PRINT 'RCTE:'+char(13)+char(10)+REPLICATE('-',50);
;with Z as
(
selectRowID = Y.IndexID,
ColID = V.ItemNumber,
Col = V.Item
from@table Y cross APPLY dbo.DelimitedSplit8K ( Y.Col, ',' ) V
), W as
(
selectRowID,
NoCols = max(ColID)
fromZ
group by RowID
), R as
(
selectID = 1,
RowID,
Col,
LastCol = convert(int,Col)
fromZ
union all
selectID = R.ID + 1,
RowID = R.RowID,
Col = R.Col + ',' + Z.Col,
LastCol = convert(int,Z.Col)
fromW inner join
R on
R.RowID = W.RowID and
R.ID < W.NoCols inner join
Z on
Z.RowID = R.RowID and
convert(int,Z.Col) > R.LastCol
)
SELECT DISTINCT Col
FROM R
--ORDER BY col;
SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
DBCC TRACEOFF(2453);
Perf Test results:
MY SOLUTION:
--------------------------------------------------
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server parse and compile time:
CPU time = 31 ms, elapsed time = 32 ms.
Table '#A73A2F66'. Scan count 25, logical reads 120, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 3458, logical reads 98032, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 547 ms, elapsed time = 427 ms.
RCTE:
--------------------------------------------------
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server parse and compile time:
CPU time = 16 ms, elapsed time = 21 ms.
Table 'Worktable'. Scan count 58811, logical reads 392249, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#A73A2F66'. Scan count 3, logical reads 72, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 1029 ms, elapsed time = 1193 ms.
DBCC execution completed. If DBCC printed error messages, contact your system administrator.
You can see - even with a good splitter the recursive CTE solution is less than half as fast and produces 4X the number of reads.
-- Itzik Ben-Gan 2001