November 29, 2016 at 2:36 pm
I have a table that contains one column and is populated as follows:
declare @table table ( Col varchar(100) )
insert @table ( Col )
values ( '173' ), ( '97' ), ( '54,115' ), ( '1,32,173' ), ( '21,32' ), ( '5,32' ), ( '32,173' )
I'm in need of a query that will split every row that has multiple values into all combinations, and return a distinct list of combinations. For the example above, I need to return:
Each individual value (unique):
----------------------------------
'1'
'5'
'21'
'32'
'97'
'115'
'173'
Groupings of two values:
---------------------------
'1,32'
'1,173'
'32,173'
'5,32'
'21,32'
'54,115'
Groupings of three values:
-----------------------------
'1,32,173'
In summary, for every row, all possible combinations; that is, for row '1,32,173', I need: '1', '32', '173', '1,32', '1,173', '32,173' and '1,32,173'. Any help is greatly appreciated.
November 29, 2016 at 5:06 pm
My solution can be totally optimized and simplified but here's a solution that uses DelimitedSplit8K (see the link in my signature):
-- your sample data
declare @table table ( Col varchar(100) )
insert @table (Col)
values ('173'), ('97'), ('54,115' ), ('1,32,173'), ('21,32'), ('5,32'), ('32,173');
-- solution:
WITH trips AS
(
SELECT col, item=item+0
FROM @table
CROSS APPLY dbo.DelimitedSplit8K(col,',') s1
WHERE LEN(col)-2 = LEN(REPLACE(col,',',''))
)
-- All combinations when there's 3
SELECT a.col, item = CAST(a.item AS varchar(10))+ ',' +CAST(b.item AS varchar(10))
FROM trips a, trips b
WHERE a.col = b.col AND a.item < b.item
UNION ALL
SELECT col, col
FROM @table
WHERE LEN(col)-2 = LEN(REPLACE(col,',',''))
UNION ALL
SELECT col, col
FROM @table
WHERE LEN(col)-1 = LEN(REPLACE(col,',',''))
UNION ALL
SELECT col, item
FROM @table
CROSS APPLY dbo.DelimitedSplit8K(col,',')
ORDER BY col -- not required, will slow you down. Included for demo purposes only!
You'll notice three UNION ALL's. The first query after the CTE gets all combinations when there's 3 values. The second gets me the un-split three-value values. The third one gets me the un-split two-value values and the final query is the single values split out from all the delimited and un-delimited strings.
edit: added some explanation.
-- Itzik Ben-Gan 2001
November 30, 2016 at 2:42 am
Alan.B (11/29/2016)
My solution can be totally optimized and simplified but here's a solution that uses DelimitedSplit8K (see the link in my signature):You'll notice three UNION ALL's. The first query after the CTE gets all combinations when there's 3 values. The second gets me the un-split three-value values. The third one gets me the un-split two-value values and the final query is the single values split out from all the delimited and un-delimited strings.
edit: added some explanation.
I actually ended up finding a different solution. Not sure it's faster, but it works nicely. First, I added an identity column to my table variable, but that's not required. The query is as follows:
declare @table table ( IndexID int identity(1,1), Col varchar(100) )
insert @table ( Col )
values ( '173' ), ( '97' ), ( '54,115' ), ( '1,32,173' ), ( '21,32' ), ( '5,32' ), ( '32,173' )
-- solution
;with Z as
(
selectRowID = Y.IndexID,
ColID = V.ID,
Col = V.ListValues
from@table Y cross apply
FN_ListToTable_8K ( Y.Col, ',' ) V
), W as
(
selectRowID,
NoCols = max(ColID)
fromZ
group by RowID
), R as
(
selectID = 1,
RowID,
Col,
LastCol = convert(int,Col)
fromZ
union all
selectID = R.ID + 1,
RowID = R.RowID,
Col = R.Col + ',' + Z.Col,
LastCol = convert(int,Z.Col)
fromW inner join
R on
R.RowID = W.RowID and
R.ID < W.NoCols inner join
Z on
Z.RowID = R.RowID and
convert(int,Z.Col) > R.LastCol
)
select distinct Col from R order by Col
-- The first CTE transforms the strings into rows while keeping the reference to the original RowID (IndexID in this case).
-- The second CTE gets the number of elements in each original RowID.
-- The third is a recursive CTE where I keep track of the last element added to the comma delimited list (largest value) and loops for the largest number of elements in each original RowID.
November 30, 2016 at 2:47 am
N_Muller (11/30/2016)
Alan.B (11/29/2016)
My solution can be totally optimized and simplified but here's a solution that uses DelimitedSplit8K (see the link in my signature):You'll notice three UNION ALL's. The first query after the CTE gets all combinations when there's 3 values. The second gets me the un-split three-value values. The third one gets me the un-split two-value values and the final query is the single values split out from all the delimited and un-delimited strings.
edit: added some explanation.
I actually ended up finding a different solution. Not sure it's faster, but it works nicely. First, I added an identity column to my table variable, but that's not required. The query is as follows:
Now in a way that one can actually read the code :w00t:
declare @table table ( IndexID int identity(1,1), Col varchar(100) )
insert @table ( Col )
values ( '173' ), ( '97' ), ( '54,115' ), ( '1,32,173' ), ( '21,32' ), ( '5,32' ), ( '32,173' )
;with Z as
(
selectRowID = Y.IndexID,
ColID = V.ID,
Col = V.ListValues
from@table Y cross apply
FN_ListToTable_8K ( Y.Col, ',' ) V
), W as
(
selectRowID,
NoCols = max(ColID)
fromZ
group by RowID
), R as
(
selectID = 1,
RowID,
Col,
LastCol = convert(int,Col)
fromZ
union all
selectID = R.ID + 1,
RowID = R.RowID,
Col = R.Col + ',' + Z.Col,
LastCol = convert(int,Z.Col)
fromW inner join
R on
R.RowID = W.RowID and
R.ID < W.NoCols inner join
Z on
Z.RowID = R.RowID and
convert(int,Z.Col) > R.LastCol
)
selectdistinct Col from R order by Col
November 30, 2016 at 9:54 am
N_Muller (11/30/2016)
Alan.B (11/29/2016)
My solution can be totally optimized and simplified but here's a solution that uses DelimitedSplit8K (see the link in my signature):You'll notice three UNION ALL's. The first query after the CTE gets all combinations when there's 3 values. The second gets me the un-split three-value values. The third one gets me the un-split two-value values and the final query is the single values split out from all the delimited and un-delimited strings.
edit: added some explanation.
I actually ended up finding a different solution. Not sure it's faster, but it works nicely. First, I added an identity column to my table variable, but that's not required. The query is as follows:
declare @table table ( IndexID int identity(1,1), Col varchar(100) )
insert @table ( Col )
values ( '173' ), ( '97' ), ( '54,115' ), ( '1,32,173' ), ( '21,32' ), ( '5,32' ), ( '32,173' )
-- solution
;with Z as
(
selectRowID = Y.IndexID,
ColID = V.ID,
Col = V.ListValues
from@table Y cross apply
FN_ListToTable_8K ( Y.Col, ',' ) V
), W as
(
selectRowID,
NoCols = max(ColID)
fromZ
group by RowID
), R as
(
selectID = 1,
RowID,
Col,
LastCol = convert(int,Col)
fromZ
union all
selectID = R.ID + 1,
RowID = R.RowID,
Col = R.Col + ',' + Z.Col,
LastCol = convert(int,Z.Col)
fromW inner join
R on
R.RowID = W.RowID and
R.ID < W.NoCols inner join
Z on
Z.RowID = R.RowID and
convert(int,Z.Col) > R.LastCol
)
select distinct Col from R order by Col
-- The first CTE transforms the strings into rows while keeping the reference to the original RowID (IndexID in this case).
-- The second CTE gets the number of elements in each original RowID.
-- The third is a recursive CTE where I keep track of the last element added to the comma delimited list (largest value) and loops for the largest number of elements in each original RowID.
First, I noticed an issue with my solution where I was getting duplicates, here's an updated version:
declare @table table (IndexID int identity(1,1), Col varchar(100));
insert @table (Col)
values ('173'), ('97'), ('54,115' ), ('1,32,173'), ('21,32'), ('5,32'), ('32,173');
WITH trips AS
(
SELECT col, item=item+0
FROM @table
CROSS APPLY dbo.DelimitedSplit8K(col,',') s1
WHERE LEN(col)-2 = LEN(REPLACE(col,',',''))
)
-- All combinations when there's 3
SELECT item = CAST(a.item AS varchar(10))+ ',' +CAST(b.item AS varchar(10))
FROM trips a, trips b
WHERE a.col = b.col AND a.item < b.item
UNION ALL
SELECT col
FROM @table
WHERE LEN(col)-2 = LEN(REPLACE(col,',',''))
UNION ALL
SELECT col
FROM @table
WHERE LEN(col)-1 = LEN(REPLACE(col,',',''))
UNION
SELECT item
FROM @table
CROSS APPLY dbo.DelimitedSplit8K(col,',')
ORDER BY item; -- not required.
Next, if FN_ListToTable_8K is not an Inline Table Valued function then you need to change that. A multi-line Table Valued splitter function will KILL your performance. The solution you are using includes a recursive CTE - those can be costly too. Here's a performance test to compare techniques:
SET NOCOUNT ON;
-- Sample data:
declare @table table (IndexID int identity(1,1), Col varchar(100));
WITH parts(n,p) AS
(
SELECT TOP (10000)
'1'+REPLICATE('0',(abs(checksum(newid())%3)+1)), abs(checksum(newid())%3)+1
FROM sys.all_columns
)
INSERT @table
SELECT
CASE p
WHEN 1 THEN cast((abs(checksum(newid())%n)+1) AS varchar(4))
WHEN 2 THEN cast((abs(checksum(newid())%n)+1) AS varchar(4))+','+
cast((abs(checksum(newid())%n)+1) AS varchar(4))
ELSE cast((abs(checksum(newid())%n)+1) AS varchar(4))+','+
cast((abs(checksum(newid())%n)+1) AS varchar(4))+','+
cast((abs(checksum(newid())%n)+1) AS varchar(4))
END
FROM parts;
--SELECT * FROM @table
DBCC TRACEON(2453); -- for better cardinality estimates on temp variables
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
PRINT 'MY SOLUTION: '+char(13)+char(10)+REPLICATE('-',50);
WITH trips AS
(
SELECT col, item=item+0
FROM @table
CROSS APPLY dbo.DelimitedSplit8K(col,',') s1
WHERE LEN(col)-2 = LEN(REPLACE(col,',',''))
)
-- All combinations when there's 3
SELECT item = CAST(a.item AS varchar(10))+ ',' +CAST(b.item AS varchar(10))
FROM trips a, trips b
WHERE a.col = b.col AND a.item < b.item
UNION ALL
SELECT col
FROM @table
WHERE LEN(col)-2 = LEN(REPLACE(col,',',''))
UNION ALL
SELECT col
FROM @table
WHERE LEN(col)-1 = LEN(REPLACE(col,',',''))
UNION
SELECT item
FROM @table
CROSS APPLY dbo.DelimitedSplit8K(col,',')
--ORDER BY item
PRINT 'RCTE:'+char(13)+char(10)+REPLICATE('-',50);
;with Z as
(
selectRowID = Y.IndexID,
ColID = V.ItemNumber,
Col = V.Item
from@table Y cross APPLY dbo.DelimitedSplit8K ( Y.Col, ',' ) V
), W as
(
selectRowID,
NoCols = max(ColID)
fromZ
group by RowID
), R as
(
selectID = 1,
RowID,
Col,
LastCol = convert(int,Col)
fromZ
union all
selectID = R.ID + 1,
RowID = R.RowID,
Col = R.Col + ',' + Z.Col,
LastCol = convert(int,Z.Col)
fromW inner join
R on
R.RowID = W.RowID and
R.ID < W.NoCols inner join
Z on
Z.RowID = R.RowID and
convert(int,Z.Col) > R.LastCol
)
SELECT DISTINCT Col
FROM R
--ORDER BY col;
SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
DBCC TRACEOFF(2453);
Perf Test results:
MY SOLUTION:
--------------------------------------------------
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server parse and compile time:
CPU time = 31 ms, elapsed time = 32 ms.
Table '#A73A2F66'. Scan count 25, logical reads 120, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 3458, logical reads 98032, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 547 ms, elapsed time = 427 ms.
RCTE:
--------------------------------------------------
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server parse and compile time:
CPU time = 16 ms, elapsed time = 21 ms.
Table 'Worktable'. Scan count 58811, logical reads 392249, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#A73A2F66'. Scan count 3, logical reads 72, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 1029 ms, elapsed time = 1193 ms.
DBCC execution completed. If DBCC printed error messages, contact your system administrator.
You can see - even with a good splitter the recursive CTE solution is less than half as fast and produces 4X the number of reads.
-- Itzik Ben-Gan 2001
December 6, 2016 at 3:24 pm
N_Muller (11/30/2016)
Alan.B (11/29/2016)
My solution can be totally optimized and simplified but here's a solution that uses DelimitedSplit8K (see the link in my signature):You'll notice three UNION ALL's. The first query after the CTE gets all combinations when there's 3 values. The second gets me the un-split three-value values. The third one gets me the un-split two-value values and the final query is the single values split out from all the delimited and un-delimited strings.
edit: added some explanation.
I actually ended up finding a different solution. Not sure it's faster, but it works nicely. First, I added an identity column to my table variable, but that's not required. The query is as follows:
declare @table table ( IndexID int identity(1,1), Col varchar(100) )
insert @table ( Col )
values ( '173' ), ( '97' ), ( '54,115' ), ( '1,32,173' ), ( '21,32' ), ( '5,32' ), ( '32,173' )
-- solution
;with Z as
(
selectRowID = Y.IndexID,
ColID = V.ID,
Col = V.ListValues
from@table Y cross apply
FN_ListToTable_8K ( Y.Col, ',' ) V
), W as
(
selectRowID,
NoCols = max(ColID)
fromZ
group by RowID
), R as
(
selectID = 1,
RowID,
Col,
LastCol = convert(int,Col)
fromZ
union all
selectID = R.ID + 1,
RowID = R.RowID,
Col = R.Col + ',' + Z.Col,
LastCol = convert(int,Z.Col)
fromW inner join
R on
R.RowID = W.RowID and
R.ID < W.NoCols inner join
Z on
Z.RowID = R.RowID and
convert(int,Z.Col) > R.LastCol
)
select distinct Col from R order by Col
-- The first CTE transforms the strings into rows while keeping the reference to the original RowID (IndexID in this case).
-- The second CTE gets the number of elements in each original RowID.
-- The third is a recursive CTE where I keep track of the last element added to the comma delimited list (largest value) and loops for the largest number of elements in each original RowID.
Can you post a copy of the FN_ListToTable_8K function so that we can try out your code? Thanks.
--Jeff Moden
Change is inevitable... Change for the better is not.
December 6, 2016 at 4:52 pm
Jeff Moden (12/6/2016)Can you post a copy of the FN_ListToTable_8K function so that we can try out your code? Thanks.
CREATE FUNCTION [dbo].[FN_ListToTable_8K]
(
@listVARCHAR(8000),
@delimiterCHAR(1)
)
RETURNS TABLE WITH SCHEMABINDING
AS
RETURN
--
-- Thist function has had minimal if any modification on code
-- by Jeff Moden (SQL Server Central)
--
-- CTE Driven "Tally Table" produces values from 1 up to 10,000...
-- enough to cover VARCHAR(8000)
WITH E1(N) AS
(
SELECT 1 UNION ALL
SELECT 1 UNION ALL
SELECT 1 UNION ALL
SELECT 1 UNION ALL
SELECT 1 UNION ALL
SELECT 1 UNION ALL
SELECT 1 UNION ALL
SELECT 1 UNION ALL
SELECT 1 UNION ALL
SELECT 1
), --10E+1 or 10 rows
E2(N) AS
(
SELECT1
FROME1 a,
E1 b
),--10E+2 or 100 rows
E4(N) AS
(
SELECT1
FROME2 a,
E2 b
),--10E+4 or 10,000 rows
cteTally(N) AS
(
--
--This provides the "base" CTE and limits the number of rows right up front
--for both a performance gain and prevention of accidental "overruns"
--
SELECTTOP
(ISNULL(DATALENGTH(@list),0)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROME4
),
cteStart(N1) AS
(
--
--This returns N+1 (starting position of each "element" just once for each delimiter)
--
SELECT1
UNION ALL
SELECTt.N+1
FROMcteTally t
WHERESUBSTRING(@list,t.N,1) = @delimiter
),
cteLen(N1,L1) AS
(
--
--Return start and length (for use in substring)
--
SELECTs.N1,
ISNULL(NULLIF(CHARINDEX(@delimiter,@list,s.N1),0)-s.N1,8000)
FROMcteStart s
)
--
--Do the actual split. The ISNULL/NULLIF combo handles the length for the final
--element when no delimiter is found.
--
SELECTID= ROW_NUMBER() OVER(ORDER BY l.N1),
ListValues= SUBSTRING(@list, l.N1, l.L1)
FROMcteLen l
December 6, 2016 at 6:01 pm
N_Muller (12/6/2016)
Jeff Moden (12/6/2016)Can you post a copy of the FN_ListToTable_8K function so that we can try out your code? Thanks.
CREATE FUNCTION [dbo].[FN_ListToTable_8K]
(
@listVARCHAR(8000),
@delimiterCHAR(1)
)
RETURNS TABLE WITH SCHEMABINDING
AS
RETURN
--
-- Thist function has had minimal if any modification on code
-- by Jeff Moden (SQL Server Central)
--
-- CTE Driven "Tally Table" produces values from 1 up to 10,000...
-- enough to cover VARCHAR(8000)
WITH E1(N) AS
(
SELECT 1 UNION ALL
SELECT 1 UNION ALL
SELECT 1 UNION ALL
SELECT 1 UNION ALL
SELECT 1 UNION ALL
SELECT 1 UNION ALL
SELECT 1 UNION ALL
SELECT 1 UNION ALL
SELECT 1 UNION ALL
SELECT 1
), --10E+1 or 10 rows
E2(N) AS
(
SELECT1
FROME1 a,
E1 b
),--10E+2 or 100 rows
E4(N) AS
(
SELECT1
FROME2 a,
E2 b
),--10E+4 or 10,000 rows
cteTally(N) AS
(
--
--This provides the "base" CTE and limits the number of rows right up front
--for both a performance gain and prevention of accidental "overruns"
--
SELECTTOP
(ISNULL(DATALENGTH(@list),0)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROME4
),
cteStart(N1) AS
(
--
--This returns N+1 (starting position of each "element" just once for each delimiter)
--
SELECT1
UNION ALL
SELECTt.N+1
FROMcteTally t
WHERESUBSTRING(@list,t.N,1) = @delimiter
),
cteLen(N1,L1) AS
(
--
--Return start and length (for use in substring)
--
SELECTs.N1,
ISNULL(NULLIF(CHARINDEX(@delimiter,@list,s.N1),0)-s.N1,8000)
FROMcteStart s
)
--
--Do the actual split. The ISNULL/NULLIF combo handles the length for the final
--element when no delimiter is found.
--
SELECTID= ROW_NUMBER() OVER(ORDER BY l.N1),
ListValues= SUBSTRING(@list, l.N1, l.L1)
FROMcteLen l
I'm thinking that code is going to work pretty well for you. Thanks for the mention in the comments.
--Jeff Moden
Change is inevitable... Change for the better is not.
December 6, 2016 at 6:03 pm
Shifting gears back to the problem at hand, will there ever be CSVs with more than 3 elements?
--Jeff Moden
Change is inevitable... Change for the better is not.
December 7, 2016 at 7:23 am
Jeff Moden (12/6/2016)
Shifting gears back to the problem at hand, will there ever be CSVs with more than 3 elements?
Yes, but the most I've seen in my scenarios is about 20.
December 7, 2016 at 7:24 am
Jeff Moden (12/6/2016)
I'm thinking that code is going to work pretty well for you.
Thanks for the mention in the comments.
I give credit, where credit is due
Viewing 11 posts - 1 through 11 (of 11 total)
You must be logged in to reply to this topic. Login to reply
This website stores cookies on your computer.
These cookies are used to improve your website experience and provide more personalized services to you, both on this website and through other media.
To find out more about the cookies we use, see our Privacy Policy