Home Forums SQL Server 2008 T-SQL (SS2K8) Split list of string into all possible combinations RE: Split list of string into all possible combinations

  • N_Muller (11/30/2016)


    Alan.B (11/29/2016)


    My solution can be totally optimized and simplified but here's a solution that uses DelimitedSplit8K (see the link in my signature):

    You'll notice three UNION ALL's. The first query after the CTE gets all combinations when there's 3 values. The second gets me the un-split three-value values. The third one gets me the un-split two-value values and the final query is the single values split out from all the delimited and un-delimited strings.

    edit: added some explanation.

    I actually ended up finding a different solution. Not sure it's faster, but it works nicely. First, I added an identity column to my table variable, but that's not required. The query is as follows:

    declare @table table ( IndexID int identity(1,1), Col varchar(100) )

    insert @table ( Col )

    values ( '173' ), ( '97' ), ( '54,115' ), ( '1,32,173' ), ( '21,32' ), ( '5,32' ), ( '32,173' )

    -- solution

    ;with Z as

    (

    selectRowID = Y.IndexID,

    ColID = V.ID,

    Col = V.ListValues

    from@table Y cross apply

    FN_ListToTable_8K ( Y.Col, ',' ) V

    ), W as

    (

    selectRowID,

    NoCols = max(ColID)

    fromZ

    group by RowID

    ), R as

    (

    selectID = 1,

    RowID,

    Col,

    LastCol = convert(int,Col)

    fromZ

    union all

    selectID = R.ID + 1,

    RowID = R.RowID,

    Col = R.Col + ',' + Z.Col,

    LastCol = convert(int,Z.Col)

    fromW inner join

    R on

    R.RowID = W.RowID and

    R.ID < W.NoCols inner join

    Z on

    Z.RowID = R.RowID and

    convert(int,Z.Col) > R.LastCol

    )

    select distinct Col from R order by Col

    -- The first CTE transforms the strings into rows while keeping the reference to the original RowID (IndexID in this case).

    -- The second CTE gets the number of elements in each original RowID.

    -- The third is a recursive CTE where I keep track of the last element added to the comma delimited list (largest value) and loops for the largest number of elements in each original RowID.

    First, I noticed an issue with my solution where I was getting duplicates, here's an updated version:

    declare @table table (IndexID int identity(1,1), Col varchar(100));

    insert @table (Col)

    values ('173'), ('97'), ('54,115' ), ('1,32,173'), ('21,32'), ('5,32'), ('32,173');

    WITH trips AS

    (

    SELECT col, item=item+0

    FROM @table

    CROSS APPLY dbo.DelimitedSplit8K(col,',') s1

    WHERE LEN(col)-2 = LEN(REPLACE(col,',',''))

    )

    -- All combinations when there's 3

    SELECT item = CAST(a.item AS varchar(10))+ ',' +CAST(b.item AS varchar(10))

    FROM trips a, trips b

    WHERE a.col = b.col AND a.item < b.item

    UNION ALL

    SELECT col

    FROM @table

    WHERE LEN(col)-2 = LEN(REPLACE(col,',',''))

    UNION ALL

    SELECT col

    FROM @table

    WHERE LEN(col)-1 = LEN(REPLACE(col,',',''))

    UNION

    SELECT item

    FROM @table

    CROSS APPLY dbo.DelimitedSplit8K(col,',')

    ORDER BY item; -- not required.

    Next, if FN_ListToTable_8K is not an Inline Table Valued function then you need to change that. A multi-line Table Valued splitter function will KILL your performance. The solution you are using includes a recursive CTE - those can be costly too. Here's a performance test to compare techniques:

    SET NOCOUNT ON;

    -- Sample data:

    declare @table table (IndexID int identity(1,1), Col varchar(100));

    WITH parts(n,p) AS

    (

    SELECT TOP (10000)

    '1'+REPLICATE('0',(abs(checksum(newid())%3)+1)), abs(checksum(newid())%3)+1

    FROM sys.all_columns

    )

    INSERT @table

    SELECT

    CASE p

    WHEN 1 THEN cast((abs(checksum(newid())%n)+1) AS varchar(4))

    WHEN 2 THEN cast((abs(checksum(newid())%n)+1) AS varchar(4))+','+

    cast((abs(checksum(newid())%n)+1) AS varchar(4))

    ELSE cast((abs(checksum(newid())%n)+1) AS varchar(4))+','+

    cast((abs(checksum(newid())%n)+1) AS varchar(4))+','+

    cast((abs(checksum(newid())%n)+1) AS varchar(4))

    END

    FROM parts;

    --SELECT * FROM @table

    DBCC TRACEON(2453); -- for better cardinality estimates on temp variables

    SET STATISTICS IO ON;

    SET STATISTICS TIME ON;

    PRINT 'MY SOLUTION: '+char(13)+char(10)+REPLICATE('-',50);

    WITH trips AS

    (

    SELECT col, item=item+0

    FROM @table

    CROSS APPLY dbo.DelimitedSplit8K(col,',') s1

    WHERE LEN(col)-2 = LEN(REPLACE(col,',',''))

    )

    -- All combinations when there's 3

    SELECT item = CAST(a.item AS varchar(10))+ ',' +CAST(b.item AS varchar(10))

    FROM trips a, trips b

    WHERE a.col = b.col AND a.item < b.item

    UNION ALL

    SELECT col

    FROM @table

    WHERE LEN(col)-2 = LEN(REPLACE(col,',',''))

    UNION ALL

    SELECT col

    FROM @table

    WHERE LEN(col)-1 = LEN(REPLACE(col,',',''))

    UNION

    SELECT item

    FROM @table

    CROSS APPLY dbo.DelimitedSplit8K(col,',')

    --ORDER BY item

    PRINT 'RCTE:'+char(13)+char(10)+REPLICATE('-',50);

    ;with Z as

    (

    selectRowID = Y.IndexID,

    ColID = V.ItemNumber,

    Col = V.Item

    from@table Y cross APPLY dbo.DelimitedSplit8K ( Y.Col, ',' ) V

    ), W as

    (

    selectRowID,

    NoCols = max(ColID)

    fromZ

    group by RowID

    ), R as

    (

    selectID = 1,

    RowID,

    Col,

    LastCol = convert(int,Col)

    fromZ

    union all

    selectID = R.ID + 1,

    RowID = R.RowID,

    Col = R.Col + ',' + Z.Col,

    LastCol = convert(int,Z.Col)

    fromW inner join

    R on

    R.RowID = W.RowID and

    R.ID < W.NoCols inner join

    Z on

    Z.RowID = R.RowID and

    convert(int,Z.Col) > R.LastCol

    )

    SELECT DISTINCT Col

    FROM R

    --ORDER BY col;

    SET STATISTICS TIME OFF;

    SET STATISTICS IO OFF;

    DBCC TRACEOFF(2453);

    Perf Test results:

    MY SOLUTION:

    --------------------------------------------------

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 0 ms.

    SQL Server parse and compile time:

    CPU time = 31 ms, elapsed time = 32 ms.

    Table '#A73A2F66'. Scan count 25, logical reads 120, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Worktable'. Scan count 3458, logical reads 98032, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 547 ms, elapsed time = 427 ms.

    RCTE:

    --------------------------------------------------

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 0 ms.

    SQL Server parse and compile time:

    CPU time = 16 ms, elapsed time = 21 ms.

    Table 'Worktable'. Scan count 58811, logical reads 392249, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table '#A73A2F66'. Scan count 3, logical reads 72, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 1029 ms, elapsed time = 1193 ms.

    DBCC execution completed. If DBCC printed error messages, contact your system administrator.

    You can see - even with a good splitter the recursive CTE solution is less than half as fast and produces 4X the number of reads.

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001