SQL Clone
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Double Delimited String Parsing and Re-Concatenation Help


Double Delimited String Parsing and Re-Concatenation Help

Author
Message
pmcpherson
pmcpherson
SSC Veteran
SSC Veteran (219 reputation)SSC Veteran (219 reputation)SSC Veteran (219 reputation)SSC Veteran (219 reputation)SSC Veteran (219 reputation)SSC Veteran (219 reputation)SSC Veteran (219 reputation)SSC Veteran (219 reputation)

Group: General Forum Members
Points: 219 Visits: 478
I have a table with a column of double delimited strings. I need to break it out into all possible combinations without repeats and without repeats from one outer group to another. For example:

CREATE TABLE #TmpTbl (TblID INT PRIMARY KEY, MultiDelimStr VARCHAR(255))
INSERT #TmpTbl ( TblID, MultiDelimStr )
SELECT 101, 'Orange,Apple;Orange,Grape,Banana;Apple,Kiwi,Grape'
UNION
SELECT 202, 'Grape,Apple;Orange,Grape,Banana;Peach,Kiwi,Grape'
UNION
SELECT 303, 'Orange;Peach,Grape,Banana;Apple,Kiwi;Tangerine,Grapefruit'


From this I need the following result set:

101 Orange,Grape,Apple
101 Orange,Grape,Kwi
101 Orange,Banana,Apple
101 Orange,Banana,Kiwi
101 Orange,Banana,Grape
101 Apple,Orange,Kiwi
101 Apple,Orange,Grape
101 Apple,Grape,Kiwi
101 Apple,Banana,Kiwi
101 Apple,Banana,Grape
202 Grape,Orange,Peach
202 Grape,Orange,Kiwi
202 Grape,Banana,Peach
202 Grape,Banana,Kiwi
202 Apple,Orange,Peach
202 Apple,Orange,Kiwi
202 Apple,Orange,Grape
202 Apple,Grape,Peach
202 Apple,Grape,Kiwi
202 Apple,Banana,Grape
202 Apple,Banana,Peach
202 Apple,Banana,Kiwi
303 Orange,Peach,Apple,Tangerine
303 Orange,Peach,Apple,Grapefruit
303 Orange,Peach,Kiwi,Tangerine
303 Orange,Peach,Kiwi,Grapefruit
303 Orange,Grape,Apple,Tangerine
303 Orange,Grape,Apple,Grapefruit
303 Orange,Grape,Kiwi,Tangerine
303 Orange,Grape,Kiwi,Grapefruit
303 Orange,Banana,Apple,Tangerine
303 Orange,Banana,Apple,Grapefruit
303 Orange,Banana,Kiwi,Tangerine
303 Orange,Banana,Kiwi,Grapefruit

But the following row would be incorrect:

101 Apple,Orange,Apple


Please, also note that I can count on at least 1 semicolon or up to 4, but no more than 4 per record. I don’t want to separate the rows with 1 semicolon and process them before moving on to the rows with 2 semicolons, etc.

I thought a double parse routine would get me close to where I want to be, but I can’t figure out how to put it back together again in the order my requirements desire. To break it out I have used:

SELECT FruitID, MultiDelimStr
,RANK() OVER(PARTITION BY FruitID ORDER BY N) AS GroupID
,SUBSTRING(';'+MultiDelimStr+';',N+1,CHARINDEX(';',';'+MultiDelimStr+';',N+1)-N-1) AS GroupFruit
INTO #FirstParse
FROM #FruitTbl F
CROSS JOIN dbo.Tally T
Where T.N < LEN(';'+MultiDelimStr+';')
AND SUBSTRING(';'+MultiDelimStr+';',N,1) = ';'
--SELECT * FROM #FirstParse

SELECT FruitID, GroupID
,SUBSTRING(','+GroupFruit+',',N+1,CHARINDEX(',',','+GroupFruit+',',N+1)-N-1) AS SoloFruit
INTO #SecondParse
FROM #FirstParse F
CROSS JOIN dbo.Tally T
Where T.N < LEN(','+GroupFruit+',')
AND SUBSTRING(','+GroupFruit+',',N,1) = ','
SELECT * FROM #SecondParse


To get the result set (just showing 101 because this is already long enough):

FruitID GroupID SoloFruit
101 1 Orange
101 1 Apple
101 2 Orange
101 2 Grape
101 2 Banana
101 3 Apple
101 3 Kiwi
101 3 Grape

Now how do I put it back together and achieve the desired result set without multiple IF statements and multiple sets of subqueries or a loop (cursor/while)? Or is there a better way than splitting it twice? I know I am missing something that I fear I have read before, but I just can't remember it Crazy XML?

Thank you very much for your time and advice in advance.:-)
Gianluca Sartori
Gianluca Sartori
SSC-Forever
SSC-Forever (47K reputation)SSC-Forever (47K reputation)SSC-Forever (47K reputation)SSC-Forever (47K reputation)SSC-Forever (47K reputation)SSC-Forever (47K reputation)SSC-Forever (47K reputation)SSC-Forever (47K reputation)

Group: General Forum Members
Points: 47957 Visits: 13367
I'm not sure I understand all your requirements.
Why ID = 101 has 3 combinations? There are 4 fruits there...

Anyway this is what I could come up with. It could be a starter.



IF OBJECT_ID('Tempdb..#TmpTbl') IS NOT NULL DROP TABLE #TmpTbl

CREATE TABLE #TmpTbl (TblID INT PRIMARY KEY, MultiDelimStr VARCHAR(255))
INSERT #TmpTbl ( TblID, MultiDelimStr )
SELECT 101, 'Orange,Apple;Orange,Grape,Banana;Apple,Kiwi,Grape'
UNION
SELECT 202, 'Grape,Apple;Orange,Grape,Banana;Peach,Kiwi,Grape'
UNION
SELECT 303, 'Orange;Peach,Grape,Banana;Apple,Kiwi;Tangerine,Grapefruit'


;WITH Pass1 AS (
SELECT TblId, C.Value
FROM #TmpTbl AS A
CROSS APPLY dbo.fSplit(MultiDelimStr, ';') AS B
CROSS APPLY dbo.fSplit(B.Value, ',') AS C
),
Pass2 AS (
SELECT *
FROM (
SELECT DISTINCT *
FROM Pass1
) AS A
),
Ids AS (
SELECT DISTINCT TblId
FROM #TmpTbl
)
SELECT A.TblId, B.*
FROM Pass2 AS A
CROSS APPLY (
SELECT CStr = A.Value + (
SELECT ',' + Value AS [text()]
FROM Pass2 AS P1
WHERE P1.TblId = A.TblId
AND P1.Value <> A.Value
ORDER BY Value
FOR XML PATH('')
)
)AS B



This code uses a split function by Jeff Moden that can be found here: http://www.sqlservercentral.com/articles/T-SQL/62867/

I'm including the function here:


CREATE FUNCTION [dbo].[fSplit]
(
@Parameter VARCHAR(8000),
@SplitOn char(1)
)
RETURNS @Elements TABLE
(

ID INT identity(1,1),
Value VARCHAR(8000)
)
AS
BEGIN
--===== Add start and end commas to the Parameter so we can handle
-- single elements
SET @Parameter = @splitOn + @Parameter + @SplitOn

--===== Join the Tally table to the string at the character level and
-- when we find a comma, insert what's between that command and
-- the next comma into the Elements table
INSERT INTO @Elements (Value)
SELECT SUBSTRING(@Parameter,N+1,CHARINDEX(@splitOn,@Parameter,N+1)-N-1)
FROM dbo.Tally
WHERE N < LEN(@Parameter)
AND SUBSTRING(@Parameter,N,1) = @splitOn --Notice how we find the comma
RETURN
END



I'm sure this is not what you're after...
Good luck anyway

--Gianluca Sartori

How to post T-SQL questions
spaghettidba.com
@spaghettidba
Gianluca Sartori
Gianluca Sartori
SSC-Forever
SSC-Forever (47K reputation)SSC-Forever (47K reputation)SSC-Forever (47K reputation)SSC-Forever (47K reputation)SSC-Forever (47K reputation)SSC-Forever (47K reputation)SSC-Forever (47K reputation)SSC-Forever (47K reputation)

Group: General Forum Members
Points: 47957 Visits: 13367
Don't forget to create the Tally table from the article code if you're using the split function!

Good luck with your query!

--Gianluca Sartori

How to post T-SQL questions
spaghettidba.com
@spaghettidba
pmcpherson
pmcpherson
SSC Veteran
SSC Veteran (219 reputation)SSC Veteran (219 reputation)SSC Veteran (219 reputation)SSC Veteran (219 reputation)SSC Veteran (219 reputation)SSC Veteran (219 reputation)SSC Veteran (219 reputation)SSC Veteran (219 reputation)

Group: General Forum Members
Points: 219 Visits: 478

I'm not sure I understand all your requirements.
Why ID = 101 has 3 combinations? There are 4 fruits there...

I am not sure I understand your question. ID 101 should have 10 combinations.
Group1 has 2fruits. Group2 has 3 fruits. Group3 has 3 fruits. Group1 shares 1 fruit with Group2 and 1 different fruit with Group3. Group2 shares 1 fruit with Group3. Since Apple, Orange, Apple is not allowed, you can't count the number of items per group and the number of groups and get your total number of combinations using simple math. You have to know the repeats. This leaves the 10 combinations I list above in the original post.

Thank you very much for the function and CTEs, I knew that I had seen something like that before. I will test against what I have and let you know if it does indeed answer my question.
pmcpherson
pmcpherson
SSC Veteran
SSC Veteran (219 reputation)SSC Veteran (219 reputation)SSC Veteran (219 reputation)SSC Veteran (219 reputation)SSC Veteran (219 reputation)SSC Veteran (219 reputation)SSC Veteran (219 reputation)SSC Veteran (219 reputation)

Group: General Forum Members
Points: 219 Visits: 478
Gianluca,
Your solution, while helpful, does not return the desired result set.
I think I see the confusion in my explanation of the problem.
The problem is to come up with all of the unique GROUP combinations.
The result set must have one fruit from Group1 and one fruit from Group2 ... 1 fruit from GroupN.

So if the string field has 1 ';' then there are two groups; Group1 and Group2 (left and right of the ';'). The output should be one fruit from Group1 plus a ',' plus one fruit from Group2. More specifically, all the possible single-fruit-per-group combinations that keep Group1 fruits on the left and Group2 fruits on the right. With no fruit simultaneously in Group1 and Group2 spots during output.

So if the string field has 2 ';' then there are three groups; Group1, Group2, Group3. Group1's fruits are to the left of the first ';' and Group3's are to the right of the last ';' and Group2's are in the the middle. Then re-concatenate them such that all the possible single-fruit-per-group combinations that keep Group1 fruits on the left and Group2 fruits in the middle and Group3 fruits on the right. With no fruit simultaneously in Group1, Group2 and/or Group3 spots during output.

Etc.

In my original post I list out all the valid combinations for output. That is the format that is desired. I hope this helps clear things up and expedites aid.

Thank you all again for taking a look at this.
Gianluca Sartori
Gianluca Sartori
SSC-Forever
SSC-Forever (47K reputation)SSC-Forever (47K reputation)SSC-Forever (47K reputation)SSC-Forever (47K reputation)SSC-Forever (47K reputation)SSC-Forever (47K reputation)SSC-Forever (47K reputation)SSC-Forever (47K reputation)

Group: General Forum Members
Points: 47957 Visits: 13367
I must say I'm quite disappointed I could not solve this problem in a simple way.
Maybe It's something beyond my skills.

Anyway this is the best I could put together:


IF OBJECT_ID('Tempdb..#TmpTbl') IS NOT NULL DROP TABLE #TmpTbl

CREATE TABLE #TmpTbl (TblID INT PRIMARY KEY, MultiDelimStr VARCHAR(255))
INSERT #TmpTbl ( TblID, MultiDelimStr )
SELECT 101, 'Orange,Apple;Orange,Grape,Banana;Apple,Kiwi,Grape'
UNION
SELECT 202, 'Grape,Apple;Orange,Grape,Banana;Peach,Kiwi,Grape'
UNION
SELECT 303, 'Orange;Peach,Grape,Banana;Apple,Kiwi;Tangerine,Grapefruit'


;WITH Splits AS (
SELECT TblId, GRP_ID = B.Id, C.Value AS Fruit, C.Id AS Fruit_id, NumGroups = MAX(B.Id) OVER(PARTITION BY TblId)
FROM #TmpTbl AS A
CROSS APPLY dbo.fSplit(MultiDelimStr, ';') AS B
CROSS APPLY dbo.fSplit(B.Value, ',') AS C
),
ThreeRows AS (
SELECT N
FROM Tally
WHERE N <= 3
),
FourRows AS (
SELECT N
FROM Tally
WHERE N <= 4
),
CrossThree AS (
SELECT A.N AS [1], B.N AS [2], C.N AS [3]
FROM ThreeRows AS A
CROSS JOIN ThreeRows AS B
CROSS JOIN ThreeRows AS C
),
CrossFour AS (
SELECT A.N AS [1], B.N AS [2], C.N AS [3], D.N AS [4]
FROM FourRows AS A
CROSS JOIN ThreeRows AS B
CROSS JOIN ThreeRows AS C
CROSS JOIN ThreeRows AS D
)
SELECT DISTINCT B.TblId, B.Fruit + ',' + C.Fruit + ',' + D.Fruit
FROM CrossThree AS A
INNER JOIN Splits AS B
ON B.GRP_ID = 1
AND B.Fruit_ID = [1]
INNER JOIN Splits AS C
ON C.GRP_ID = 2
AND C.TblId = B.TblId
AND C.Fruit_ID = [2]
AND C.Fruit <> B.Fruit
INNER JOIN Splits AS D
ON D.GRP_ID = 3
AND D.TblId = C.TblId
AND D.Fruit_ID = [3]
AND D.Fruit <> B.Fruit
AND D.Fruit <> C.Fruit
WHERE B.NumGroups = 3
UNION ALL
SELECT DISTINCT B.TblId, B.Fruit + ',' + C.Fruit + ',' + D.Fruit + ',' + E.Fruit
FROM CrossFour AS A
INNER JOIN Splits AS B
ON B.GRP_ID = 1
AND B.Fruit_ID = [1]
INNER JOIN Splits AS C
ON C.GRP_ID = 2
AND C.TblId = B.TblId
AND C.Fruit_ID = [2]
AND C.Fruit <> B.Fruit
INNER JOIN Splits AS D
ON D.GRP_ID = 3
AND D.TblId = C.TblId
AND D.Fruit_ID = [3]
AND D.Fruit <> B.Fruit
AND D.Fruit <> C.Fruit
INNER JOIN Splits AS E
ON E.GRP_ID = 4
AND E.TblId = D.TblId
AND E.Fruit_ID = [4]
AND E.Fruit <> B.Fruit
AND E.Fruit <> C.Fruit
AND E.Fruit <> D.Fruit
WHERE B.NumGroups = 4



Basically the code builds two work tables (CrossThree and CrossFour) with "N" columns, holding all N^N possible combinations of values between 1 and "N".
Once the work tables are in place, I join them back to the splitted strings, choosing the right worktable looking at the number of groups.
The main thing I don't like about this code is that it relies on fixed numbers of groups, so that it should be generated dynamically after a first pass on the input data to gather the minimum/maximum number of groups.

I think this problem could be solved much better with a CLR aggregate, but it's a technique I don't master.

I hope someone can help with something better than this.
Gianluca

--Gianluca Sartori

How to post T-SQL questions
spaghettidba.com
@spaghettidba
pmcpherson
pmcpherson
SSC Veteran
SSC Veteran (219 reputation)SSC Veteran (219 reputation)SSC Veteran (219 reputation)SSC Veteran (219 reputation)SSC Veteran (219 reputation)SSC Veteran (219 reputation)SSC Veteran (219 reputation)SSC Veteran (219 reputation)

Group: General Forum Members
Points: 219 Visits: 478
Your solution is similar to mine with more joins than I believe to be efficient, but still effective.

I am also hoping someone else can provide a cleaner, simpler, more efficient way to solve this challenge.

I do appreciate the time you put into this; thanks! :-D
Chad Crawford
 Chad Crawford
SSCertifiable
SSCertifiable (6.5K reputation)SSCertifiable (6.5K reputation)SSCertifiable (6.5K reputation)SSCertifiable (6.5K reputation)SSCertifiable (6.5K reputation)SSCertifiable (6.5K reputation)SSCertifiable (6.5K reputation)SSCertifiable (6.5K reputation)

Group: General Forum Members
Points: 6501 Visits: 18732
I was wondering if you had control over the schema and could change it to have all the data broken out into different tables? Gianluca did that with his CTE, and with that something like this works:

;WITH Splits AS (
SELECT TblId, GRP_ID = B.Id, C.Value AS Fruit
FROM #TmpTbl AS A
CROSS APPLY dbo.fSplit(MultiDelimStr, ';') AS B
CROSS APPLY dbo.fSplit(B.Value, ',') AS C
)
SELECT one.tblID, one.Fruit + ',' + two.Fruit + ISNULL(',' + three.Fruit, '') + ISNULL(',' + four.Fruit, '') + ISNULL(',' + five.Fruit, '')
FROM Splits one
LEFT JOIN Splits two on (one.tblID = two.tblID AND two.GRP_ID = 2)
LEFT JOIN Splits three on (one.tblID = three.tblID AND three.GRP_ID = 3)
LEFT JOIN Splits four on (one.tblID = four.tblID AND four.GRP_ID = 4)
LEFT JOIN Splits five on (one.tblID = five.tblID AND five.GRP_ID = 5)
WHERE ISNULL(one.Fruit,'') != ISNULL(two.Fruit,'*')
and ISNULL(one.Fruit,'') != ISNULL(three.Fruit,'*')
and ISNULL(one.Fruit,'') != ISNULL(four.Fruit,'*')
and ISNULL(one.Fruit,'') != ISNULL(five.Fruit,'*')
and ISNULL(two.Fruit,'') != ISNULL(three.Fruit,'*')
and ISNULL(two.Fruit,'') != ISNULL(four.Fruit,'*')
and ISNULL(two.Fruit,'') != ISNULL(five.Fruit,'*')
and ISNULL(three.Fruit,'') != ISNULL(four.Fruit,'*')
and ISNULL(three.Fruit,'') != ISNULL(five.Fruit,'*')
and ISNULL(four.Fruit,'') != ISNULL(five.Fruit,'*')
AND one.GRP_ID = 1



It's basically just a table join making sure you don't get duplicates back. Ugly, but it worked with the sample data.

Chad
pmcpherson
pmcpherson
SSC Veteran
SSC Veteran (219 reputation)SSC Veteran (219 reputation)SSC Veteran (219 reputation)SSC Veteran (219 reputation)SSC Veteran (219 reputation)SSC Veteran (219 reputation)SSC Veteran (219 reputation)SSC Veteran (219 reputation)

Group: General Forum Members
Points: 219 Visits: 478
I was wondering if you had control over the schema and could change it to have all the data broken out into different tables?

Doesn't every DBA/D want control over that? LOL And no, I do not.
Ugly, but it worked with the sample data.

Yeah, but do you think it is possible to do this more "Swan" and less "Duckling?" Is there a better way to start? (other than changing the schema, of course;-))
What is the performance hit of joining the same table 5 times going to be when I try to do 100,000 of these at once?
Thank you for your time and help.
Chad Crawford
 Chad Crawford
SSCertifiable
SSCertifiable (6.5K reputation)SSCertifiable (6.5K reputation)SSCertifiable (6.5K reputation)SSCertifiable (6.5K reputation)SSCertifiable (6.5K reputation)SSCertifiable (6.5K reputation)SSCertifiable (6.5K reputation)SSCertifiable (6.5K reputation)

Group: General Forum Members
Points: 6501 Visits: 18732
Well... not any answers, but some more questions and thoughts that might spark a new idea:

When you query out for these, you are doing it based on the TBLID, right? You aren't looking for everything from everywhere each time? With that in mind, you could target those rows with a where clause in the CTE and that would make a huge difference with that column indexed. I wouldn't think that would be too bad at all.

What do the insert/updates against this table look like - are they frequent and constant? I hate recommending a trigger because of the overhead they represent, but if inserts, updates and deletes are infrequent, you could use a trigger to mirror the structure in 5 seperate tables and with TBLID indexed the performance shouldn't be too bad. Or, you could just prebuild all the information on the fly from the trigger in the first place.

I know you don't have control over the schema, but if you could take some metrics to someone (I'm just making this up, but "If we change the schema we'll go from 5 minute queries to sub-second queries" or whatever it turns out to be), would they consider making some changes? If you use some sample data and an environment similar to what you have in production, can you see what difference a new schema might make?

Do you have enough control over the schema to change the content of the string? I'm not the best at XML, but if the string was in an XML format, there might (I don't know) be some new opportunities that don't involve all the joins and loops and just leverage the XML.

What is the consuming target for the data? Is it a report, or is this an input to another algorithm? SQL doesn't have a full complement of string tools, would it be faster to offload some of the work to the next layer?

Do you need real-time access to the data? If you prebuilt a report table on a semi-regular basis so that all the crunching was done once and all the reporting/consuming was done off a slightly out-of-date copy, would it be satisfactory?

And I guess the other question is do you have an environment to test a real-sized load to see what it would do?

Thanks,
Chad
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum







































































































































































SQLServerCentral


Search