Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase ««12345»»»

Split input string into multicolumn - multirows Expand / Collapse
Author
Message
Posted Tuesday, December 11, 2012 6:51 PM


Hall of Fame

Hall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of Fame

Group: General Forum Members
Last Login: Tuesday, December 23, 2014 1:40 AM
Points: 3,443, Visits: 5,404
Here's another way:

DECLARE @MyString VARCHAR(8000) = 
'30;38469|31;38470|32;38471|33;38472|34;38473|35;38474|36;38475|37;38476|38;38477|'

SELECT col1=MAX(CASE ItemNumber%4 WHEN 1 THEN Item END)
,col2=MAX(CASE ItemNumber%4 WHEN 3 THEN Item END)
FROM PatternSplitCM(@MyString, '[0-9]')
WHERE [Matched] = 1
GROUP BY ItemNumber/4


PatternSplitCM is described here: http://www.sqlservercentral.com/articles/String+Manipulation/94365/

Not sure it will be faster than using DelimitedSplit8K but since you're calling that function twice, it might give it a run for its money.

Since this is posted in a SQL 2005 forum, you'll need to note that the Tally Table CTE used in PatternSplitCM in the article requires SQL 2008. However it is easily replaced by an Itzik Ben-Gan style CTE tally table like the one that appears (I think) in PatternSplitQU (also in the article).

Edit: Revised (simplified) my vector of attack slightly.



My mantra: No loops! No CURSORs! No RBAR! Hoo-uh!

My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.


Need to UNPIVOT? Why not CROSS APPLY VALUES instead?
Since random numbers are too important to be left to chance, let's generate some!
Learn to understand recursive CTEs by example.
Splitting strings based on patterns can be fast!
Post #1395387
Posted Friday, December 14, 2012 6:00 PM
Grasshopper

GrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopper

Group: General Forum Members
Last Login: Thursday, December 18, 2014 4:26 PM
Points: 23, Visits: 56
Hi - the solution works as expected. Is there anyway to increase the size of the input string to 100,000. Its only limited to 8,000.
Thanks much!
Post #1396844
Posted Friday, December 14, 2012 8:31 PM


Hall of Fame

Hall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of Fame

Group: General Forum Members
Last Login: Tuesday, December 23, 2014 1:40 AM
Points: 3,443, Visits: 5,404
Jeff has said (he can correct me if I'm wrong) that DelimitedSplit8K is optimized for VARCHAR(8000) so change it to VARCHAR(MAX) with that in mind.

I think although I don't have rigid testing results that PatternSplitCM is not significantly affected if you change to VARCHAR(MAX).



My mantra: No loops! No CURSORs! No RBAR! Hoo-uh!

My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.


Need to UNPIVOT? Why not CROSS APPLY VALUES instead?
Since random numbers are too important to be left to chance, let's generate some!
Learn to understand recursive CTEs by example.
Splitting strings based on patterns can be fast!
Post #1396857
Posted Friday, December 14, 2012 8:38 PM
Grasshopper

GrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopper

Group: General Forum Members
Last Login: Thursday, December 18, 2014 4:26 PM
Points: 23, Visits: 56
Yes, I have tried it. I have observed performance degradation. Query runs much slower with VARCHAR(MAX).
Post #1396858
Posted Friday, December 14, 2012 8:49 PM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Today @ 10:02 AM
Points: 35,841, Visits: 32,512
murthyvs (12/14/2012)
Hi - the solution works as expected. Is there anyway to increase the size of the input string to 100,000. Its only limited to 8,000.
Thanks much!


Unless your example is just a striking coincidence, I've seen such a thing before and I'd like to make a totally alternate suggestion if what I suspect is actually true.

It would appear that both fields in the string are sequential in their own right. If that is true (as it was with what I've seen before), then it will save you a huge amount of aggravation, the server a huge amount of processing time and I/O, your network an unnecessary blurb of 100K bytes for each usage, and the loss of feeling in your left arm when everyone jumps on you for resource usage, if you were to send just a starting value for each field and the number of field-pairs and let SQL very quickly generate even millions of pairs more quickly than you could ever transmit them.

For example, your example has fields that start with 30 and 38469 respectively and then both increment by 1 for 9 pairs of fields. Using the method I proposed above, you would pass only those 3 parameters over the pipe and then let SQL Server have at it. That would be capable of generating millions of rows if you needed it an it would happen a whole lot quicker than trying to send a million such field pairs over the pipe. For example, here code that does what I said o a million rows. I believe you'll find it to be remarkably quick.

--===== These would be parameters in a stored procedure
DECLARE @F1Start INT,
@F2Start INT,
@Pairs INT
;
SELECT @F1Start = 30,
@F2Start = 38469,
@Pairs = 1000000
;
--===== This would do the deed as previously described.
WITH
cteTally AS
(
SELECT TOP (@Pairs)
N = ROW_NUMBER() OVER (ORDER BY (SELECT NULL))-1
FROM master.sys.all_columns ac1
CROSS JOIN master.sys.all_columns ac2
)
SELECT Field1 = @F1Start + N,
Field2 = @F2Start + N
INTO #Temp
FROM cteTally
;




--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #1396859
Posted Friday, December 14, 2012 9:02 PM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Today @ 10:02 AM
Points: 35,841, Visits: 32,512
dwain.c (12/14/2012)
Jeff has said (he can correct me if I'm wrong) that DelimitedSplit8K is optimized for VARCHAR(8000) so change it to VARCHAR(MAX) with that in mind.

I think although I don't have rigid testing results that PatternSplitCM is not significantly affected if you change to VARCHAR(MAX).


It is, indeed, optimized for 8K. You also have to remember that VARCHAR(MAX) doesn't like to be joined to even by a Tally Table and will cause at least a 2 to 1 slowdown just by changing the function variables from VARCHAR(8000) to VARCHAR(MAX) even if the data stays under 8K.


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #1396861
Posted Friday, December 14, 2012 9:03 PM
Grasshopper

GrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopper

Group: General Forum Members
Last Login: Thursday, December 18, 2014 4:26 PM
Points: 23, Visits: 56
Hi Jeff - Sorry those things are not pairs/dont increment by one. I just made that example in excel.
Post #1396862
Posted Friday, December 14, 2012 9:05 PM
Grasshopper

GrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopper

Group: General Forum Members
Last Login: Thursday, December 18, 2014 4:26 PM
Points: 23, Visits: 56
Yes, DelimitedSplit8K is miraculously optimized for VARCHAR(8000). It runs very fast!
Post #1396864
Posted Friday, December 14, 2012 9:09 PM
Grasshopper

GrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopper

Group: General Forum Members
Last Login: Thursday, December 18, 2014 4:26 PM
Points: 23, Visits: 56
Hi - Quick question. Do you know why VARCHAR(MAX) doesnt work in this query?

It returns the following error:

Msg 240, Level 16, State 1, Line 110
Types don't match between the anchor and the recursive part in column "rowstrt" of recursive query "cte".

Thanks!

nigelrivett (12/11/2012)
This will do something like that.
It caters for any number of fields - just amend the final select for the number required

declare @s varchar(8000)
select @s = '30;38469;1|31;38470;1|32;38471;1|33;38472;1|34;38473;1|35;38474;1|36;38475;1|37;38476;1|38;38477;1|'
select @s = '30;38469|31;38470|32;38471|33;38472|34;38473|35;38474|36;38475|37;38476|38;38477|'

declare @rowterm varchar(1) = '|'
declare @fieldterm varchar(1) = ';'

;with cte as
(
select rowstrt = 1, rowend = charindex(@rowterm,@s)-1, seq = 1
union all
select rowstrt = rowend+2, rowend = charindex(@rowterm,@s,rowend+2)-1, seq = seq + 1
from cte where charindex(@rowterm,@s,rowend+2)<>0
) ,
cte2 as
(
select s = substring(@s, rowstrt,rowend-rowstrt+1), seq
from cte
) ,
cte3 as
(
select fldstrt = 1, fldend = charindex(@fieldterm,s)-1, seq, s, fldseq = 1 from cte2
union all
select fldstrt = fldend+2,
fldend = case when charindex(@fieldterm,s,fldend+2) <> 0 then charindex(@fieldterm,s,fldend+2)-1 else len(s) end,
seq, s, fldseq = fldseq+1
from cte3 where fldend < len(s)
)
select s1.s, s2.s, s3.s
from
(select seq, s = substring(s,fldstrt,fldend-fldstrt+1) from cte3 where fldseq = 1) s1
join (select seq, s = substring(s,fldstrt,fldend-fldstrt+1) from cte3 where fldseq = 2) s2 on s1.seq = s2.seq
left join (select seq, s = substring(s,fldstrt,fldend-fldstrt+1) from cte3 where fldseq = 3) s3 on s1.seq = s3.seq
Post #1396865
Posted Friday, December 14, 2012 9:17 PM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Today @ 10:02 AM
Points: 35,841, Visits: 32,512
Yes... both variables and any/all string literals must be defined as VARCHAR(MAX) including the ones that hold the single character delimiters. It's one of the great joys of recursive CTE's.

--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #1396866
« Prev Topic | Next Topic »

Add to briefcase ««12345»»»

Permissions Expand / Collapse