|
|
|
SSCrazy
      
Group: General Forum Members
Last Login: Yesterday @ 8:37 PM
Points: 2,370,
Visits: 3,252
|
|
mister.magoo (12/19/2012)
And the nice thing is this method works nicely on varchar(MAX)... I used 2000 rows of 2000 pairs of values to test : Duration_Avg Cpu_Avg Reads_Avg Writes_Avg RowCounts_Avg 106994.000000 98561 4196918 36806 4000000
Of course, I couldn't compare this to the other methods because they don't handle MAX in their current forms, but I did test them at just 200 rows/200 pairs per row and they were getting very slow. I think 107 seconds to split out 4 million rows is quite good though 
Very interesting and thanks for the Brad Schultz link.
I have a vague recollection of some recent discussion in Jeff's DelimitedSplit8K article that may have also touched on this approach. That discussion thread is pretty huge but it would be in the last 2-3 pages of it I think. Can't recall if it was suggesting to use this for VARCHAR(MAX) or not though. Maybe something else to investigate or maybe even something you'd like to add to the discussion.
No loops! No CURSORs! No RBAR! Hoo-uh!
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
Need to UNPIVOT? Why not CROSS APPLY VALUES instead? Since random numbers are too important to be left to chance, let's generate some! Are you too recursively challenged? Splitting strings based on patterns can be fast!
|
|
|
|
|
SSC-Dedicated
           
Group: General Forum Members
Last Login: Yesterday @ 11:54 PM
Points: 33,113,
Visits: 27,041
|
|
mister.magoo (12/19/2012) Hi, I had a feeling that I had forgotten something important about the XML method...then I remembered...
N-i-i-i-i-i-c-c-c-c-e-e-e ! ! ! ! !
--Jeff Moden "RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".
First step towards the paradigm shift of writing Set Based code: Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."
For better, quicker answers on T-SQL questions, click on the following... http://www.sqlservercentral.com/articles/Best+Practices/61537/
For better answers on performance questions, click on the following... http://www.sqlservercentral.com/articles/SQLServerCentral/66909/
|
|
|
|
|
SSC-Dedicated
           
Group: General Forum Members
Last Login: Yesterday @ 11:54 PM
Points: 33,113,
Visits: 27,041
|
|
I'm getting some conflicting results on the performance testing. A 10 year old single cpu desktop computer and a laptop probably aren't the best places for me to test on. I'll try to reserve a little time to test these on the server at work today (it's just after midnight here).
--Jeff Moden "RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".
First step towards the paradigm shift of writing Set Based code: Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."
For better, quicker answers on T-SQL questions, click on the following... http://www.sqlservercentral.com/articles/Best+Practices/61537/
For better answers on performance questions, click on the following... http://www.sqlservercentral.com/articles/SQLServerCentral/66909/
|
|
|
|
|
SSCrazy
      
Group: General Forum Members
Last Login: Yesterday @ 8:37 PM
Points: 2,370,
Visits: 3,252
|
|
|
|
|
|
Ten Centuries
      
Group: General Forum Members
Last Login: Today @ 12:24 AM
Points: 1,337,
Visits: 4,025
|
|
Jeff Moden (12/19/2012) I'm getting some conflicting results on the performance testing. A 10 year old single cpu desktop computer and a laptop probably aren't the best places for me to test on. I'll try to reserve a little time to test these on the server at work today (it's just after midnight here).
Having read recently about the problems with using set statistics time on, I used a server side trace to measure when I was testing.
I also saw some strange results, mostly from Lynn's solution which, for some reason would sometimes have terrible elapsed time, but I took that to be environmental as it was seemingly random.
MM
|
|
|
|
|
SSC-Dedicated
           
Group: General Forum Members
Last Login: Yesterday @ 11:54 PM
Points: 33,113,
Visits: 27,041
|
|
mister.magoo (12/20/2012)
Jeff Moden (12/19/2012) I'm getting some conflicting results on the performance testing. A 10 year old single cpu desktop computer and a laptop probably aren't the best places for me to test on. I'll try to reserve a little time to test these on the server at work today (it's just after midnight here).Having read recently about the problems with using set statistics time on, I used a server side trace to measure when I was testing. I also saw some strange results, mostly from Lynn's solution which, for some reason would sometimes have terrible elapsed time, but I took that to be environmental as it was seemingly random.
The other thing on this testing is that the data is vertically homogenous. All of the delimiters are in precisely the same spot on every row and we've seen where that does give certain splitting methods a seeming advantage.
--Jeff Moden "RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".
First step towards the paradigm shift of writing Set Based code: Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."
For better, quicker answers on T-SQL questions, click on the following... http://www.sqlservercentral.com/articles/Best+Practices/61537/
For better answers on performance questions, click on the following... http://www.sqlservercentral.com/articles/SQLServerCentral/66909/
|
|
|
|
|
SSC Veteran
      
Group: General Forum Members
Last Login: Yesterday @ 9:51 AM
Points: 298,
Visits: 1,323
|
|
[EDIT FOR CLARIFICATION: When using the DelimitedSplit8K function it makes no sense to use it on strings longer than 8K and it will not perform as well as other methods if the datatype is VARCHAR(MAX). In the code posted below, make sure to change the datatypes to VARCHAR(8000) when the DelimitedSplit8K function is involved. Thanks to Jeff for pointing this out.]
OK...I came up with another variation of the XML splitter function. I also wanted to compare the various methods based on the size of the sample string to parse AND to compare methods when the size of the values in the value pairs was not constant. I wrote a function that created value pairs first as just numerics 0-9 and then one with alphanumerics. Both of these generated rows in which the value pairs were random and different for each row. I've posted the various functions and scripts below along with the time comparisons.
First test used the same data as generated in a post above using a Tally table CTE. I generated test data with 100 rows, 1,000 rows, and 10,000 rows.
IF OBJECT_ID('tempdb..#Strings') IS NOT NULL DROP TABLE #Strings
CREATE TABLE #Strings (ID INT IDENTITY, MyString VARCHAR(MAX))
;WITH Tally (n) AS ( SELECT TOP 100 ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM sys.all_columns a CROSS JOIN sys.all_columns b) INSERT INTO #Strings SELECT CAST(( SELECT CAST(n AS VARCHAR(5)) + ';' + CAST(40000+n AS VARCHAR(6)) + '|' FROM Tally FOR XML PATH('')) AS VARCHAR(MAX)) FROM Tally
The first row looked like this (and all the rows were the same): MyString 1;40001|2;40002|3;40003| [...] |997;40997|998;40998|999;40999|1000;41000|
Performance:
(100 row(s) affected) ****************************************************** DelimitedSplit8K (Lynn Pettis) SQL Server Execution Times: CPU time = 391 ms, elapsed time = 1129 ms.
PatternSplitCM (Dwain.C) SQL Server Execution Times: CPU time = 750 ms, elapsed time = 22446 ms.
XML (Mister Magoo) SQL Server Execution Times: CPU time = 421 ms, elapsed time = 437 ms.
DelimitedSplit_XML2 (Steven Willis) (Code for the function is below) SQL Server Execution Times: CPU time = 32 ms, elapsed time = 23 ms.
(1000 row(s) affected) ****************************************************** DelimitedSplit8K (Lynn Pettis) SQL Server Execution Times: (I gave up after 7 minutes)
PatternSplitCM (Dwain.C) SQL Server Execution Times: CPU time = 40157 ms, elapsed time = 334170 ms. XML (Mister Magoo) SQL Server Execution Times: CPU time = 39891 ms, elapsed time = 40201 ms.
DelimitedSplit_XML2 (Steven Willis) SQL Server Execution Times: CPU time = 1453 ms, elapsed time = 1573 ms.
10000 row(s) affected) ****************************************************** [Other methods all took more than 7 mins which was my "give up" threshold]
DelimitedSplit_XML2 (Steven Willis) SQL Server Execution Times: CPU time = 152500 ms, elapsed time = 229167 ms.
The second series of tests used a function to generate random data of different lengths for each item of each value pair and for each row. I generated test data with 100 rows and 1,000 rows. In this test, the results were quite different!
IF OBJECT_ID('tempdb..#Strings') IS NOT NULL DROP TABLE #Strings
CREATE TABLE #Strings (ID INT IDENTITY, MyString VARCHAR(MAX))
DECLARE @X INT SET @X = 1
WHILE @X <= 100 BEGIN
INSERT INTO #Strings SELECT CAST(( SELECT TOP(100) dbo.svfGenerateUniqueCode (0,ISNULL(NULLIF(FLOOR(CAST(RAND(t1.N*10000000) AS DECIMAL(18,4))*10),0),10),'') + ';' + dbo.svfGenerateUniqueCode (0,ISNULL(NULLIF(FLOOR(CAST(RAND((t1.N+100)*10000000) AS DECIMAL(18,4))*10),0),10),'') + '|' FROM dbo.Tally AS t1 FOR XML PATH('')) AS VARCHAR(MAX)) SET @X = @X+1 END
The test rows looked like this:
MyString (each row a different set of random numbers of variable length which will be different for every run)
ID MyString 1 5061;1552|4237254;407035|27;5|042;617152|6253541;336|3;6|234;646436|1004512557;704|... 2 4535;501356|5561164;47|100;2|0764;6|4041310;051|45;4|47;1621|3;605|2;2477|... 3 270301173;14|60;20751|44273;327607|2444234;663274273|2;404426|4;165211|...
Performance:
(100 row(s) affected) ****************************************************** DelimitedSplit8K (Lynn Pettis) SQL Server Execution Times: CPU time = 593 ms, elapsed time = 617 ms. PatternSplitCM (Dwain.C) SQL Server Execution Times: CPU time = 594 ms, elapsed time = 5392 ms. XML (Mister Magoo) SQL Server Execution Times: CPU time = 421 ms, elapsed time = 412 ms.
DelimitedSplit_XML2 (Steven Willis) SQL Server Execution Times: CPU time = 625 ms, elapsed time = 640 ms.
(1000 row(s) affected) ******************************************************
DelimitedSplit8K (Lynn Pettis) SQL Server Execution Times: CPU time = 4922 ms, elapsed time = 14138 ms.
PatternSplitCM (Dwain.C) SQL Server Execution Times: CPU time = 8156 ms, elapsed time = 172964 ms.
XML (Mister Magoo) SQL Server Execution Times: CPU time = 4312 ms, elapsed time = 4481 ms.
DelimitedSplit_XML2 (Steven Willis) SQL Server Execution Times: CPU time = 6281 ms, elapsed time = 6835 ms.
For random alphanumerics:
MyString (each row a different set of random alphanumerics of variable length which will be different for every run)
ID MyString 1 Uu6xS;e7Gd|cQm;J|DQ;BgU|F4;f|UqEqZx;fWK|E0k1WRan;h|Pz;1fd|... 2 fEsK;mUN20WCD|ehSK;SFNB5|YN5mgLY;QjAVec6|5;fq|R3z;4uUS|... 3 5hnr;6P|Yvqb6Pn;P3r|vW4q;h6G|uuaHXx7Fwq;5|QS;gs7|xgn7K;4cCebC|...
Performance:
(100 row(s) affected) ****************************************************** DelimitedSplit8K (Lynn Pettis) SQL Server Execution Times: CPU time = 640 ms, elapsed time = 371 ms.
PatternSplitCM (Dwain.C) SQL Server Execution Times: CPU time = 328 ms, elapsed time = 1523 ms.
XML (Mister Magoo) SQL Server Execution Times: CPU time = 454 ms, elapsed time = 466 ms.
DelimitedSplit_XML2 (Steven Willis) SQL Server Execution Times: CPU time = 578 ms, elapsed time = 585 ms.
(1000 row(s) affected) ****************************************************** DelimitedSplit8K (Lynn Pettis) SQL Server Execution Times: CPU time = 5125 ms, elapsed time = 14625 ms.
PatternSplitCM (Dwain.C) SQL Server Execution Times: CPU time = 4281 ms, elapsed time = 58210 ms.
XML (Mister Magoo) SQL Server Execution Times: CPU time = 4313 ms, elapsed time = 4305 ms.
DelimitedSplit_XML2 (Steven Willis) SQL Server Execution Times: CPU time = 5937 ms, elapsed time = 5955 ms.
The test code used to run each test:
DECLARE @Holder1 VARCHAR(MAX), @Holder2 VARCHAR(MAX)
PRINT 'DelimitedSplit8K (Lynn Pettis)' SET STATISTICS TIME ON SELECT @Holder1 = MAX(CASE ds2.ItemNumber WHEN 1 THEN ds2.Item ELSE '' END) , --as field1, @Holder2 = MAX(CASE ds2.ItemNumber WHEN 2 THEN ds2.Item ELSE '' END) --as field2 FROM #Strings CROSS APPLY dbo.DelimitedSplit8K(MyString,'|') ds1 CROSS APPLY dbo.DelimitedSplit8K(ds1.Item,';') ds2 WHERE ds1.Item <> '' GROUP BY ID ,ds1.ItemNumber ; SET STATISTICS TIME OFF
PRINT 'PatternSplitCM (Dwain.C)' SET STATISTICS TIME ON SELECT @Holder1 = MAX(CASE ItemNumber % 4 WHEN 1 THEN Item END) ,@Holder2 = MAX(CASE ItemNumber % 4 WHEN 3 THEN Item END) FROM #Strings CROSS APPLY PatternSplitCM(MyString,'[0-9]') WHERE [Matched] = 1 GROUP BY ID ,ItemNumber / 4 --OPTION (MAXDOP 1) SET STATISTICS TIME OFF
PRINT 'XML (Mister Magoo)' GO SET STATISTICS TIME ON GO
DECLARE @Holder1 VARCHAR(MAX), @Holder2 VARCHAR(MAX)
SELECT @Holder1 = nd.value('./@a','varchar(200)') ,@Holder2 = nd.value('./@b','varchar(200)') FROM ( SELECT CAST('<c a="' + REPLACE(REPLACE(MyString,'|','"/><c a="'),';','" b="') + '"/>' AS XML).query('.') FROM #Strings FOR XML PATH('') ,TYPE ) AS src (nds) CROSS APPLY nds.nodes('c') AS x (nd) WHERE nd.exist('./@b') = 1
GO
SET STATISTICS TIME OFF GO
PRINT 'DelimitedSplit_XML2 (Steven Willis)' GO SET STATISTICS TIME ON GO
DECLARE @Holder3 VARCHAR(MAX) ,@Holder4 VARCHAR(MAX)
SELECT @Holder3 = Col1 ,@Holder4 = Col2 FROM ( SELECT ID ,dsk1.Col1 ,dsk1.Col2 FROM #Strings a CROSS APPLY dbo.tvfDelimitedSplitXML2(MyString,'|',';') AS dsk1 ) r1 GO
SET STATISTICS TIME OFF GO
ANOTHER splitter function: 
/* This function will also split 1-dimensional arrays if Delimiter2 is left blank */
CREATE FUNCTION [dbo].[tvfDelimitedSplitXML2] ( @InputString VARCHAR(MAX) ,@Delimiter1 CHAR(1) ,@Delimiter2 CHAR(1) ) RETURNS @Elements TABLE ( ItemNumber INT IDENTITY(1,1) NOT NULL ,Col1 VARCHAR(MAX) NULL ,Col2 VARCHAR(MAX) NULL ,PRIMARY KEY (ItemNumber) ) WITH SCHEMABINDING AS BEGIN DECLARE @X VARCHAR(MAX) ,@XML XML ,@NumElements INT ,@Counter INT IF RIGHT(@InputString,1) = @Delimiter1 SET @InputString = LEFT(@InputString,LEN(@InputString)-1) IF LEFT(@InputString,1) = @Delimiter1 SET @InputString = RIGHT(@InputString,LEN(@InputString)-1)
IF NULLIF(@Delimiter2,'') IS NULL BEGIN -- Parse the 1-dimensional delimited string array SET @X = '<root><s>' + REPLACE(@InputString,@Delimiter1,'</s><s>')+'</s></root>'
-- Convert the string into XML SET @XML = CONVERT(XML,@X)
-- Select the rows from the array INSERT INTO @Elements (Col1) SELECT T.c.value('.','VARCHAR(MAX)') AS [Value] FROM @XML.nodes('/root/s') T (c) END ELSE BEGIN
-- Parse the 2-dimensional delimited string array
SET @X = '<s1>' + REPLACE(@InputString,@Delimiter1,'</s2><s1>')+'</s2>' SET @X = REPLACE(@X,@Delimiter2,'</s1><s2>') SET @X = '<root>'+@X+'</root>'
-- Convert the string into XML SET @XML = CONVERT(XML,@X)
-- Insert for each pair in the array
INSERT INTO @Elements SELECT Col1 ,Col2 FROM (SELECT ROW_NUMBER() OVER(ORDER BY Col1) AS RowID1 ,Col1 FROM (SELECT T.c.value('.','VARCHAR(MAX)') AS [Col1] FROM @XML.nodes('/root/s1') T (c)) AS Col1 ) AS Col1 INNER JOIN (SELECT ROW_NUMBER() OVER(ORDER BY Col2) AS RowID2 ,Col2 FROM (SELECT T.c.value('.','VARCHAR(MAX)') AS [Col2] FROM @XML.nodes('/root/s2') T (c)) AS Col2 ) AS Col2 ON Col1.RowID1 = Col2.RowID2 END RETURN /* Usage: SELECT * FROM dbo.tvfDelimitedSplitXML2( '30;38469|31;38470|32;38471|33;38472|34;38473|35;38474|36;38475|37;38476|38;38477|' ,'|' ,';' ) WHERE ItemNumber > 0 SELECT * FROM dbo.tvfDelimitedSplitXML2( 'Uu6xS;e7Gd|cQm;J|DQ;BgU|F4;f|UqEqZx;fWK|E0k1WRan;h|Pz;1fd|wjZchH;meYAb|w1U7uhj;rusCg' ,'|' ,';' ) WHERE ItemNumber > 0
SELECT * FROM dbo.tvfDelimitedSplitXML2( 'Element01,Element02,Element03,Element04,Element05' ,',','') WHERE ItemNumber > 0 */ END
Other code for reference:
--Used to pass 'NEWID()' into the function below
CREATE VIEW [dbo].[vwRandomGUID] AS SELECT NEWID() AS RandomGUID GO
--Generates random values
CREATE FUNCTION [dbo].[svfGenerateUniqueCode] ( @CodeMinLength INT ,@CodeMaxLength INT ,@SpecialChar VARCHAR(50) ) RETURNS VARCHAR(100) AS BEGIN
DECLARE @Code VARCHAR(100)
DECLARE @CodeData TABLE ( CodeChar VARCHAR(1) ) DECLARE @Num TABLE ( Digit INT NOT NULL PRIMARY KEY CLUSTERED ) IF @CodeMaxLength <= @CodeMinLength SET @CodeMaxLength = @CodeMinLength + 1
INSERT INTO @Num ( Digit ) SELECT Digit = 0 UNION ALL SELECT Digit = 1 UNION ALL SELECT Digit = 2 UNION ALL SELECT Digit = 3 UNION ALL SELECT Digit = 4 UNION ALL SELECT Digit = 5 UNION ALL SELECT Digit = 6 UNION ALL SELECT Digit = 7 UNION ALL SELECT Digit = 8 UNION ALL SELECT Digit = 9 ORDER BY 1
INSERT INTO @CodeData ( CodeChar ) SELECT CodeChar = SUBSTRING(b.Characters,a.RAND_INT % b.MOD,1) FROM ( SELECT aa.Number ,RAND_INT = ABS(CONVERT(INT,CONVERT(VARBINARY(100),(SELECT RandomGUID FROM dbo.vwRandomGUID)))) FROM ( SELECT Number = a.Digit + (b.Digit * 10) FROM @Num a CROSS JOIN @Num b ) aa ) a CROSS JOIN ( SELECT MOD = LEN(bb.Characters) - 1 ,bb.Characters FROM ( SELECT Characters = 'ABCDEFGHJKLMNPQURSUVWXYZ' + 'abcdefghjkmnpqursuvwxyz' + '0123456789' + @SpecialChar ) bb ) b ORDER BY (SELECT RandomGUID FROM dbo.vwRandomGUID)
SELECT @Code = ''
SELECT @Code = @Code + CodeChar FROM @CodeData
SELECT @Code = -- Random length from MIN to MAX Characters SUBSTRING(@Code,1,@CodeMinLength + (ABS(CONVERT(INT,CONVERT(VARBINARY(100),(SELECT RandomGUID FROM dbo.vwRandomGUID))))) % (@CodeMaxLength - @CodeMinLength + 1))
SET @Code = NULLIF(LTRIM(RTRIM(@Code)),'')
RETURN @Code
END GO
|
|
|
|
|
SSC-Dedicated
           
Group: General Forum Members
Last Login: Yesterday @ 11:54 PM
Points: 33,113,
Visits: 27,041
|
|
Steven Willis (12/21/2012) First test used the same data as generated in a post above using a Tally table CTE. I generated test data with 100 rows, 1,000 rows, and 10,000 rows.
There is a major fault with this testing. It's a well advertised fact that the DelimitedSplit8K method doesn't work well with blobs. In fact, any method that internally joins to a blob at the character level is guaranteed to run much slower than other methods such a XML. That fact is even stated in the "Tally OH!" ariticle.
Methods like XML, however, are (with the understanding that I've not had the time to do the testing the code on this thread deservers) typically much slower than the DelimitedSplit8K method on datatypes of VARCHAR(8000) or less because of the required concatenations.
--Jeff Moden "RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".
First step towards the paradigm shift of writing Set Based code: Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."
For better, quicker answers on T-SQL questions, click on the following... http://www.sqlservercentral.com/articles/Best+Practices/61537/
For better answers on performance questions, click on the following... http://www.sqlservercentral.com/articles/SQLServerCentral/66909/
|
|
|
|
|
SSC-Dedicated
           
Group: General Forum Members
Last Login: Yesterday @ 11:54 PM
Points: 33,113,
Visits: 27,041
|
|
To wit, and it's been proven time and again, if you can use CLR on your server, a CLR splitter is going to be the best way to go on this. It will handle both VARCHAR and NVARCHAR without having to make any special adjustments, runs just fine against blob datatypes, and it runs a bit more than twice as fast as DelimitedSplit8K even when it's used properly.
--Jeff Moden "RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".
First step towards the paradigm shift of writing Set Based code: Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."
For better, quicker answers on T-SQL questions, click on the following... http://www.sqlservercentral.com/articles/Best+Practices/61537/
For better answers on performance questions, click on the following... http://www.sqlservercentral.com/articles/SQLServerCentral/66909/
|
|
|
|
|
SSC-Dedicated
           
Group: General Forum Members
Last Login: Yesterday @ 11:54 PM
Points: 33,113,
Visits: 27,041
|
|
murthyvs (12/11/2012) Hello all - I am having hard time to split an input string into multicolumn - multirows.
Task - Create a stored procedure that reads an input string with pre-defined field and row terminators; splits the string into multicolumn - multirows; and inserts records into a table.
So, a couple of questions remain to solve this for you properly...
1. Can you use CLR on your servers or not? 2. If not, what is the guranteed absolute maximum length of your delimited parameters?
--Jeff Moden "RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".
First step towards the paradigm shift of writing Set Based code: Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."
For better, quicker answers on T-SQL questions, click on the following... http://www.sqlservercentral.com/articles/Best+Practices/61537/
For better answers on performance questions, click on the following... http://www.sqlservercentral.com/articles/SQLServerCentral/66909/
|
|
|
|