|
|
|
SSChampion
        
Group: General Forum Members
Last Login: Wednesday, May 08, 2013 10:33 AM
Points: 10,989,
Visits: 10,529
|
|
Usman Butt (9/12/2012)
hahahaha..  Yes, one can change it since the source code is available. But since it is to be taken as the generic splitter  , it should be handling strings with any length of elements  Should say we are comparing oranges with oranges when it comes to varchar(max) versions  I know - and it has that annoying limitation of only accepting a maximum of 2GB of string as input too! 
But on a serious note, is there any performance degrade if nvarchar(max) is used? I do not remember the link, but I read somewhere that sometimes nvarchar(max) could speed up the execution as compared to nvarchar(N) but may depend upon the logic of the code. BTW, In my initial testing, I did see some improved performance if I change the return type to let say nvarchar(20) etc. I have never tried it. There quite probably would be since handling LOB types always incurs overhead, a little in the CLR and a great deal more in the SQL Server engine. Improving performance by shortening the maximum length of the output from nvarchar(4000) to nvarchar(20) surprises me. I can't think of a reason that would affect anything. On the general topic of increasing the speed of the CLR spliter - no, I haven't tried much on that at all. It has always been the fastest method for me, and I have never seen it be the performance bottleneck for a real problem I have had. I also value keeping the implementation nice and simple.
Paul White SQL Server MVP SQLblog.com @SQL_Kiwi
|
|
|
|
|
Grasshopper
      
Group: General Forum Members
Last Login: Thursday, April 11, 2013 11:02 PM
Points: 11,
Visits: 146
|
|
I was thinking that there would be a separate version for ASCII which would avoid the conversions to and from UNICODE and probably make it even faster. The convenience of having a single function counts for a lot given the speed is already good.
SQL Kiwi (9/11/2012)
m.t.cleary (9/11/2012) Can you avoid the conversion to Unicode by casting to a byte array?Yes, but it all gets very inconvenient from there on in - for example working out whether the byte stream was originally Unicode or not.
|
|
|
|
|
Grasshopper
      
Group: General Forum Members
Last Login: Thursday, April 11, 2013 11:02 PM
Points: 11,
Visits: 146
|
|
I found similar failure to proceed on a physical server - a desktop running Vista 32 bit on an aging AMD processor. When I changed the script to select into local variables instead of inserting into temp tables, it completed in good time. There is probably a problem with tempdb allocation or expansion.
This is OK for micro-bencmarks but if you want a realistic test you should use the splitter output rather than discarding it. If you can't fix tempdb, I'd change the script to delete the temporary tables as you go or to just run one splitter variant at a time.
-mark
ErikEckhardt (3/19/2012) Second try: failed. 3 days and 20-some-odd hours later, exact same results as before, only with wait time 334,605,998. :)
I think I will not be running this again on my VM SQL instance. That's okay, there are plenty other places I can try it...
The entire SQL Server could not be used for any other purpose for 2 days. I have something logging to a table every minute and it could not operate.
|
|
|
|
|
Forum Newbie
      
Group: General Forum Members
Last Login: Friday, October 05, 2012 6:12 AM
Points: 1,
Visits: 1
|
|
I had an online csv splitter made that can be used on any platform including mac...
[url=http://www.promotional-products.org/automatic-online-csv-file-splitter.html][/url]
meh won't link...
http://www.promotional-products.org/automatic-online-csv-file-splitter.html
|
|
|
|
|
SSC-Dedicated
           
Group: General Forum Members
Last Login: Yesterday @ 5:33 PM
Points: 32,902,
Visits: 26,783
|
|
david 97103 (10/5/2012) I had an online csv splitter made that can be used on any platform including mac...
[url=http://www.promotional-products.org/automatic-online-csv-file-splitter.html][/url]
meh won't link...
http://www.promotional-products.org/automatic-online-csv-file-splitter.html
That's nice and thanks for the freebee but not exactly what we're talking about here. That splits the files into multiple files. It doesn't actually split the rows into elements and insert them into a database table.
--Jeff Moden "RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".
First step towards the paradigm shift of writing Set Based code: Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."
For better, quicker answers on T-SQL questions, click on the following... http://www.sqlservercentral.com/articles/Best+Practices/61537/
For better answers on performance questions, click on the following... http://www.sqlservercentral.com/articles/SQLServerCentral/66909/
|
|
|
|
|
Forum Newbie
      
Group: General Forum Members
Last Login: Wednesday, April 10, 2013 4:32 PM
Points: 9,
Visits: 45
|
|
Thanks, Jeff!
I was inspired by this to make a CSV Splitter that also respects double-quotes, such as 10,"20,30",40 so it doesn't use the comma inside quotes. Style note - commas are before the cte names (instead of after) so there is room to insert a testing statement for seeing the results of an intermediate cte by uncommenting and running just that section.
IF object_id('dbo.parseCSV') IS NOT NULL DROP FUNCTION dbo.parseCSV GO --parseCSV.sql Derived from: http://www.sqlservercentral.com/articles/Tally+Table/72993/ /* We assume input data is comma-separated values with no quotes EXCEPT for fields completely surrounded by quotes. That is, the opening quote is either column 1 or just after a comma, and the closing quote is either end-of-string or just before a comma. Usage Example:
SELECT * FROM dbo.parseCSV('Fat,"10,20,30",and,Lazy') --*/ CREATE FUNCTION dbo.parseCSV (@pString VARCHAR(8000)) --WARNING!!! DO NOT USE MAX DATA-TYPES HERE! IT WILL KILL PERFORMANCE! RETURNS TABLE WITH SCHEMABINDING AS RETURN --===== "Inline" CTE Driven "Tally Table" produces values from 1 up to 10,000... -- enough to cover VARCHAR(8000) --You can test by executing from this Declare to the end or an uncommented SELECT after a cte definition. --declare @pString varchar(1000)='1,"4,6",9'; set @pString='a1,",xb,",c1';--set @pString='Fat,"10,20,30",and,Lazy';--set @pString='a,b,"c,d"';--set @pString='"1,2","3,4"'; WITH E1(N) AS (SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 ) --10E+1 or 10 rows ,E2(N) AS (SELECT 1 FROM E1 a, E1 b) --10E+2 or 100 rows ,E4(N) AS (SELECT 1 FROM E2 a, E2 b) --10E+4 or 10,000 rows max ,cteTally(N) AS (--==== This provides the "base" CTE and limits the number of rows right up front -- for both a performance gain and prevention of accidental "overruns" SELECT TOP (ISNULL(DATALENGTH(@pString),0)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4 ) ,cteQtCom AS (--find all quotes and commas SELECT t.N , CASE WHEN SUBSTRING(@pString,t.N,1)=',' THEN 1 ELSE 0 END [isComma] , CASE WHEN SUBSTRING(@pString,t.N,1)='"' THEN 1 ELSE 0 END [isQuote] FROM cteTally t WHERE SUBSTRING(@pString,t.N,1) IN (',','"') )--cteQtCom --SELECT * FROM cteQtCom ,cteNumQuotesBefore AS ( SELECT t.N, t.isComma, t.isQuote , ISNULL((SELECT SUM(t2.isQuote) FROM cteQtCom t2 WHERE t2.N<t.N),0) [qb4] FROM cteQtCom t )--cteNumQuotesBefore --SELECT * FROM cteNumQuotesBefore ,cteCommasNotWithinQuotes AS (--if the number of quotes before is EVEN then this comma is not inside quotes. SELECT t.N FROM cteNumQuotesBefore t WHERE t.isComma=1 AND (qb4%2)=0 )--cteCommasNotWithinQuotes --SELECT *, SUBSTRING(@pString,1,N)[substr] FROM cteCommasNotWithinQuotes ,cteStart(N1,this_startquote,prev_endquote) AS (--==== This returns N+1 (starting position of each "element" just once for each delimiter) SELECT 1[N1], CASE WHEN SUBSTRING(@pString,1,1)='"' THEN 1 ELSE 0 END[this_startquote], 0[prev_endquote] UNION ALL SELECT t.N+1[N1] , CASE WHEN SUBSTRING(@pString,t.N+1,1)='"' THEN 1 ELSE 0 END[this_startquote]--this field starts with quote , CASE WHEN SUBSTRING(@pString,t.N-1,1)='"' THEN 1 ELSE 0 END[prev_endquote]--previous field ended with quote --both could be true for contiguous quoted fields: "1","2" FROM cteCommasNotWithinQuotes t )--cteStart --SELECT * FROM cteStart ,cteLen AS(--==== Return start and length (for use in substring) SELECT s.N1, s.this_startquote ,CASE WHEN ending_comma.N1 IS NOT NULL THEN ending_comma.N1-s.N1-1 ELSE LEN(@pString)-s.N1+1 END [L1] ,CASE WHEN ending_comma.prev_endquote IS NOT NULL THEN ending_comma.prev_endquote ELSE CASE WHEN RIGHT(@pString,1)='"' THEN 1 ELSE 0 END END[this_endquote] FROM cteStart s OUTER APPLY ( SELECT TOP 1 e.N1,e.prev_endquote FROM cteStart e WHERE e.N1>s.N1 ORDER BY N1 ) ending_comma )--cteLen --SELECT *,SUBSTRING(@pString,N1,L1)[substr] FROM cteLen SELECT ItemNumber = ROW_NUMBER() OVER(ORDER BY LL.N1), Item = SUBSTRING(@pString, LL.N1+LL.this_startquote, LL.L1-this_startquote-this_endquote) FROM cteLen LL ; GO
And here is a little tester code:
--Create a few hundred test cases by all combinations of A,B,C where each can be bare or "surrounded by quotes" DECLARE @a TABLE (x VARCHAR(200)) INSERT INTO @a VALUES (''),('a1'),('qa1,qa2'),(',xa,') DECLARE @q TABLE (iq TINYINT, q VARCHAR(1)) INSERT INTO @q VALUES (0,''),(1,'"')
IF object_id('tempdb..#T') IS NOT NULL DROP TABLE #T CREATE TABLE #T (id INT IDENTITY(1,1),s VARCHAR(200)) INSERT INTO #T SELECT QA.q+A.x+QA.q +','+QB.q+B.x+QB.q +','+QC.q+C.x+QC.q FROM @a A CROSS JOIN @q QA CROSS JOIN @a B CROSS JOIN @q QB CROSS JOIN @a C CROSS JOIN @q QC
--Run the test -- 512 input rows, 2112 output rows -- up to you to eyeball the results IF 1=0 SELECT id,s, r.* FROM #T OUTER APPLY dbo.parseCSV(s) r
--show fields in separate columns so more compact: SELECT s, [1][one],[2][two],[3][three],[4][four],[5][five],[6][six],[7][seven] FROM ( SELECT t.id, t.s, r.ItemNumber, r.Item FROM #T t OUTER APPLY dbo.parseCSV(s) r ) AS SRC PIVOT ( MAX(Item) FOR ItemNumber IN ([1],[2],[3],[4],[5],[6],[7]) ) AS pvt
IF 1=0 BEGIN--testa --prints as we go, so we can find where a problem happened: DECLARE @n INT=0 WHILE EXISTS(SELECT * FROM #T WHERE id>@n) BEGIN--n SELECT @n=MIN(id) FROM #T WHERE id>@n DECLARE @s1 VARCHAR(200) SELECT @s1=s FROM #T WHERE id=@n PRINT CONVERT(VARCHAR,@n)+': '+@s1 SELECT @n[n],* FROM dbo.parseCSV(@s1) v END--n END--testa
|
|
|
|
|
Forum Newbie
      
Group: General Forum Members
Last Login: Thursday, May 02, 2013 4:10 AM
Points: 2,
Visits: 7
|
|
Hi Jeff,
Can you please test your new functions' performance against the following function :
CREATE FUNCTION [dbo].[ParseString] ( @String VarChar(8000), @Delimiter VarChar(1) ) RETURNS TABLE AS
RETURN ( WITH Tokens(LinePos, StartPos, EndPos) AS ( SELECT 1, 1, CharIndex(@Delimiter, @String) UNION ALL SELECT LinePos + 1, EndPos + 1, CharIndex(@Delimiter, @String, EndPos + 1) FROM Tokens WHERE EndPos > 0 ) SELECT CONVERT(Int, LinePos) AS RowNumber, SubString ( @String, StartPos, CASE WHEN EndPos > 0 THEN (EndPos - StartPos) ELSE 8000 END ) AS StringValue from Tokens )
and let us know which performs best ?
Regards, Dirk van der Watt
|
|
|
|
|
SSC-Dedicated
           
Group: General Forum Members
Last Login: Yesterday @ 5:33 PM
Points: 32,902,
Visits: 26,783
|
|
Dirk vd Watt (12/28/2012)
Hi Jeff, Can you please test your new functions' performance against the following function : CREATE FUNCTION [dbo].[ParseString] ( @String VarChar(8000), @Delimiter VarChar(1) ) RETURNS TABLE AS
RETURN ( WITH Tokens(LinePos, StartPos, EndPos) AS ( SELECT 1, 1, CharIndex(@Delimiter, @String) UNION ALL SELECT LinePos + 1, EndPos + 1, CharIndex(@Delimiter, @String, EndPos + 1) FROM Tokens WHERE EndPos > 0 ) SELECT CONVERT(Int, LinePos) AS RowNumber, SubString ( @String, StartPos, CASE WHEN EndPos > 0 THEN (EndPos - StartPos) ELSE 8000 END ) AS StringValue from Tokens )
and let us know which performs best ? Regards, Dirk van der Watt
That's a recursive CTE that counts. Please see the following article. http://www.sqlservercentral.com/articles/T-SQL/74118/
Considering the findings in that article and the fact that I provided a complete test harness so that anyone could do a full test on their own functions, please understand when I say "Can you please test {your own} new functions' performance ... and let us know which performs best?" 
--Jeff Moden "RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".
First step towards the paradigm shift of writing Set Based code: Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."
For better, quicker answers on T-SQL questions, click on the following... http://www.sqlservercentral.com/articles/Best+Practices/61537/
For better answers on performance questions, click on the following... http://www.sqlservercentral.com/articles/SQLServerCentral/66909/
|
|
|
|
|
Grasshopper
      
Group: General Forum Members
Last Login: Saturday, May 18, 2013 12:02 PM
Points: 17,
Visits: 502
|
|
what about xml?
declare @table table (id int identity(1,1), textdata varchar(4000)) insert @table exec xp_cmdshell 'wmic volume get capacity, drivetype, freespace, caption /format:csv'
; with cte as ( select Data.value('(/root/r/text())[1]','varchar(1000)') Node , convert(bigint,Data.value('(/root/r/text())[2]','varchar(1000)')) / (1024 * 1024 * 1024) Capacity , Data.value('(/root/r/text())[3]','varchar(1000)') Caption , Data.value('(/root/r/text())[4]','varchar(1000)') DriveType , convert(bigint,replace(Data.value('(/root/r/text())[5]','varchar(1000)'),char(10),'')) / (1024 * 1024 * 1024) FreeSpace from @table cross apply (select convert(xml,'<root><r>' + replace(textdata,',','</r><r>') + '</r></root>') as Data) textdataXML where Data.value('(/root/r/text())[4]','varchar(1000)') = '3' ) select Caption DriveName , Capacity , FreeSpace , convert(money,FreeSpace) / convert(money,Capacity) PercentFree from cte where Capacity > 0 order by 3 desc
|
|
|
|
|
SSCrazy Eights
        
Group: General Forum Members
Last Login: Yesterday @ 2:45 PM
Points: 8,567,
Visits: 8,218
|
|
|
|
|