Tally OH! An Improved SQL 8K “CSV Splitter” Function

Question

Tally OH! An Improved SQL 8K “CSV Splitter” Function

Viewing 15 posts - 781 through 795 (of 991 total)

You must be logged in to reply to this topic. Login to reply

Ed Wagner SSC Guru Points: 287026 More actions · Answer 1

Eirikur Eiriksson (10/28/2015)
Ed Wagner (10/28/2015)
allnelsons (10/28/2015)
So, question about these splitter functions (which are pretty much the most used tools in my arsenal, btw!), are they faster/more efficient being run from the I.B.G. CTE tally table like you're doing or with a physical Numbers table? Anyone done that analysis?
It depends on the situation. I've seen smaller sets (< 8K rows) run faster with a physical tally table than with the tally function.
It does also depend on the construct of the inline talky table, the numbers of rows produced and few other factors, best results so far are using memory resident tally table.In some cases, normal tally patterns like 2^3^3 or 2^2^2^2 tend to be up to 50% slower than 465^3 or 15^2^3.
😎

Thank you, Eirikur. You've given me something new to investigate. 😉

Tally Tables - Performance Personified
String Splitting with True Performance
Best practices on how to ask questions

Eirikur Eiriksson SSC Guru Points: 182951 More actions · Answer 2

Ed Wagner (10/28/2015)
Eirikur Eiriksson (10/28/2015)
Ed Wagner (10/28/2015)
allnelsons (10/28/2015)
So, question about these splitter functions (which are pretty much the most used tools in my arsenal, btw!), are they faster/more efficient being run from the I.B.G. CTE tally table like you're doing or with a physical Numbers table? Anyone done that analysis?
It depends on the situation. I've seen smaller sets (< 8K rows) run faster with a physical tally table than with the tally function.
It does also depend on the construct of the inline talky table, the numbers of rows produced and few other factors, best results so far are using memory resident tally table.In some cases, normal tally patterns like 2^3^3 or 2^2^2^2 tend to be up to 50% slower than 465^3 or 15^2^3.
😎
Thank you, Eirikur. You've given me something new to investigate. 😉

Been spending quite some time on this subject, any input appreciated, especially on realistic testing!

😎

Jeff Moden SSC Guru Points: 1004706 More actions · Answer 3

I think that if we're going to do a study on inline Tally Table creation methods and comparisons against physical Tally Tables that have been cached, we should start another thread to possibly wet the appetite of additional participants.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Steve Jones - SSC Editor SSC Guru Points: 741485 More actions · Answer 4

Or someone should write an article looking at performance comparisons.

Jeff Moden SSC Guru Points: 1004706 More actions · Answer 5

Steve Jones - SSC Editor (11/2/2015)
Or someone should write an article looking at performance comparisons.

Heh... hint taken. 😉

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

TheSQLGuru SSC Guru Points: 134017 More actions · Answer 6

Someone needs to benchmark the STRING_SPLIT() function coming out in SQL Server 2016 and update the main article!

https://msdn.microsoft.com/en-us/library/mt684588.aspx

Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service

Ed Wagner SSC Guru Points: 287026 More actions · Answer 7

TheSQLGuru (3/11/2016)
Someone needs to benchmark the STRING_SPLIT() function coming out in SQL Server 2016 and update the main article!
https://msdn.microsoft.com/en-us/library/mt684588.aspx

I heard about this and must say that it's about darn time. I find it interesting that they could add this function but still not add a table of numbers (table/ITVF/something). One step at a time, I guess.

Tally Tables - Performance Personified
String Splitting with True Performance
Best practices on how to ask questions

Jeff Moden SSC Guru Points: 1004706 More actions · Answer 8

TheSQLGuru (3/11/2016)
Someone needs to benchmark the STRING_SPLIT() function coming out in SQL Server 2016 and update the main article!
https://msdn.microsoft.com/en-us/library/mt684588.aspx

Ed Wagner (3/11/2016)
TheSQLGuru (3/11/2016)
Someone needs to benchmark the STRING_SPLIT() function coming out in SQL Server 2016 and update the main article!
https://msdn.microsoft.com/en-us/library/mt684588.aspx
I heard about this and must say that it's about darn time. I find it interesting that they could add this function but still not add a table of numbers (table/ITVF/something). One step at a time, I guess.

Wayne Sheffield is actually doing that as we speak.

There are two major problems with the new function though and they both have connect items on them (links courtesy of Wayne Sheffield).

Connect items to fix this:

https://connect.microsoft.com/SQLServer/Feedback/Details/2433171 (add Item #)

https://connect.microsoft.com/SQLServer/Feedback/Details/2454210 (handle NULL inputs)

With that, I strongly encourage everyone to visit those links and vote the fixes up!

@steve-2 Jones. This would be really good for an editorial especially if you included the links and encouraged folks to vote the fixes up. If you want, I could write it up over this weekend and send it to you.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Sean Lange SSC Guru Points: 286609 More actions · Answer 9

Jeff Moden (3/11/2016)
TheSQLGuru (3/11/2016)
Someone needs to benchmark the STRING_SPLIT() function coming out in SQL Server 2016 and update the main article!
https://msdn.microsoft.com/en-us/library/mt684588.aspx
Ed Wagner (3/11/2016)
TheSQLGuru (3/11/2016)
Someone needs to benchmark the STRING_SPLIT() function coming out in SQL Server 2016 and update the main article!
https://msdn.microsoft.com/en-us/library/mt684588.aspx
I heard about this and must say that it's about darn time. I find it interesting that they could add this function but still not add a table of numbers (table/ITVF/something). One step at a time, I guess.
Wayne Sheffield is actually doing that as we speak.
There are two major problems with the new function though and they both have connect items on them (links courtesy of Wayne Sheffield).
Connect items to fix this:
https://connect.microsoft.com/SQLServer/Feedback/Details/2433171 (add Item #)
https://connect.microsoft.com/SQLServer/Feedback/Details/2454210 (handle NULL inputs)
With that, I strongly encourage everyone to visit those links and vote the fixes up!
@steve-2 Jones. This would be really good for an editorial especially if you included the links and encouraged folks to vote the fixes up. If you want, I could write it up over this weekend and send it to you.

The item number is the main reason that the DelimitedSplit8K is far superior to all the other splitters available. Several other splitters are ok for performance on a smallish number of rows. But the lack of ordinal position within the list just sucks.

_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

TomThomson SSC Guru Points: 104768 More actions · Answer 10

Certainly we want to get the numbering added to this splitter in SQL 2016, but we ought to find way of avoiding treating NULL as a value rather than as an indicator that we don't have a value. If the splitter were a TVF this would be easy: NULL return means the function was asked to split something that isn't a value, so it can't return any value and must return null (and empty table returned means no rows if we need that result); or maybe an extra parameter indicating what to do with NULL as someone else suggested.

Tom

TheSQLGuru SSC Guru Points: 134017 More actions · Answer 11

TheSQLGuru (3/11/2016)
Someone needs to benchmark the STRING_SPLIT() function coming out in SQL Server 2016 and update the main article!
https://msdn.microsoft.com/en-us/library/mt684588.aspx

Someone needs to point out that using this in a WHERE clause is a BAD thing too (or putting in a table variable). The lack of statistics can lead to HORRIBLE plan choices if any data value skew is in play. Also there are any number of times the size of the thingy coming out will lead to issues too I bet. Oh, and implicit conversions possibly too.

Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service

Jeff Moden SSC Guru Points: 1004706 More actions · Answer 12

TomThomson (3/11/2016)
Certainly we want to get the numbering added to this splitter in SQL 2016, but we ought to find way of avoiding treating NULL as a value rather than as an indicator that we don't have a value. If the splitter were a TVF this would be easy: NULL return means the function was asked to split something that isn't a value, so it can't return any value and must return null (and empty table returned means no rows if we need that result); or maybe an extra parameter indicating what to do with NULL as someone else suggested.

If you're suggesting that no rows should be returned if you pass the function a NULL, I have to disagree for the same reason that I think the following produces the correct result...

SELECT LOG(NULL);

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004706 More actions · Answer 13

TheSQLGuru (3/11/2016)
TheSQLGuru (3/11/2016)
Someone needs to benchmark the STRING_SPLIT() function coming out in SQL Server 2016 and update the main article!
https://msdn.microsoft.com/en-us/library/mt684588.aspx
Someone needs to point out that using this in a WHERE clause is a BAD thing too (or putting in a table variable). The lack of statistics can lead to HORRIBLE plan choices if any data value skew is in play. Also there are any number of times the size of the thingy coming out will lead to issues too I bet. Oh, and implicit conversions possibly too.

Heh... kind of like how MS pointed out that FORMAT is more than 44 times slower than an equivalent CONVERT? It won't happen in MS documentation. 😉

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Ed Wagner SSC Guru Points: 287026 More actions · Answer 14

Jeff Moden (3/11/2016)
TomThomson (3/11/2016)
Certainly we want to get the numbering added to this splitter in SQL 2016, but we ought to find way of avoiding treating NULL as a value rather than as an indicator that we don't have a value. If the splitter were a TVF this would be easy: NULL return means the function was asked to split something that isn't a value, so it can't return any value and must return null (and empty table returned means no rows if we need that result); or maybe an extra parameter indicating what to do with NULL as someone else suggested.
If you're suggesting that no rows should be returned if you pass the function a NULL, I have to disagree for the same reason that I think the following produces the correct result...
SELECT LOG(NULL);

I don't think any splitter would ever return a NULL as one of the results because you can't concatenate a NULL with a string and get anything other than NULL,

For example, "1,2,,4" would hopefully return 4 rows: 1, 2, an empty string, 4. This is because there really is an empty string in position 3. And as we all know, an empty string is not a NULL.

Tally Tables - Performance Personified
String Splitting with True Performance
Best practices on how to ask questions

TomThomson SSC Guru Points: 104768 More actions · Answer 15

Jeff Moden (3/11/2016)
TomThomson (3/11/2016)
Certainly we want to get the numbering added to this splitter in SQL 2016, but we ought to find way of avoiding treating NULL as a value rather than as an indicator that we don't have a value. If the splitter were a TVF this would be easy: NULL return means the function was asked to split something that isn't a value, so it can't return any value and must return null (and empty table returned means no rows if we need that result); or maybe an extra parameter indicating what to do with NULL as someone else suggested.
If you're suggesting that no rows should be returned if you pass the function a NULL, I have to disagree for the same reason that I think the following produces the correct result...
SELECT LOG(NULL);

No, that's not what I'm suggesting. An empty set for NULL is anathema.

What I'm suggesting is probably pretty radical: since Codd's idea was that we should work in a language that was based on a relational calculus, let's make T-SQL a little bit closer to that and treat relations as first class objects in the language - I know, absolute anathema to the SQL Standards gurus, but in my view disagreeing with them is a probably a good sign.

So if you pass the function a NULL, it returns NULL, and if you pass it aomething other than null it retums a relation (ie a set of rows) - it's a (built in) TVF. Whether the function ever returns a relation containing no rows depends on how it is specified to behave if the string passed in to be split has length zero. To me it seems natural in that case to return one row (sequence number 1, string value the zero length string) but some people think returning an empty relation for that case would be better.

Tom