Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase «««4546474849»»»

Tally OH! An Improved SQL 8K “CSV Splitter” Function Expand / Collapse
Author
Message
Posted Saturday, December 29, 2012 6:57 PM
Valued Member

Valued MemberValued MemberValued MemberValued MemberValued MemberValued MemberValued MemberValued Member

Group: General Forum Members
Last Login: Tuesday, March 25, 2014 2:33 PM
Points: 68, Visits: 60
Hi
unbelievable - so good. I have been doing these puter things for 30 years now and i am still amazed how little i know or probably at just how clever other people are. Really well explained and totally useful.

keep up the good work.

APL. My initials not the language.
Post #1401221
Posted Sunday, December 30, 2012 10:02 AM


SSCrazy Eights

SSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy Eights

Group: General Forum Members
Last Login: 2 days ago @ 10:00 AM
Points: 8,551, Visits: 9,043
Jeff Moden (12/28/2012)

If you look at the execution plan, it doesn't actually use the "same list". CTEs have some wonderful advantages but they also have a hidden fault. Instead of using the same internal result set from a CTE when things like self joins are used, the entire CTE is re-executed.

Now, if only one could force the optimiser to spool the CTE and reuse it, CTEs would be far more useful. Would they be useful enough to deliver a performance gain here? Maybe, and maybe not.

Because there are cases where a CTE might be so big that spooling it would damage performance rather than improve it, MS decided never to spool them - at least in the current implementation - perhaps they spool when spooling makes a semantic difference, not just a performance one, but I'm not sure that they spool even then, as far as I can tell it isn't documeneted anywhere. Why can't the optimiser look and see if spooling would enhance performance (only in the cases where it make no semantic difference, of course) and use it when it does? Or if that's too difficult a task for the optimiser (it shouldn't be, but they haven't done it so I can imagine them claiming it is) why can't we have a query hint that tells them to do it?

I hope Paul or someone will jump in and explain all this - maybe tell me I've got it all wrong, but that's OK too, I like learning.


Tom
Post #1401259
Posted Sunday, December 30, 2012 12:48 PM
SSC-Addicted

SSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-Addicted

Group: General Forum Members
Last Login: Sunday, September 29, 2013 1:24 AM
Points: 429, Visits: 1,721
mister.magoo (12/29/2012)
Steven Willis (12/28/2012)

Here's Mr. Magoo's method for splitting element pairs:



Gosh, I hope that was a mistake?
That code was not the best version of the XML split for that problem by a long way

I won't clutter up this thread with the better version - it is in the other thread you linked to, and as Jeff has pointed out, it was for a specific problem, not a generic splitter.

Otherwise, thanks for the nod

Probably a mistake on my part then...the code for the XML Split version I tested is in the code I attached in the other thread referenced above. But as a splitter of delimited value pairs your XML version performed very well when compared to other non-CLR methods.

 
 
Post #1401266
Posted Sunday, December 30, 2012 1:07 PM


Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Thursday, July 3, 2014 6:35 PM
Points: 3, Visits: 68
L' Eomot Inversé (12/30/2012)
Jeff Moden (12/28/2012)

If you look at the execution plan, it doesn't actually use the "same list". CTEs have some wonderful advantages but they also have a hidden fault. Instead of using the same internal result set from a CTE when things like self joins are used, the entire CTE is re-executed.

Now, if only one could force the optimiser to spool the CTE and reuse it, CTEs would be far more useful.

Don't become so blinded by CTEs that you forget about temporary tables or table-type variables; another problem with CTEs is they are not indexed.

On page 27 I posted a suggested version of this that avoided CHARINDEX() and used an indexed table-type variable and it showed performance gains over the pure-CTE version, but I don't think anyone has seen fit to comment on that suggested version...
Post #1401268
Posted Sunday, December 30, 2012 1:13 PM


SSC-Insane

SSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-Insane

Group: General Forum Members
Last Login: Today @ 7:58 AM
Points: 23,004, Visits: 31,496
John Hardin (12/30/2012)
L' Eomot Inversé (12/30/2012)
Jeff Moden (12/28/2012)

If you look at the execution plan, it doesn't actually use the "same list". CTEs have some wonderful advantages but they also have a hidden fault. Instead of using the same internal result set from a CTE when things like self joins are used, the entire CTE is re-executed.

Now, if only one could force the optimiser to spool the CTE and reuse it, CTEs would be far more useful.

Don't become so blinded by CTEs that you forget about temporary tables or table-type variables; another problem with CTEs is they are not indexed.

On page 27 I posted a suggested version of this that avoided CHARINDEX() and used an indexed table-type variable and it showed performance gains over the pure-CTE version, but I don't think anyone has seen fit to comment on that suggested version...


Page 27 passed on how many posts per page? I ask as I display 50 posts per page. It would help if you posted the url of your post so we could go directly to that post.



Lynn Pettis

For better assistance in answering your questions, click here
For tips to get better help with Performance Problems, click here
For Running Totals and its variations, click here or when working with partitioned tables
For more about Tally Tables, click here
For more about Cross Tabs and Pivots, click here and here
Managing Transaction Logs

SQL Musings from the Desert Fountain Valley SQL (My Mirror Blog)
Post #1401269
Posted Sunday, December 30, 2012 2:28 PM


Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Thursday, July 3, 2014 6:35 PM
Points: 3, Visits: 68
Lynn Pettis (12/30/2012)
John Hardin (12/30/2012)
L' Eomot Inversé (12/30/2012)
Jeff Moden (12/28/2012)

If you look at the execution plan, it doesn't actually use the "same list". CTEs have some wonderful advantages but they also have a hidden fault. Instead of using the same internal result set from a CTE when things like self joins are used, the entire CTE is re-executed.

Now, if only one could force the optimiser to spool the CTE and reuse it, CTEs would be far more useful.

Don't become so blinded by CTEs that you forget about temporary tables or table-type variables; another problem with CTEs is they are not indexed.

On page 27 I posted a suggested version of this that avoided CHARINDEX() and used an indexed table-type variable and it showed performance gains over the pure-CTE version, but I don't think anyone has seen fit to comment on that suggested version...


Page 27 passed on how many posts per page? I ask as I display 50 posts per page. It would help if you posted the url of your post so we could go directly to that post.

D'oh! I didn't consider non-default posts-per-page settings. Sorry. Here's the direct link: http://www.sqlservercentral.com/Forums/FindPost1129283.aspx

The second code block is what I am referring to.
Post #1401270
Posted Sunday, December 30, 2012 5:58 PM


SSChampion

SSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampion

Group: General Forum Members
Last Login: Today @ 6:13 AM
Points: 11,192, Visits: 11,092
L' Eomot Inversé (12/30/2012)
Why can't the optimiser look and see if spooling would enhance performance (only in the cases where it make no semantic difference, of course) and use it when it does? Or if that's too difficult a task for the optimiser (it shouldn't be, but they haven't done it so I can imagine them claiming it is) why can't we have a query hint that tells them to do it? I hope Paul or someone will jump in and explain all this - maybe tell me I've got it all wrong, but that's OK too, I like learning.

You're right that the optimizer doesn't consider CTE materialization today (they are strictly in-line view definitions, expanded once per reference) and there is no hint for materialization either.

There is a 2006 active Connect suggestion by Adam Machanic for a hint. Though some duplicates of that item have been closed as Won't Fix (= perhaps for the time being) my overall impression from public Microsoft comments is that something along these lines is being actively considered for a future release. We will have to wait and see.

Related: more generalized subtree matching and reuse has been trialled in the product, see this Microsoft Research paper (PDF).




Paul White
SQL Server MVP
SQLblog.com
@SQL_Kiwi
Post #1401276
Posted Sunday, December 30, 2012 10:23 PM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Today @ 12:14 AM
Points: 36,735, Visits: 31,186
Tony.l (12/29/2012)
Hi
unbelievable - so good. I have been doing these puter things for 30 years now and i am still amazed how little i know or probably at just how clever other people are. Really well explained and totally useful.

keep up the good work.

APL. My initials not the language.


Thanks for the great feedback, Tony. I feel the same way. I've been working with computers in one form or another since 1968 and I'm still learning something new about them everyday.


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #1401288
Posted Sunday, December 30, 2012 10:48 PM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Today @ 12:14 AM
Points: 36,735, Visits: 31,186
John Hardin (12/30/2012)
On page 27 I posted a suggested version of this that avoided CHARINDEX() and used an indexed table-type variable and it showed performance gains over the pure-CTE version, but I don't think anyone has seen fit to comment on that suggested version...


You'll have to forgive me a bit on that. If you look at even the most recent pages of this thread, there are still people saying "Try this" or "How about this"? Even you said...

How about this to get rid of the CHARINDEX() string operation:
{snip}
...how does that affect performance?


Two days later (lots of water had flowed under the bridge by then and we're not notified by edits), you added...

edited to add: Actually performance goes down a bit because the subquery in making cteStartEnd is not indexed. This version does exhibit performance gains in my very modest testing:


...and followed that up with...
(Note: I'm doing my version as a SP so the above exact code hasn't actually been tested by me...)


I just didn't (and still don't) have the time to test everyone's suggestion. That's why I spent so much time on building a test harness for everyone to use.

If you think your method has merit in the area of performance, please test it and post the test results. Who knows? You might be on to something.


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #1401289
Posted Monday, December 31, 2012 10:24 AM


SSCrazy Eights

SSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy Eights

Group: General Forum Members
Last Login: 2 days ago @ 10:00 AM
Points: 8,551, Visits: 9,043
SQL Kiwi (12/30/2012)
There is a 2006 active Connect suggestion by Adam Machanic for a hint. Though some duplicates of that item have been closed as Won't Fix (= perhaps for the time being) my overall impression from public Microsoft comments is that something along these lines is being actively considered for a future release. We will have to wait and see.

Well, I've added my vote to that. It will probably make no difference, but that isn't an excuse for not doing so whem I think it's right.

Related: more generalized subtree matching and reuse has been trialled in the product, see this Microsoft Research paper (PDF).

For some reason the link doesn't work. But looking in the obvious place indicated by your link (http://research.microsoft.com/en-us/um/people/jrzhou/publications.htm) finds it and the link from there, although it appears to be identical to your link, works fine. I have no idea what's happening there.
Edit: yes, I see what was happening: your link is relative, not absolute, so it tries to connect to www.sqlservercentral.com/Forums/research.microsoft.com/en-us/um/people/jrzhou/pub/cse.pdf instead of to the real target. I suspect I've done that sometimes too - probably every time I've typed the link instead of copyying and pasting from the address field of my browser.


Tom
Post #1401435
« Prev Topic | Next Topic »

Add to briefcase «««4546474849»»»

Permissions Expand / Collapse