Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase «««2425262728»»»

Tally OH! An Improved SQL 8K “CSV Splitter” Function Expand / Collapse
Author
Message
Posted Sunday, May 15, 2011 3:16 PM


SSCrazy Eights

SSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy Eights

Group: General Forum Members
Last Login: Today @ 7:42 PM
Points: 8,567, Visits: 9,071
mark hutchinson (5/15/2011)
SQLkiwi (5/14/2011)
[quote]peter-757102 (5/14/2011)
There was some chatter a while back about replacing T-SQL entirely. This would be a good thing, in my view, though it does rather depend on what it was to be replaced with


Let the horrors begin...
* ErLang
* F#
* APL

Well, I agree that APL is probably a horror in this context; but the other two? If they count as horrors, then surely so must C#, and things like VBS must count as more horrid than horrors.


Tom
Post #1109004
Posted Sunday, May 15, 2011 3:46 PM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Today @ 7:56 PM
Points: 36,775, Visits: 31,230
Tom.Thomson (5/15/2011)
Well, I agree that APL is probably a horror in this context; but the other two? If they count as horrors, then surely so must C#, and things like VBS must count as more horrid than horrors.


I absolutely agree that the experiment known as C# was a mistake...


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #1109012
Posted Sunday, May 15, 2011 3:58 PM
SSC Rookie

SSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC Rookie

Group: General Forum Members
Last Login: Wednesday, July 9, 2014 2:50 PM
Points: 40, Visits: 343
As a VB guy, I'm pissed that the VB team in Microsoft's .Net team lost every decision battle to the C# team.

Thank you anyway, Anders, for Delphi. Maybe the VB-vs-C# decisions were just your revenge.



Post #1109019
Posted Sunday, May 15, 2011 4:22 PM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Today @ 7:56 PM
Points: 36,775, Visits: 31,230
Tom.Thomson (5/15/2011)
Jeff Moden (5/14/2011)
It would be very interesting, indeed, if there were a true relational database that some of the people touting the attributes of relational databases could try. I believe the majority of them wouldn't be disappointed while people like me would absolutely hate it because I tend do things with data that probably couldn't be done in a true relational database.

I'd be surprised if you did anything that couldn't be done in a true relational database - but not at all surprised if some of the people out there claiming to know what the one true relational model is produced a database that prevented you from doing them. If you were to get a real relational language I think you would probably like it (and those I sometimes call the "relational fundamentalists" would probably hate it).

So far as T-SQL and SQL in general go... yes, I agree it has some shortcomings but I'm definitely in love with the idea of declarative functionality rather than procedural flexibility although, as some have already stated, a proper mix of the two would be much more useful.

Well, if you want declarative functionality you certainly don't want SQL! Declarative systems people sometimes quarrel with SQL people because SQL people keep on claiming it is a declarative language, which it most certainly is not. Would you adopt the idea that no dml statement can modify the visible data in relations (all it can do is prepare modifications to be carried out by a subsequent commit statement or discarded by rollback) - once you let the modifications proposed by one dml statement be visible to a subsequent dml statement in the same transaction you have a state-oriented procedural dml not a declarative one. This is very hard indeed to handle in a language that doesn't allow higher order functions and general composition of statement components to form more complex statements, so you would end up with something not much like SQL (SQL has taken on board some of that language style with its fairly general use of subqueries and of CTEs, but not enough of it).
With a decent declarative functional language for writing the bits we might do in CLR for SQL, there would be no need for a procedural language for those bits. I don't know whether F# will become such a language, but I suspect Haskell would be a better choice anyway.
And we will always need some non-declarative bits: commitment can be handled as at present, with transaction starts and commits, maybe also with autocommit (single statement transactions). Writing to files (other than database files) with immediate effect (eg for activities needing synchronous logging, such as some debugging) and I guess anything else needed could be handled using monads in the auxiliary language. So there's probably nothing procedural needed at all at database level, and very little that's not declarative. Of course under the covers things like concurrency control and recovery logging and so on will be happening, and these are probably not going to be declarative and may be procedural.


Thanks, Tom. I can always count on you for wise words.

I started to write about what I think of some of the relational zealots and fundamentalists but some truly unkind words became the main feature of the post and so I decided to practice what most of our Mom's taught us... "If you don't have something nice to say, don't say it."


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #1109023
Posted Monday, May 16, 2011 5:59 AM
Old Hand

Old HandOld HandOld HandOld HandOld HandOld HandOld HandOld Hand

Group: General Forum Members
Last Login: Friday, July 25, 2014 3:18 AM
Points: 323, Visits: 2,200
I like to add my idea that it is a mistake to even strive for a fully declarative/functional language in the first place. SQL can be difficult to handle at times as one change in what you want might mean a redesign of your whole query. This property is not something to strive for endlessly, not everything should be written down as an endless formula that is impossible to comprehend in the end.

We got to ask ourself: what is the benefit of a functional language over a procedural one?

My take in this is the ability of the functional language to make an implementation where lots of details are taken care off for you (possibly efficiently). Things you otherwise need to deal with yourself, providing a kind of runtime framework if you will. So as long as you can easily work on a functional level and as human can comprehend the functional notation and easily change it to do something slightly different you want, your good with sticking to functional.

But there comes a point (and this is different for everyone) where you have to break things down as these limits are reached, and that is the end of the line for sticking to pure functional. This means that in reality there is no true workable functional solution possible as you will always run into this limit sooner or later Only hybrid solutions can survive in the wild and have meaning, hybrids like SQL. And hybridization should be completed in terms of running procedural functionality outside the runtime framework of SQL itself. In other works, you should be able to add new functions, possible with some meta information (is deterministic, returns null on any null input, etc) that helps the optimizer make better use of them when used as part of an SQL statement. It would also be nice if some runtime environment properties were communicated, such as certain set options that can affect functionality.

Anyway, I am getting sucked into a whole new discussion here, just wanted to say my thing and get some rest in my head now :).
Post #1109237
Posted Wednesday, May 25, 2011 9:01 AM
SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: Wednesday, July 23, 2014 8:54 AM
Points: 106, Visits: 363
Thanks for the great article and the code that goes with it.

Is there a reason why you chose a VARCHAR(8000) instead of a VARCHAR(MAX)? I modified the provided function to take a (MAX) as a parameter and it seems to work. To account for the increase in size, it is using an existing Tally table rather than the CTE, even though the tested CTE version outperforms the table-based version. That seemed simpler than adding more layers to the compounding CTEs, and the benchmarking was based on generating 10K rows rather than 100K or 1,000K rows.

Do you see any reason why a VARCHAR(MAX) would not work? I'm still testing it, but your thoughts would be appreciated.
Post #1114800
Posted Wednesday, May 25, 2011 10:02 AM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Today @ 3:04 PM
Points: 6,582, Visits: 8,861
fahey.jonathan (5/25/2011)
Is there a reason why you chose a VARCHAR(8000) instead of a VARCHAR(MAX)?


Performance. It goes wacko as soon as you start sending in > 8000 characters to the function. (However, in my limited testing, the varchar(max) seems to work fine with <= 8000 characters being sent to it.)

A DelimitedSplitMax version is in the works, but it's not ready for prime time yet.


Wayne
Microsoft Certified Master: SQL Server 2008
If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
Links: For better assistance in answering your questions, How to ask a question, Performance Problems, Common date/time routines,
CROSS-TABS and PIVOT tables Part 1 & Part 2, Using APPLY Part 1 & Part 2, Splitting Delimited Strings
Post #1114871
Posted Wednesday, May 25, 2011 10:24 AM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Today @ 7:56 PM
Points: 36,775, Visits: 31,230
fahey.jonathan (5/25/2011)
Thanks for the great article and the code that goes with it.

Is there a reason why you chose a VARCHAR(8000) instead of a VARCHAR(MAX)? I modified the provided function to take a (MAX) as a parameter and it seems to work. To account for the increase in size, it is using an existing Tally table rather than the CTE, even though the tested CTE version outperforms the table-based version. That seemed simpler than adding more layers to the compounding CTEs, and the benchmarking was based on generating 10K rows rather than 100K or 1,000K rows.

Do you see any reason why a VARCHAR(MAX) would not work? I'm still testing it, but your thoughts would be appreciated.


Yes. VARCHAR(MAX) doesn't like to be joined to. In the testing that I did, just changing the input to VARCHAR(MAX) and still using it for something less than VARCHAR(8000) immediately caused a 2:1 slowdown. Like Wayne said, a VARCHAR(MAX) splitter is in the works using a similar method but it's just not ready yet. Of course, now that you've said what you've said, some retesting for VARCHAR(MAX) appears to be in order across different machines.

If you can use a CLR like the one in the test attachment to the article, that would be the best thing to do for splitting. Understood if you cannot.


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #1114894
Posted Wednesday, May 25, 2011 11:53 AM
SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: Wednesday, July 23, 2014 8:54 AM
Points: 106, Visits: 363
I've tested it using 36014 characters (list of email addresses) and the response time is under 1 second:
-----------------------
2011-05-25 12:46:54.640
2011-05-25 12:46:55.253

For the few times that I will be using this, that speed is acceptable. I wonder if using a table-based Tally table makes any difference to the speed given the larger number of records that need to be generated using the CTE version. I did not try using the CTE version scaled up to (MAX) size.

I'm eager to see your revised version when it is complete.
Post #1114961
Posted Wednesday, May 25, 2011 12:11 PM
SSC Rookie

SSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC Rookie

Group: General Forum Members
Last Login: Wednesday, July 9, 2014 2:50 PM
Points: 40, Visits: 343
@Jeff

Here's an idea to kick around for varchar(max) data...use a tally table (or cte equivalent) to split the very large string into strings less than 8000 characters and then rejoin those parsed strings.

I have no idea if it can or how it would be implemented, but it was an out-of-the-box idea that just struck me. Thought I would share the idea in case I get struck by a bus.

Mark



Post #1114972
« Prev Topic | Next Topic »

Add to briefcase «««2425262728»»»

Permissions Expand / Collapse