Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase «««2324252627»»»

Tally OH! An Improved SQL 8K “CSV Splitter” Function Expand / Collapse
Author
Message
Posted Friday, May 13, 2011 4:03 PM
Old Hand

Old HandOld HandOld HandOld HandOld HandOld HandOld HandOld Hand

Group: General Forum Members
Last Login: Today @ 7:46 AM
Points: 328, Visits: 2,234
Tom.Thomson (5/13/2011)
erroneous post - content deleted
You make me curious ;)
Post #1108748
Posted Saturday, May 14, 2011 7:57 AM


SSCrazy Eights

SSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy Eights

Group: General Forum Members
Last Login: Today @ 7:49 AM
Points: 8,823, Visits: 9,383
peter-757102 (5/13/2011)
Tom.Thomson (5/13/2011)
erroneous post - content deleted
You make me curious ;)

I was thinking of trying with a quick approximation for sqrt which doesn't need any real numbers, but as soon as I posted the suggestion I realised that this is one of those cases where a quick and dirty approximation will hinder rather than help.


Tom
Post #1108816
Posted Saturday, May 14, 2011 9:19 AM
Old Hand

Old HandOld HandOld HandOld HandOld HandOld HandOld HandOld Hand

Group: General Forum Members
Last Login: Today @ 7:46 AM
Points: 328, Visits: 2,234
Tom.Thomson (5/14/2011)
peter-757102 (5/13/2011)
Tom.Thomson (5/13/2011)
erroneous post - content deleted
You make me curious ;)

I was thinking of trying with a quick approximation for sqrt which doesn't need any real numbers, but as soon as I posted the suggestion I realised that this is one of those cases where a quick and dirty approximation will hinder rather than help.


Yeah, I mentioned the result of my tests with that a few posts back. It did reduce reported memory consumption, but i am not sure the reported memory is actually claimed...i need to develop some test for that really.

In the meantime I did come up with a new function that uses nearly no memory and works a quite a bit faster for strings with ** very few ** elements. The trouble is, the number of few elements it can handle faster is too limited (break even at 5 elements), one has to wonder if it makes sense to have such a over specialized function. In theory the function should be incredibly fast, I have yet to find what exactly causes it to under perform to my expectations. lets hope I do find the cause as scaling it up to 16 characters or so would make it useful.

The function quickly determines the first few separator positions using nothing but the minimum number of charindex function calls. So on a string with two separators (3 elements) it requires just 3 function calls and creates no substrings to test individually. It also works fully with collations like any other SQL code would and does not even need to know the length of the input string. The old method needs to know that and also inspect every character of the string plus perform a a charindex for each element detected. You could view this as an in-line loop unrolled version of a recursive CTE without the overhead and with a limit of how many elements will be recognized.

Another avenue I am going to test in an attempt to reduce function call overhead is to modify the tally CTE in such a way, that it skips all characters before the first separator (based on a single charindex call). I am just hoping at this point it wont run into similar issues as the above mentioned version and suddenly becomes slower for the majority of cases.

But life ain't fair in T-SQL land and I wish SQL Server had some better build-in functional extensibility as it does now. While CLR certainly works on a technical level, it is far from a perfect merger with SQL Server. Ideal would be to have basic programming capabilities comparable to that of CLR, but a more domain specific interface to T-SQL, and build into SQL Server itself completely. So no need for an external compiler and with an interface that makes streaming data the norm rather then something you have to work hard towards yourself. There are plenty of script languages that compile to well performing code on the fly, why can't we have one in SQL Server for this purpose is what I wonder. Maybe Microsoft is blinded and only wants to promote the CLR as the ideal solution for everything...which it obviously cannot be.

Here the experimental code so far for the small number of elements splitter, maybe someone else has an idea what causes it do perform less then can be expected:

create FUNCTION dbo.DelimitedSplit8K_T2( @pString VARCHAR(8000), @pDelimiter CHAR(1) )
RETURNS TABLE WITH SCHEMABINDING AS
RETURN
select
ItemNumber = r.r
, Item = substring( @pString, r.s, isnull( nullif( r.n, 0 ), 8001 ) - r.s )
from
( select 1, charindex( @pDelimiter, @pString, 1 ) ) s1( s, n )
outer apply( select top( isnull( sign( s1.n ), 0 ) ) s1.n + 1, charindex( @pDelimiter, @pString, s1.n + 1 ) ) s2( s, n )
outer apply( select top( isnull( sign( s2.n ), 0 ) ) s2.n + 1, charindex( @pDelimiter, @pString, s2.n + 1 ) ) s3( s, n )
outer apply( select top( isnull( sign( s3.n ), 0 ) ) s3.n + 1, charindex( @pDelimiter, @pString, s3.n + 1 ) ) s4( s, n )

cross apply
(
select 1, s1.s, s1.n
union all select top( isnull( sign( s1.n ), 0 ) ) 2, s2.s, s2.n
union all select top( isnull( sign( s2.n ), 0 ) ) 3, s3.s, s3.n
union all select top( isnull( sign( s3.n ), 0 ) ) 4, s4.s, s4.n

) as r( r, s, n )
;
go

A more streamlined version that is faster and has room to be extended to 6 elements (at the expense of loosing some of its speed gain):

create FUNCTION dbo.DelimitedSplit8K_T2( @pString VARCHAR(8000), @pDelimiter CHAR(1) )
RETURNS TABLE WITH SCHEMABINDING AS
RETURN
select
ItemNumber = r.r
, Item = substring( @pString, r.s + 1, isnull( nullif( r.n, 0 ), 8001 ) - r.s - 1 )
from
( select 0, charindex( @pDelimiter, @pString, 1 ) ) s1( s, n )
outer apply( select top ( isnull( s1.n, 0 ) ) s1.n, charindex( @pDelimiter, @pString, s1.n + 1 ) ) s2( s, n )
outer apply( select top ( isnull( s2.n, 0 ) ) s2.n, charindex( @pDelimiter, @pString, s2.n + 1 ) ) s3( s, n )
outer apply( select top ( isnull( s3.n, 0 ) ) s3.n, charindex( @pDelimiter, @pString, s3.n + 1 ) ) s4( s, n )

cross apply
(
select 1, s1.s, s1.n
union all select 2, s2.s, s2.n
union all select 3, s3.s, s3.n
union all select 4, s4.s, s4.n

) as r( r, s, n )
where
r.s is not null
;
go

Other then that, I give up, its not going to work any better then this!
Post #1108820
Posted Saturday, May 14, 2011 3:50 PM


SSChampion

SSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampion

Group: General Forum Members
Last Login: Saturday, September 13, 2014 8:01 AM
Points: 11,194, Visits: 11,165
peter-757102 (5/14/2011)
While CLR certainly works on a technical level, it is far from a perfect merger with SQL Server. Ideal would be to have basic programming capabilities comparable to that of CLR, but a more domain specific interface to T-SQL, and build into SQL Server itself completely. So no need for an external compiler and with an interface that makes streaming data the norm rather then something you have to work hard towards yourself.

I agree with this sentiment, to a large extent.

I would certainly like to see CLR code compiled and maintained within SQL Server, probably with a T-SQL extension. It is a popular request, and there is some evidence that MSFT will in fact provide this feature:

http://connect.microsoft.com/SQLServer/feedback/details/265266/add-server-side-compilation-ability-to-sql-clr

I have to say, though, that I think T-SQL is horribly broken. There was some chatter a while back about replacing T-SQL entirely. This would be a good thing, in my view, though it does rather depend on what it was to be replaced with

I could go on about how SQL itself is broken (a properly relational database would be nice!) but that's probably a little too far off topic for this thread, and probably a little too much to hope for in any case.




Paul White
SQL Server MVP
SQLblog.com
@SQL_Kiwi
Post #1108849
Posted Saturday, May 14, 2011 9:34 PM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Today @ 12:03 AM
Points: 37,065, Visits: 31,628
It would be very interesting, indeed, if there were a true relational database that some of the people touting the attributes of relational databases could try. I believe the majority of them wouldn't be disappointed while people like me would absolutely hate it because I tend do things with data that probably couldn't be done in a true relational database.

So far as T-SQL and SQL in general go... yes, I agree it has some shortcomings but I'm definitely in love with the idea of declarative functionality rather than procedural flexibility although, as some have already stated, a proper mix of the two would be much more useful.



--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #1108863
Posted Sunday, May 15, 2011 5:56 AM
SSC Rookie

SSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC Rookie

Group: General Forum Members
Last Login: Saturday, September 13, 2014 1:46 PM
Points: 41, Visits: 370
SQLkiwi (5/14/2011)
[quote]peter-757102 (5/14/2011)
There was some chatter a while back about replacing T-SQL entirely. This would be a good thing, in my view, though it does rather depend on what it was to be replaced with


Let the horrors begin...
* ErLang
* F#
* APL



Post #1108892
Posted Sunday, May 15, 2011 10:03 AM
Old Hand

Old HandOld HandOld HandOld HandOld HandOld HandOld HandOld Hand

Group: General Forum Members
Last Login: Today @ 7:46 AM
Points: 328, Visits: 2,234
I wouldn't dare to claim T-SQL is broken as for the most part it does a good job and to my satisfaction. The more recent added features (recursive CTE, output clause, to name a few) all have serious quirks that make adopting them non-transparent and often impossible to use. Even if your code is correct, it might not work as it completely depends on other features/settings used within the runtime environment, outside of your control. This makes these features pretty much non-existent and therefore represent a waste of time on Microsoft's part. This has been going on for like the last two major versions now, a very bad sign if you ask me and I worry about the quality and focus of the SQL Server team at Microsoft now. They don't seem to be in control of their product in the way they should.

Also, during my tests on the _T2 version for limited number of elements I ran into a disturbing issue that caused me to give up in frustration. The optimizer did a nice job and recognized it could remove all cross joins and set operations and reduce it all to a constant scan (short for some procedural logic). The resulting logic however was full of redundant calls to charindex, and sadly it did do nothing to remove this redundancy. If it would have, the result would be blazingly fast, optimal code. It could be that it thought a call to charindex does not cost anything at all or it simply lacks the ability to de-duplicate operations at the expression level.

The logic it reduced the critical part of the statement too must have looked like this:

    select 
s1 = 0
, n1 = charindex( @pDelimiter, @pString, 1 )
, s2 = nullif( charindex( @pDelimiter, @pString, 1 ), 0 )
, n2 = charindex( @pDelimiter, @pString, nullif( charindex( @pDelimiter, @pString, 1 ), 0 ) + 1 )
, s3 = nullif( charindex( @pDelimiter, @pString, nullif( charindex( @pDelimiter, @pString, 1 ), 0 ) + 1 ), 0 )
, n3 = charindex( @pDelimiter, @pString, nullif( charindex( @pDelimiter, @pString, nullif( charindex( @pDelimiter, @pString, 1 ), 0 ) + 1 ), 0 ) + 1 )
, s4 = nullif( charindex( @pDelimiter, @pString, nullif( charindex( @pDelimiter, @pString, nullif( charindex( @pDelimiter, @pString, 1 ), 0 ) + 1 ), 0 ) + 1 ), 0 )
, n4 = charindex( @pDelimiter, @pString, nullif( charindex( @pDelimiter, @pString, nullif( charindex( @pDelimiter, @pString, nullif( charindex( @pDelimiter, @pString, 1 ), 0 ) + 1 ), 0 ) + 1 ), 0 ) + 1 )

In short, I think many of the problems I still do have with current T-SQL are more a result of a sloppy, not well thought out implementation. A thus not a result of some fundamental issues. Completely replacing everything seems overkill to me and just will open a new can of worms we all have to learn to live and work with.
Post #1108926
Posted Sunday, May 15, 2011 12:10 PM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Today @ 12:03 AM
Points: 37,065, Visits: 31,628
peter-757102 (5/15/2011)

...I ran into a disturbing issue that caused me to give up in frustration.


Time for some beer popsicles and a long chat with the dust bunnies.

In short, I think many of the problems I still do have with current T-SQL are more a result of a sloppy, not well thought out implementation. A thus not a result of some fundamental issues. Completely replacing everything seems overkill to me and just will open a new can of worms we all have to learn to live and work with.


I'm right there with you. My stand is that if anyone tries to replace T-SQL with some other nonsense, I'll be donning my postal uniform and making a road-trip to Redmond with the biggest pork-chop launcher I can build. If folks don't like T-SQL, let them use something else but leave T-SQL the hell alone. Same goes for the availability of NULLs.


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #1108960
Posted Sunday, May 15, 2011 1:50 PM


SSCrazy Eights

SSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy Eights

Group: General Forum Members
Last Login: Today @ 7:49 AM
Points: 8,823, Visits: 9,383
SQLkiwi (5/14/2011)
peter-757102 (5/14/2011)
While CLR certainly works on a technical level, it is far from a perfect merger with SQL Server. Ideal would be to have basic programming capabilities comparable to that of CLR, but a more domain specific interface to T-SQL, and build into SQL Server itself completely. So no need for an external compiler and with an interface that makes streaming data the norm rather then something you have to work hard towards yourself.

I agree with this sentiment, to a large extent.

I would certainly like to see CLR code compiled and maintained within SQL Server, probably with a T-SQL extension. It is a popular request, and there is some evidence that MSFT will in fact provide this feature:

http://connect.microsoft.com/SQLServer/feedback/details/265266/add-server-side-compilation-ability-to-sql-clr

Thanks for the pointer. That item now has another vote.

I have to say, though, that I think T-SQL is horribly broken. There was some chatter a while back about replacing T-SQL entirely. This would be a good thing, in my view, though it does rather depend on what it was to be replaced with

I could go on about how SQL itself is broken (a properly relational database would be nice!) but that's probably a little too far off topic for this thread, and probably a little too much to hope for in any case.

Yes, it probably is too much to hope for, because (a) the relational fundamentalist extremists would be certain to wreck any attempt to do something sensible: they would insist that outer union and outer join be banned, along with any sort of NULL, and that for every statement that delivered some rows as its result the writer of the statement would have to specify the primary key of that result, and probably a lot of other rather silly requirements too and (b) the standards bodies would cave in to vested interests and besides making all sorts of stupidity mandatory would leave out of the standard all sorts of useful things because someone thought including them would advantage someone else (I can just see the SQL die-hards demanding that the relational language not implement natural equijoin, for example, so that we would still have to write it as a projection from the column-duplicating inner equijoin that SQL supports; and the "select * is a crime" nuts would of course puke at the very concept of natural join, wouldn't they).
I suppose that sounds cynical - but my view is the result of observing the standardisation processes for data communications protocols, for SQL, and for some other languages over a long period.

But SQL really is broken (T-SQL is overall no better, and the standardistas are voting for connect items to bring it down to to the level of the SQL standard in at least one place where it is currently better than the standard) and it really would be nice to have a real relational database system and an appropriate language for it.


Tom
Post #1108979
Posted Sunday, May 15, 2011 3:11 PM


SSCrazy Eights

SSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy Eights

Group: General Forum Members
Last Login: Today @ 7:49 AM
Points: 8,823, Visits: 9,383
Jeff Moden (5/14/2011)
It would be very interesting, indeed, if there were a true relational database that some of the people touting the attributes of relational databases could try. I believe the majority of them wouldn't be disappointed while people like me would absolutely hate it because I tend do things with data that probably couldn't be done in a true relational database.

I'd be surprised if you did anything that couldn't be done in a true relational database - but not at all surprised if some of the people out there claiming to know what the one true relational model is produced a database that prevented you from doing them. If you were to get a real relational language I think you would probably like it (and those I sometimes call the "relational fundamentalists" would probably hate it).

So far as T-SQL and SQL in general go... yes, I agree it has some shortcomings but I'm definitely in love with the idea of declarative functionality rather than procedural flexibility although, as some have already stated, a proper mix of the two would be much more useful.

Well, if you want declarative functionality you certainly don't want SQL! Declarative systems people sometimes quarrel with SQL people because SQL people keep on claiming it is a declarative language, which it most certainly is not. Would you adopt the idea that no dml statement can modify the visible data in relations (all it can do is prepare modifications to be carried out by a subsequent commit statement or discarded by rollback) - once you let the modifications proposed by one dml statement be visible to a subsequent dml statement in the same transaction you have a state-oriented procedural dml not a declarative one. This is very hard indeed to handle in a language that doesn't allow higher order functions and general composition of statement components to form more complex statements, so you would end up with something not much like SQL (SQL has taken on board some of that language style with its fairly general use of subqueries and of CTEs, but not enough of it).
With a decent declarative functional language for writing the bits we might do in CLR for SQL, there would be no need for a procedural language for those bits. I don't know whether F# will become such a language, but I suspect Haskell would be a better choice anyway.
And we will always need some non-declarative bits: commitment can be handled as at present, with transaction starts and commits, maybe also with autocommit (single statement transactions). Writing to files (other than database files) with immediate effect (eg for activities needing synchronous logging, such as some debugging) and I guess anything else needed could be handled using monads in the auxiliary language. So there's probably nothing procedural needed at all at database level, and very little that's not declarative. Of course under the covers things like concurrency control and recovery logging and so on will be happening, and these are probably not going to be declarative and may be procedural.


Tom
Post #1109002
« Prev Topic | Next Topic »

Add to briefcase «««2324252627»»»

Permissions Expand / Collapse