New T-SQL Fixed Width "Splitter"

  • Comments posted to this topic are about the item New T-SQL Fixed Width "Splitter"

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • Hey there. Thanks for the shout-out on SQL#[/url] :-). I like the T-SQL approach and have a few notes (just notes, please don't take any of the following as criticism):

    • For the version that accepts VARCHAR(MAX), why not name it NSplitMax?
    • Regarding VARCHAR, why not do NVARCHAR so that it can work in more situations? Or I suppose it would be easy enough to create NVARCHAR variations of these two functions, using DATALENGTH / 2
    • Why are the @TokenLen and @DelimLen parameters BIGINT? The upper-limit of bytes for all 3 MAX types is the same as the upper-limit of the INT datatype (i.e. 2,147,483,647). See MSDN page for int, bigint, smallint, and tinyint.

    Take care,

    Solomon...

    SQL#https://SQLsharp.com/ ( SQLCLR library ofover 340 Functions and Procedures)
    Sql Quantum Lifthttps://SqlQuantumLift.com/ ( company )
    Sql Quantum Leaphttps://SqlQuantumLeap.com/ ( blog )
    Info sitesCollations     •     Module Signing     •     SQLCLR

  • Solomon Rutzky (6/20/2016)


    Hey there. Thanks for the shout-out on SQL#[/url] :-). I like the T-SQL approach and have a few notes (just notes, please don't take any of the following as criticism):

    • For the version that accepts VARCHAR(MAX), why not name it NSplitMax?
    • Regarding VARCHAR, why not do NVARCHAR so that it can work in more situations? Or I suppose it would be easy enough to create NVARCHAR variations of these two functions, using DATALENGTH / 2
    • Why are the @TokenLen and @DelimLen parameters BIGINT? The upper-limit of bytes for all 3 MAX types is the same as the upper-limit of the INT datatype (i.e. 2,147,483,647). See MSDN page for int, bigint, smallint, and tinyint.

    Take care,

    Solomon...

    Thanks for chiming in!

    NSplitMax would actually be better; I wish I thought of that. I used to name the [n]varchar(max) version of these types of functions <function>VCMax and thought 2B was simpler. I think <function>MAX is actually a better and may just change it.

    NVarchar is a little slower - I leave that up to people that need an NVarchar version.

    I go with bigint because getnumsAB (the tally table iTVF that the function uses) takes bigint and I wanted to avoid an implicit conversion. To be honest, I have not tested the performance difference between int+implicit conversion vs bigint+no implicit conversion.

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • Alan.B (6/21/2016)


    NVarchar is a little slower - I leave that up to people that need an NVarchar version.

    Makes sense. I was just curious.

    I go with bigint because getnumsAB (the tally table iTVF that the function uses) takes bigint and I wanted to avoid an implicit conversion. To be honest, I have not tested the performance difference between int+implicit conversion vs bigint+no implicit conversion.

    This might be a 6 of one / half a baker's dozen of the other πŸ˜‰ type situation. If you are speaking of the input parameters to the getnumsAB iTVF, then I'm not sure if anything is gained regardless since there should still be an implicit conversion happening when calling your split function, assuming that numeric literals are passed into the function call instead of variables. Numeric literals are of type INT, though you could use a variable declared as BIGINT and pass that in. Then all levels would be using the same type. But I doubt there is a noticeable impact in either case. I think it's fine to keep it as BIGINT.

    SQL#https://SQLsharp.com/ ( SQLCLR library ofover 340 Functions and Procedures)
    Sql Quantum Lifthttps://SqlQuantumLift.com/ ( company )
    Sql Quantum Leaphttps://SqlQuantumLeap.com/ ( blog )
    Info sitesCollations     •     Module Signing     •     SQLCLR

  • Excellent writeup Alan, thanks for posting this. I had no idea about n-grams, despite having used them for fuzzy-matching for years.

    β€œWrite the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • ChrisM@Work (6/22/2016)


    Excellent writeup Alan, thanks for posting this. I had no idea about n-grams, despite having used them for fuzzy-matching for years.

    Thanks Chris!

    Regarding N-Grams - My second SSC artcicle, which will be the first of a 6-part series on N-Grams will be published tomorrow. πŸ˜€ I'd love your thoughts on the subject.

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • I don't know if it's just my machine, but the final code box is empty. I tried Chrome, IE, and IE with compatibility settings. I assume it contains the code for GetNumsAB.

    Any suggestions?

  • Marcia J (7/5/2016)


    I don't know if it's just my machine, but the final code box is empty. I tried Chrome, IE, and IE with compatibility settings. I assume it contains the code for GetNumsAB.

    Any suggestions?

    Hi Marcia. Sorry I missed this post.

    It has nothing to do with your browser (s), that last empty box is nothing. I don't know why it is there. The interface for submitting scripts and articles is very good but can be a little quirky at first.

    I have been meaning to fix this and re-submit the script/post.

    Cheers!

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply