PatReplace8K

  • Alan Burstein

    SSC Guru

    Points: 61026

    Comments posted to this topic are about the item PatReplace8K

    -- Alan Burstein


    Helpful links:Best practices for getting help on SQLServerCentral -- Jeff ModenHow to Post Performance Problems -- Gail ShawNasty fast set-based string manipulation functions:For splitting strings try DelimitedSplit8K or DelimitedSplit8K_LEAD (SQL Server 2012+)To split strings based on patterns try PatternSplitCMNeed to clean or transform a string? try NGrams, PatExclude8K, PatReplace8K, DigitsOnlyEE, or Translate8KI cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code.  -- Itzik Ben-Gan 2001

  • sequelgarrett

    Right there with Babe

    Points: 778

    Why is the CASE statement necessary?

  • Alan Burstein

    SSC Guru

    Points: 61026

    sequelgarrett (6/29/2015)


    Why is the CASE statement necessary?

    I have to go through my notes, I can't remember at the moment and I don't have access to a PC. It had something to do with the behavior of how the string is re-concatenated using XML PATH (''). I recently changed the code and it may not be necessary anymore. I will have to re-test without the CASE statement.

    -- Alan Burstein


    Helpful links:Best practices for getting help on SQLServerCentral -- Jeff ModenHow to Post Performance Problems -- Gail ShawNasty fast set-based string manipulation functions:For splitting strings try DelimitedSplit8K or DelimitedSplit8K_LEAD (SQL Server 2012+)To split strings based on patterns try PatternSplitCMNeed to clean or transform a string? try NGrams, PatExclude8K, PatReplace8K, DigitsOnlyEE, or Translate8KI cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code.  -- Itzik Ben-Gan 2001

  • ChrisM@Work

    SSC Guru

    Points: 186043

    Nice job, Alan.

    One question: what's the CHECKSUM() for? It doesn't appear to change the result from ROW_NUMBER().

    [font="Arial"]“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw[/font]


    For fast, accurate and documented assistance in answering your questions, please read this article[/url].
    Understanding and using APPLY, (I)[/url] and (II)[/url] Paul White[/url]
    Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]
    [url

  • DonlSimpson

    SSCertifiable

    Points: 6774

    ChrisM@Work (6/30/2015)


    Nice job, Alan.

    One question: what's the CHECKSUM() for? It doesn't appear to change the result from ROW_NUMBER().

    Same question.

    Don Simpson



    I'm not sure about Heisenberg.

  • Alan Burstein

    SSC Guru

    Points: 61026

    DonlSimpson (6/30/2015)


    ChrisM@Work (6/30/2015)


    Nice job, Alan.

    One question: what's the CHECKSUM() for? It doesn't appear to change the result from ROW_NUMBER().

    Same question.

    PatExclude8K and PatReplace8K evolved from the thread at the end of this comment where Lowell was looking for tips to develop an itvf_strip_nonnumeric function. (Check it out if you have not seen the thread - it's a great stuff, I learned a ton).

    The original code for the tally table part looked like this:

    SELECT TOP (LEN(@String)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E1 a....)

    Eirikur changed the tally table part to deal with the implicit conversation in the TOP clause like this:

    SELECT TOP (CONVERT(BIGINT,LEN(@String),0)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E1 a....)

    Jeff Moden further enhanced it by removing the conversion to BIGINT in the TOP clause and doing the CHECKSUM over ROW_NUMBER(). He explains the change in the comment section of his updated function:

    2. CHECKSUM returns an INT and will return the exact number given if given an INT to begin with. It's also faster than a CAST or CONVERT and is used as a performance enhancer by changing the BIGINT of ROW_NUMBER() to a more appropriately sized INT.

    Based on my testing, the function appears to perform a bit faster using CHECKSUM over ROW_NUMBER() instead of the conversion to BIGINT in the TOP clause.

    http://www.sqlservercentral.com/Forums/Topic1585850-391-3.aspx

    -- Alan Burstein


    Helpful links:Best practices for getting help on SQLServerCentral -- Jeff ModenHow to Post Performance Problems -- Gail ShawNasty fast set-based string manipulation functions:For splitting strings try DelimitedSplit8K or DelimitedSplit8K_LEAD (SQL Server 2012+)To split strings based on patterns try PatternSplitCMNeed to clean or transform a string? try NGrams, PatExclude8K, PatReplace8K, DigitsOnlyEE, or Translate8KI cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code.  -- Itzik Ben-Gan 2001

  • DonlSimpson

    SSCertifiable

    Points: 6774

    Alan.B (6/30/2015)


    Based on my testing, the function appears to perform a bit faster using CHECKSUM over ROW_NUMBER() instead of the conversion to BIGINT in the TOP clause.

    http://www.sqlservercentral.com/Forums/Topic1585850-391-3.aspx

    So it should only be an issue if there is a chance that row_number would exceed the boundaries of an int, correct?

    Don Simpson



    I'm not sure about Heisenberg.

  • ChrisM@Work

    SSC Guru

    Points: 186043

    Alan.B (6/30/2015)


    ...Based on my testing, the function appears to perform a bit faster using CHECKSUM over ROW_NUMBER() instead of the conversion to BIGINT in the TOP clause. ...

    So CHECKSUM over ROW_NUMBER() is used to convert the result of ROW_NUMBER() from BIGINT to INT because it's faster than CAST or CONVERT. Cool.

    TOP (without percent) uses BIGINT so an INT value passed in will be converted internally to BIGINT. I'd expect an internal conversion to be faster than an explicit conversion but if your testing shows otherwise then that's cool to know too.

    [font="Arial"]“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw[/font]


    For fast, accurate and documented assistance in answering your questions, please read this article[/url].
    Understanding and using APPLY, (I)[/url] and (II)[/url] Paul White[/url]
    Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]
    [url

  • Alan Burstein

    SSC Guru

    Points: 61026

    DonlSimpson (6/30/2015)


    Alan.B (6/30/2015)


    Based on my testing, the function appears to perform a bit faster using CHECKSUM over ROW_NUMBER() instead of the conversion to BIGINT in the TOP clause.

    http://www.sqlservercentral.com/Forums/Topic1585850-391-3.aspx

    So it should only be an issue if there is a chance that row_number would exceed the boundaries of an int, correct?

    Sorry that I missed this (the wife and I had a baby a few weeks ago).

    Yes, correct. And since we're dealing with an 8000 character limit/1 row_number per character, it will never get near BIGINT levels.

    Again, credit to Jeff Moden and Eirikur for that little trick.

    -- Alan Burstein


    Helpful links:Best practices for getting help on SQLServerCentral -- Jeff ModenHow to Post Performance Problems -- Gail ShawNasty fast set-based string manipulation functions:For splitting strings try DelimitedSplit8K or DelimitedSplit8K_LEAD (SQL Server 2012+)To split strings based on patterns try PatternSplitCMNeed to clean or transform a string? try NGrams, PatExclude8K, PatReplace8K, DigitsOnlyEE, or Translate8KI cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code.  -- Itzik Ben-Gan 2001

  • Manie

    SSCarpal Tunnel

    Points: 4468

    Hi Alan,

    Excellent script and thanks a lot for this. I will definitely use it a lot. I would like to go through the script at leisure time and might have some questions later if that is OK with you. Can do some learning here. Now time is tight just about had time to test your script:-P:-P:-P:-P:-P:-P

    Manie Verster
    Developer
    Johannesburg
    South Africa

    I am happy because I choose to be happy.
    I just love my job!!!

  • Alan Burstein

    SSC Guru

    Points: 61026

    manie (6/17/2016)


    Hi Alan,

    Excellent script and thanks a lot for this. I will definitely use it a lot. I would like to go through the script at leisure time and might have some questions later if that is OK with you. Can do some learning here. Now time is tight just about had time to test your script:-P:-P:-P:-P:-P:-P

    Thanks Marie. Questions are encouraged.

    -- Alan Burstein


    Helpful links:Best practices for getting help on SQLServerCentral -- Jeff ModenHow to Post Performance Problems -- Gail ShawNasty fast set-based string manipulation functions:For splitting strings try DelimitedSplit8K or DelimitedSplit8K_LEAD (SQL Server 2012+)To split strings based on patterns try PatternSplitCMNeed to clean or transform a string? try NGrams, PatExclude8K, PatReplace8K, DigitsOnlyEE, or Translate8KI cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code.  -- Itzik Ben-Gan 2001

  • kast218

    SSCertifiable

    Points: 6239

    Hi Alan, great thanks for very useful function.

    Please, fix variables names for Case Sensitive instance: change @String on @string (or vice versa) and @Pattern on @pattern.

    I fixed this issue  here: https://github.com/ktaranov/sqlserver-kit/commit/b263d9aa7d30f35ff324ee26ca8b91172df72713

    Also, for CS instance, please fix PatExclude8K iTally alias.

Viewing 12 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply