The Wild Developers of SQL Server like Wildcards

  • Comments posted to this topic are about the item The Wild Developers of SQL Server like Wildcards

    Best wishes,
    Phil Factor

  • Great article, Phil.  I just worry about such improvements by MS especially if they resort to using .NET functionality behind the scenes.

    For example, look at what they did with the FORMAT function in SQL Server.  It's something that a whole lot of people have wanted for ages.  When they finally do get it, it works correctly but a huge cost... it's an average of 44 times slower than CONVERT.  If that's the way that its backbone works in .NET, then .NET also has a problem.  It seems like they may have used the awful "regex-replace" technology instead of something much simpler and much faster.

    BWAAA-HAAA!!! Reminds me of an old saying with a different twist on it... "Just because you CAN do something in .NET, doesn't mean you should! 😛

    And, I think you're asking a lot from a company that can't even release a string splitter function (STRING_SPLIT) without realizing what the necessary functionality should have been (missing ordinals of the split-out elements).

    That being said and if they do take on the problem(s) you speak of in your good aritcle, I hope they write separate functionality rather than try to incorporate it into LIKE, CHARINDEX, and PATINDEX... there's just too great a chance that it'll come out with performance penalties like FORMAT or half-baked like STRING_SPLIT and I don't want stuff that's currently working well to suddenly become an incurable problem.

    Heh... anyone remember when they first came out with MERGE?  Yeah... that was a "joy".

    "Change is inevitable... change for the better is not".

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Ah. Don't get me started about Format() and String-split(). It was just the poor functionality that got to me with String-Split. I hadn't realized it was so slow.  I'll run a few tests. 
    I agree that it would be useful to tweak the LIKE/Patindex wildcard so that the character specifier, [A-Z0-9], could  have a quantifier. This would mean that you could then specify, say, 'up to three' or between one and four', then we'd  be happy. 
    Obviously, it would be a bonus to be able to specify that the pattern is at the end of the string or line. If a TVF version was created as well, then it could return all matches and the value of the strings matched. If they could produce that, they'd have to think up a different reason for not doing certain business processes within the database!

    Best wishes,
    Phil Factor

  • Are you sure SQL Server supports regexes in XQuery and XPath? From your link http://www.w3.org/TR/xpath-functions I realize the XQuery standards include regexes, but Microsoft left many things out from their XQuery implementation. I'm not aware of regexes in SQL Server's XQuery, although I'm not a big XQuery user and may have missed it.

    On another note I always wondered why LIKE was so limited. I think the reason is to allow SQL Server to optimize the queries where possible (eg LIKE 'ABC%') so it doesn't have to call the function for each row. Also LIKE can benefit from a parallel plan when searching large numbers of rows.

  • Phil Factor - Monday, December 17, 2018 2:29 AM

    Ah. Don't get me started about Format() and String-split(). It was just the poor functionality that got to me with String-Split. I hadn't realized it was so slow.  I'll run a few tests. 
    I agree that it would be useful to tweak the LIKE/Patindex wildcard so that the character specifier, [A-Z0-9], could  have a quantifier. This would mean that you could then specify, say, 'up to three' or between one and four', then we'd  be happy. 
    Obviously, it would be a bonus to be able to specify that the pattern is at the end of the string or line. If a TVF version was created as well, then it could return all matches and the value of the strings matched. If they could produce that, they'd have to think up a different reason for not doing certain business processes within the database!

    No... sorry if I worded that incorrectly.  String_Split runs fast.  It's the functionality that got to me.  The opposite is true for FORMAT... great functionality but really slow.

    It turns out that PERCENTILE_CONT and PERCENTILE_DISC also have great functionality but poor performance.  For example, Peter Larsson's 6 year old method for finding Median using OFFSET/FETCH is about 100 times faster than using either of those functions.  Peter's method is sub-second on 10 million rows... the functions each take 83 seconds on a fire breathing monster of a server and that's after the data has already been loaded to memory.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Outside the context of ETL staging, if your application must resort to pattern matching on varchar columns when joining tables or filtering rowsets, then your data model is to some extent denormalized. It suggests a violation of the most basic first normal form, and unless you're matching on a prefix, it's not indexable. That's why DBAs and high-end SQL coders arn't crazy about using or seeing it.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • The ANSI ISO standards have a predicate for this! SIMILAR TO is based on the POSIX version of grep, which means the wildcards are keywords bracketed by colons. We picked it based on the popularity of POSIX in the US federal government at the time, but it's it's pretty strong.

    Please post DDL and follow ANSI/ISO standards when asking for help. 

  • Not certain where to start here, but I think that there is one common aspect to all of the functions mentioned, none is up to the standard! Format is fantastic in functionality, poor in performance, String_split performs good but lacks the desired functionality....
    😎

  • Excuse my manners, meant to say very good article Phil.
    😎

    Wildcards are one common thing, wild developers are harder to find, but then again, one has to define (born to be) "wild".

  • RegEx is useful, particularly for ETL, but from the perspective an application, having the data in a normalized and index-able format is more useful. A discussion came up a few weeks back about how to optimize keyword searching.
    https://www.sqlservercentral.com/Forums/2009319/Optimize-a-LIKE-text-query

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • In the relational model all data is represented as attributes, within tuples within relations (columns, rows and tables in SQL speak). This means that all data manipulation can be achieved with a very small number of operators.

    You  might ask why should strings be any different? A  string is a relation with two attributes position and character and a primary key of position.  Once you have established that then you can manipulate the string using the normal relational operators and therefore dispense with any special operators for string manipulation.

    Might be worth thinking about.

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply