• Solomon Rutzky (5/9/2012)


    Thank you for doing that test (really). I simply don't have the time these days and it is good to see solid numbers. For this particular example I do agree that the T-SQL equivalent is not unreadable and in fact I figured it would be an even simpler form along the lines of:

    WHERE (postCode LIKE '[A-Z][A-Z0-9] [0-9][A-Z][A-Z]'

    OR postCode LIKE '[A-Z][A-Z0-9][0-9] [0-9][A-Z][A-Z]'

    OR postCode LIKE '[A-Z][A-Z][0-9] [0-9][A-Z][A-Z]'

    OR postCode LIKE '[A-Z][A-Z][0-9][0-9] [0-9][A-Z][A-Z]'

    OR postCode LIKE '[A-Z][0-9][A-Z] [0-9][A-Z][A-Z]'

    OR postCode LIKE '[A-Z][A-Z][0-9][A-Z] [0-9][A-Z][A-Z]')

    But again, to be fair to the intent of David's article, this was an intro to Regular Expressions for people and not a shining example of the full power of super-complex patterns (Corey Lawson, in the post following your post that I am replying to, alluded to some of the more complex operations). The PostalCode example is just that: something that is easily digestible for people to comprehend what the syntax is doing. For the most part, I did kinda expect that the pure T-SQL condition would perform better, even if not as pretty.

    And again, I don't think David is trying to convince anyone that they simply must use Regular Expressions. It is a matter of exposing people to a powerful and flexible tool that does come in handy sometimes. There are definitely some situations at work, typically ad-hoc debugging (i.e. functionality outweighs performance), where I would have GREATLY benefited from some RegEx functions, especially where the patterns in the data are not uniform like they are here. I am working on a related article (albeit very slowly due to being unable to teach the kids how to change their own diapers ;-)) that goes over various situations that I, and some others, have encountered where RegEx proved indispensable. Maybe when I get farther along with that I can have you and Jeff try to come up with pure T-SQL equivalents. Might be kinda fun :-).

    Take care,

    Solomon...

    What I like about this site, what keeps me coming back, is that I can have a discussion with complete strangers that often have far more experience than me (and greater knowledge!), where I disagree with what they've said and instead of getting angry they will discuss the topic. Kudos.

    Anyway, back to the topic at hand. You're right of course, the WHERE OR combination is simpler (I overlooked it to be honest).

    Here's some performance results: -

    SET STATISTICS IO ON;

    SET STATISTICS TIME ON;

    SELECT COUNT(*)

    FROM dbo.testEnvironment

    WHERE (postCode LIKE '[A-Z][A-Z0-9] [0-9][A-Z][A-Z]'

    OR postCode LIKE '[A-Z][A-Z0-9][0-9] [0-9][A-Z][A-Z]'

    OR postCode LIKE '[A-Z][A-Z][0-9] [0-9][A-Z][A-Z]'

    OR postCode LIKE '[A-Z][A-Z][0-9][0-9] [0-9][A-Z][A-Z]'

    OR postCode LIKE '[A-Z][0-9][A-Z] [0-9][A-Z][A-Z]'

    OR postCode LIKE '[A-Z][A-Z][0-9][A-Z] [0-9][A-Z][A-Z]')

    SET STATISTICS TIME OFF;

    SET STATISTICS IO OFF;

    No index

    -----------

    530604

    Table 'testEnvironment'. Scan count 1, logical reads 3212, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 1422 ms, elapsed time = 1413 ms.

    With index

    -----------

    530604

    Table 'testEnvironment'. Scan count 1, logical reads 2606, physical reads 0, read-ahead reads 16, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 1453 ms, elapsed time = 1447 ms.

    As for the more complex patterns, well if you have anything to hand then I'd love to see some performance results. The fact is that I'd love to use regular expressions instead of building up more complicated T-SQL, it'd make my job much easier. But I'm paid for doing the best I can in the time frame available, which means I can't justify using something that I have never seen outperform T-SQL.


    --edit--


    David.Poole (5/10/2012)


    Guys, the reason I wrote the article was not to say RegEx is king or you must use it. I've got away with not using it for a very long time.

    I wrote the article with two intents

    1. To show a basic primer to RegEx

    2. To open up the possibility of it use as a weapon in the DBA arsenal.

    The examples I have given are VERY simple. This is deliberate to illustrate the basics of RegEx syntax.

    Sorry, you must have posted while I was typing.

    I'm not arguing against using RegEx per se, I just wanted to see some performance results to give me a reason to use it. Don't get me wrong here, I have a RegEx CLR that I wrote on my server. I do tests to see if it's worth using, it's part of my process to explore a few options to every problem. I've just not come across a place where it beats the T-SQL alternative (except for string splitting, and it was so close that the T-SQL alternative was implemented so that we wouldn't need to convince our clients to enable CLR on their servers).

    David.Poole (5/10/2012)


    If I am reading them correctly they are saying the RegEx CPU usage is more intensive but that reads and scans are significantly lower.

    Yes, I read the results the same. So in an environment where CPU is king and memory is limited, it's probably better to use a CLR. Perhaps I'll set up a VM to simulate such an environment at a later date to test that theory.

    David.Poole (5/10/2012)


    When I use RegEx it tends to be for RegEx.Replace().

    I use normal T-SQL in the WHERE clause to get to close to the recordset that should be affected.

    I'll give this sort of task a go at some point, but I suspect that the T-SQL will have a very good chance of keeping up with the RegEx.

    Shall we take an example of an address string where we want to extract the post-code? Or do you have something else in mind? I'll set up a test for this scenario later on and we'll take a look.

    David.Poole (5/10/2012)


    The puzzles on the BeyondRealtional site illustrate what is possible from T-SQL but a lot of it falls into the realms of expert T-SQL. Judging by the T-SQL I see day in day out most developers and DBAs would struggle to produce some of that code. Elegant and efficient it may be but that is because it is produced by master craftsman. I've done a lot of interviewing and there ain't many of those about.

    I've been a developer for about 3 years now. I'm competent, but certainly not a master (or even a journeyman). I've had days where "right now" was more important that elegance and efficiency (not in my current job, where performance is king), but even in those cases it'd just mean that I'd go back to it later when time was more on my side.


    Forever trying to learn
    My blog - http://www.cadavre.co.uk/
    For better, quicker answers on T-SQL questions, click on the following...http://www.sqlservercentral.com/articles/Best+Practices/61537/
    For better, quicker answers on SQL Server performance related questions, click on the following...http://www.sqlservercentral.com/articles/SQLServerCentral/66909/