Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase ««123»»

Regular expression in T-sql Expand / Collapse
Author
Message
Posted Friday, September 28, 2012 9:05 PM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Today @ 7:40 PM
Points: 36,751, Visits: 31,200
XMLSQLNinja (9/28/2012)
bteraberry (9/28/2012)
Microsoft invested a huge amount of effort/time/money into CLR to give us tremendous capabilities. I have been told by someone close to the SQL dev team at MS that they are tremendously disappointed by the lack of adoption to the point they significantly curtailed planned efforts to make CLR even more robust. People like CELKO spread misinformation and outright lies about CLR to prevent more widespread adoption. Why they do such a thing is a guess, but I think it has something to do with not wanting to learn new things due to being stuck in the 80's (or 60's as the case may be.) This may seem harsh, but I think it's fantastically unwarranted to lie on a forum that's supposed to help younger developers get the truth about SQL Server.


Well said.

I am by no means a Microsoft Koolaid drinker: Microsoft does some things I don't agree with but introducing CLRs was absolutely a great move on their part.


Ok... now you have me curious. What have you written in the form of a CLR?


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #1366154
Posted Saturday, September 29, 2012 8:32 AM


Mr or Mrs. 500

Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500

Group: General Forum Members
Last Login: Yesterday @ 7:34 PM
Points: 555, Visits: 2,580
Jeff Moden (9/28/2012)
XMLSQLNinja (9/27/2012)
CELKO (9/15/2012)

...And then there is a high cost of CLR. .


Hogwash. There is nothing "high cost" about a well written CLR.

Avoid this kludge if you can.

There's nothing "kludgy" about using a CLR to solve a problem that can't be solved by using T-SQL. Period.

I concur with the previous posters who suggest using a CLR for regular expressions.


As with all else, "It Depends". In the case of RegEx CLR vs Like and PatIndex... if you can actually work it out with Like, PatIndex, or even CharIndex, it will usually be faster than making a call to a RegEx CLR. A couple of us just went through all of this on another thread a couple of months ago and the LIKE expressions blew the doors off the RegEx CLR.


Good morning Mr. Moden and Happy Saturday!

If you recall our discussion a few months back about running totals, I do not advocate implementing a CLR for something you can accomplish with a T-SQL query (unless, perhaps there were big performance benefits.) Furthermore, I concur that, if you can work it out with LIKE, CHARINDEX, etc then that is the way to go. Using manub22's Canadian Zip code example: writing some C#, VB, etc, then compiling it, creating a new CLR assembly, adding new managed code to your environment and Source Control, would be, as an old co-worker used to say, "like pole vaulting over a mole hill."

A couple of us just went through all of this on another thread a couple of months ago and the LIKE expressions blew the doors off the RegEx CLR.

I hope you (or anyone else) doesn't think I was claiming that there would be performance benefits by creating a Regex CLR for something that could be resolved with a LIKE expression.

So we have some context - my point was: for things like validating an Email address I cannot think of a better to do this than with a regex CLR.

My other point was that I agreed with bteraberry's post.


-- Alan Burstein



Read this article for best practices on asking questions.
Need to split a string? Try this (Jeff Moden)
Need a pattern-based string spitter? Try this (Dwain Camps)
My blog
Post #1366189
Posted Saturday, September 29, 2012 9:09 AM


Mr or Mrs. 500

Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500

Group: General Forum Members
Last Login: Yesterday @ 7:34 PM
Points: 555, Visits: 2,580
Jeff Moden (9/28/2012)
XMLSQLNinja (9/28/2012)
bteraberry (9/28/2012)
Microsoft invested a huge amount of effort/time/money into CLR to give us tremendous capabilities. I have been told by someone close to the SQL dev team at MS that they are tremendously disappointed by the lack of adoption to the point they significantly curtailed planned efforts to make CLR even more robust. People like CELKO spread misinformation and outright lies about CLR to prevent more widespread adoption. Why they do such a thing is a guess, but I think it has something to do with not wanting to learn new things due to being stuck in the 80's (or 60's as the case may be.) This may seem harsh, but I think it's fantastically unwarranted to lie on a forum that's supposed to help younger developers get the truth about SQL Server.


Well said.

I am by no means a Microsoft Koo-laid drinker: Microsoft does some things I don't agree with but introducing CLRs was absolutely a great move on their part.


Ok... now you have me curious. What have you written in the form of a CLR?


There is no better way to start a Saturday morning than some CLR small talk. If I could write a CLR that helps with a mild hangover that's what I'd be doing write now.

Over the years I have written CLR's for things like: applying regex statements to a string for the purposes of data cleansing/validation. I personally prefer XSLT transforms over SQLXML so I had to write a C# CLR that transforms XML. Stuff I can't do with T-SQL. This week I wrote an F# 'hello world' CLR (I am teaching myself F#).

Again, I aver that the introduction of CLRs was one of the best things Microsoft introduced in SQL 2005. Personally I avoid them whenever possible but not for the nonsensical reasons laid out by Mr. Celco.


-- Alan Burstein



Read this article for best practices on asking questions.
Need to split a string? Try this (Jeff Moden)
Need a pattern-based string spitter? Try this (Dwain Camps)
My blog
Post #1366196
Posted Monday, October 1, 2012 4:37 AM
SSCrazy

SSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazy

Group: General Forum Members
Last Login: Friday, July 4, 2014 3:55 AM
Points: 2,836, Visits: 5,062
A couple of us just went through all of this on another thread a couple of months ago and the LIKE expressions blew the doors off the RegEx CLR.


Could anyone post a link, please.


_____________________________________________
"The only true wisdom is in knowing you know nothing"
"O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!"
(So many miracle inventions provided by MS to us...)

How to post your question to get the best and quick help
Post #1366472
Posted Monday, October 1, 2012 6:02 AM


Mr or Mrs. 500

Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500

Group: General Forum Members
Last Login: Yesterday @ 7:34 PM
Points: 555, Visits: 2,580
Eugene Elutin (10/1/2012)
A couple of us just went through all of this on another thread a couple of months ago and the LIKE expressions blew the doors off the RegEx CLR.


Could anyone post a link, please.


I would also like to see a link to this thread.


-- Alan Burstein



Read this article for best practices on asking questions.
Need to split a string? Try this (Jeff Moden)
Need a pattern-based string spitter? Try this (Dwain Camps)
My blog
Post #1366507
Posted Monday, October 1, 2012 9:07 AM


SSCrazy Eights

SSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy Eights

Group: General Forum Members
Last Login: Today @ 12:05 PM
Points: 8,556, Visits: 9,047
XMLSQLNinja (9/27/2012)
CELKO (9/15/2012)

...And then there is a high cost of CLR. .


Hogwash. There is nothing "high cost" about a well written CLR.

Avoid this kludge if you can.

There's nothing "kludgy" about using a CLR to solve a problem that can't be solved by using T-SQL. Period.


To some extend I disagree.

CLR for SQL-Server is a kludge - a sensible kludge, because it provides extended functionality, albeit with a performance penalty incurred by the very nature of this kludge, at a much lower development cost for MS than providing it without the kludge (by a massive extension to the T-SQL language, with little or no prospect of getting the standards committees to support it for SQL, and probably several years of research and looking at design options before it could get under way) would have cost (and probably a much earlier release date too). But claiming that CLR never has a performance penalty, or that providing a CLR mechanism rather than putting a fully functional user-defined type/class capacity inside the data engine and in the T-SQL language would not be less of a kludge and deliver better performance for many things that currently have to be done with CLR is just crazy.

Of course there's nothing kludgy about using CLR to solve a problem that can't be done in T-SQL - the kludge is what MS did, not what its users are doing. Since MS has provided no other method of doing it, the users have no choice. Nor is there anything kludgy about using CLR for something that can be done in T-SQL in cases where CLR will deliver better performance (but be careful - some people have been quite surprised when they've done performance measurements).


Tom
Post #1366613
Posted Monday, October 1, 2012 3:36 PM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Today @ 7:40 PM
Points: 36,751, Visits: 31,200
Eugene Elutin (10/1/2012)
A couple of us just went through all of this on another thread a couple of months ago and the LIKE expressions blew the doors off the RegEx CLR.


Could anyone post a link, please.


Took me a bit to remember that the discussion was on an article by David Poole because it was Solomon Rutzky's code that we were discussing. Here's the link...
http://www.sqlservercentral.com/Forums/Topic1296195-60-1.aspx


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #1366796
Posted Tuesday, October 2, 2012 3:48 AM
SSCrazy

SSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazy

Group: General Forum Members
Last Login: Friday, July 4, 2014 3:55 AM
Points: 2,836, Visits: 5,062
Jeff Moden (10/1/2012)
Eugene Elutin (10/1/2012)
A couple of us just went through all of this on another thread a couple of months ago and the LIKE expressions blew the doors off the RegEx CLR.


Could anyone post a link, please.


Took me a bit to remember that the discussion was on an article by David Poole because it was Solomon Rutzky's code that we were discussing. Here's the link...
http://www.sqlservercentral.com/Forums/Topic1296195-60-1.aspx


Thank you Jeff! I've read through thread and original article, and found what I've expected.
This sort of using RegEx (or any other .NET lib) in CLR I call "CLR wrapper". And you are right: this implementation most likely will loose to T-SQL one in terms of performance. And there is good reson for this: It is because that kind of implementation is too generic and not optimised for performance.
The main reason for it to be quite slow and to require higher memory consumption is that every time such CLR function is called, underlying object needs to be instantiated in .NET framework and regular expression pattern re-compiled!
I don't usually do this (and not only with RegEx's) for the above particular reason.
Now, if you would like to design process/function optimised for performance you may need to trade-off its "generic" nature.
If you design your CLR function to work for the specific pattern (and it was couple discussions recently about how to remove every occurrence of GUID string in any text or strip some "patterns" from beginning and end of strings), you can create RegEx based CLR, which will be quite hard (if possible) to outperform with T-SQL.
That is all to do with the way you code CLR with RegEx, there are two simple tips:
1. Create static RegEx object with compiled pattern!
2. Don't error handle in CLR
There are another aspects to mention here:
It will also depend on the string sizes: CLR will win over T-SQL even with bigger margin
In case of replacement required, than more occurrences of string to replace, than again such CLR will win over T-SQL with bigger margin.
But, as I've stated, you will need to have dedicated CLR for a pattern. Yes, it's not too flexible, but it gives you much, much better performance...


_____________________________________________
"The only true wisdom is in knowing you know nothing"
"O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!"
(So many miracle inventions provided by MS to us...)

How to post your question to get the best and quick help
Post #1366927
Posted Wednesday, October 3, 2012 3:40 PM


Ten Centuries

Ten CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen Centuries

Group: General Forum Members
Last Login: Wednesday, December 4, 2013 1:49 PM
Points: 1,104, Visits: 1,174
The Dictionary.com definition of kludge:
a software or hardware configuration that, while inelegant, inefficient, clumsy, or patched together, succeeds in solving a specific problem or performing a particular task


I would not personally say that CLR is inherently inelegant, inefficient, clumsy or patched together. That you can write poor code with CLR shouldn't be evidence of it being a kludge. People can write bad code in any language.

Certainly the judgement is in the eyes of the beholder and I can understand why someone with a lack of familiarity with .net languages would find CLR to meet the definition of a kludge. I just disagree that is inherently such.


└> bt


Forum Etiquette: How to post data/code on a forum to get the best help
Post #1368068
Posted Wednesday, October 3, 2012 4:20 PM


Mr or Mrs. 500

Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500

Group: General Forum Members
Last Login: Yesterday @ 7:34 PM
Points: 555, Visits: 2,580
bteraberry (10/3/2012)
The Dictionary.com definition of kludge:
a software or hardware configuration that, while inelegant, inefficient, clumsy, or patched together, succeeds in solving a specific problem or performing a particular task


I would not personally say that CLR is inherently inelegant, inefficient, clumsy or patched together. That you can write poor code with CLR shouldn't be evidence of it being a kludge. People can write bad code in any language.

Certainly the judgement is in the eyes of the beholder and I can understand why someone with a lack of familiarity with .net languages would find CLR to meet the definition of a kludge. I just disagree that is inherently such.


There is a great Itzek Ben-Gan article from 2006 or 07 titled (something like), "To CLR or Not to CLR: that is the question" In it he talks about how DBAs were apprehensive about introducing CLRs because they can be like giving knives and matches to kids."

I have always been apprehensive of enabling CLRs for exactly that reason. In the same article he does recommend, for example, using a CLR for regexs (he includes a couple in the article). He tests the regex CLR vs Padindex and the CLR smokes the T-Sql implementation like 10 fold.

CLRs are not risk free nor are they always a better choice. But, in some cases, to ignore them is to ignore a powerful Sql server tool.


-- Alan Burstein



Read this article for best practices on asking questions.
Need to split a string? Try this (Jeff Moden)
Need a pattern-based string spitter? Try this (Dwain Camps)
My blog
Post #1368085
« Prev Topic | Next Topic »

Add to briefcase ««123»»

Permissions Expand / Collapse