RegEx for DBAs


Dave Poole
Comments posted to this topic are about the item RegEx for DBAs

rraja
Very nice article. :-)
david.rowland
Although RegEx is powerful, I tried using it once and found that coding my own CLR function to manipulate the strings the way I wanted was 100 times faster than the RegEx function.

If you are running the code against a small dataset this should not matter; however, I was running the code against millions of rows, and this had a major impact on performance.
Jeff Moden
Great and very nicely put together article, David. My hat is off to you. But what of performance? Have you done any comparisons between the Regex function and a moderately complex PatIndex?

--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
If you think it's expensive to hire a professional to do the job, wait until you hire an amateur. -- Red Adair

Jeff Moden
david.rowland (5/7/2012)
Although RegEx is powerful, I tried using it once and found that coding my own CLR function to manipulate the strings the way I wanted was 100 times faster than the RegEx function.

If you are running the code against a small dataset this should not matter; however, I was running the code against millions of rows, and this had a major impact on performance.


What did your custom CLR do to beat Regex so soundly?

corey lawson
I've used our own in-house implementations of some of the .NET RegEx functions on fairly large datasets (several million records each), mostly to help clean up, validate and parse addresses, city-state-zip (or province), etc.

It beats pure SQL functions like the Hulk beat Loki in the movie, especially in understandability. I'll take teasing out a complicated RegEx pattern over scads of nested SQL functions trying to do the same thing.

One thing I have done a lot is keep expanding a pattern like this: '(Company|corp|corporation|inc|llp|llc...)'

That beats continuing to add "or field like '%new_match%' ..."
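
To make the comparison concrete, here is a minimal sketch in Python rather than T-SQL or the in-house CLR functions, which are not shown in this thread. It uses only the terms visible in the pattern above (the original list is truncated, so nothing beyond them is assumed). The function names, the word-boundary anchors and the case-insensitive matching are illustrative assumptions, not part of the original pattern.

```python
import re

# Terms copied from the pattern in the post. The original list ends in "...",
# so only the terms actually shown are included here.
SUFFIXES = ["Company", "corp", "corporation", "inc", "llp", "llc"]

# One alternation pattern. Growing the list means adding one more term here.
# Assumption: \b word boundaries and case-insensitive matching are added for
# illustration; the pattern in the post has neither.
PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(term) for term in SUFFIXES) + r")\b",
    re.IGNORECASE,
)


def has_company_suffix(name: str) -> bool:
    """Return True if any listed term appears in the name as a whole word."""
    return PATTERN.search(name) is not None


def has_company_suffix_like(name: str) -> bool:
    """Rough stand-in for chaining OR field LIKE '%term%' once per term.

    Lowercasing mimics a case-insensitive collation (an assumption).
    """
    lowered = name.lower()
    return any(term.lower() in lowered for term in SUFFIXES)


if __name__ == "__main__":
    for sample in ["Acme Corp", "Smith LLC.", "Lincoln Park"]:
        print(sample, has_company_suffix(sample), has_company_suffix_like(sample))
```

The substring version flags "Lincoln Park" because "inc" occurs inside "Lincoln", while the anchored pattern does not. Whether that stricter behaviour is wanted depends on the data being cleaned.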
borisk 57245
RegEx is the best tool for finding any pattern within a given string. I ran a couple of tests, including .NET string manipulation and the limited RegEx in SQL (LIKE/PATINDEX...), and RegEx proved to be much faster than any other way.
As for a good editor, try this one: http://www.ultrapico.com/Expresso.htm

Moreover, consider using full-text indexing from time to time, as it might make searching for words easier than using RegEx.
Jeffrey Bijkerk
If you use RegEx in a WHERE clause, the query can no longer use the index. I think this is very important.
williams.frd
Am I out of line in thinking this is something that should be handled in a high-level object-oriented language?
paul.knibbs
I'm not particularly familiar with the CLR--does processing stuff via RegEx in this way effectively mean you're using RBAR, or is it a bit cleverer than that?