RE: RegEx for DBAs – SQLServerCentral

May 16, 2012 at 2:08 pm

You are half right, Jeff.

We'll leave aside the vagueness of the term "Big Data", which is more marketing than useful. I think Buck Woody has the closest thing to a useful definition: http://blogs.msdn.com/b/buckwoody/archive/2011/10/18/big-data-and-the-cloud-more-hype-or-a-real-workload.aspx

Volume and velocity of data are the bits that will be most familiar to DBAs. Variety and variability are the bits where we start pushing the envelope.

Microsoft are making a big play around Sqoop, allowing SQL Server 2012 to play nicely with Hadoop, and I think DBAs in the DW space need to start looking at Hadoop.

For me the most interesting aspect of "Big Data" is solving the problems of mining useful information out of the "unstructured" stuff. For example, mining bulletin boards and forums to derive useful and leverageable information by means of an automated process. To be able to do that sort of stuff you need "Big Brains", and this is where the discipline of "Data Scientist" comes in.

Text parsing and term extraction form part of it, and this is where RegEx becomes useful. You and others are absolutely right to point out that it has a big performance penalty, and I can't imagine anyone sane using it in the data tier for OLTP. In a data warehouse, and in particular in the staging area of a data warehouse, it is a very powerful weapon.
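To make the term extraction idea concrete, here is a rough sketch in Python of the sort of thing I mean. The pattern, the function name and the sample posts are all made up for illustration; a real staging job would pull the text from a staging table and would need a far richer term list.

```python
import re
from collections import Counter

# Hypothetical term list: product names we want to spot in free text.
# The non-capturing groups mean findall() returns the whole match.
TERM_PATTERN = re.compile(
    r"\b(?:sql\s*server(?:\s*\d{4})?|hadoop|sqoop)\b",
    re.IGNORECASE,
)

def extract_terms(posts):
    """Count how often each known term appears across a batch of posts."""
    counts = Counter()
    for post in posts:
        for match in TERM_PATTERN.findall(post):
            # Collapse runs of whitespace and lower-case the term so that
            # "SQL  Server" and "sql server" are counted as the same thing.
            normalized = re.sub(r"\s+", " ", match).lower()
            counts[normalized] += 1
    return counts

# Made-up forum text standing in for rows from a staging table.
sample_posts = [
    "We moved staging data from Hadoop into SQL Server 2012 with Sqoop.",
    "Has anyone tried sqoop against SQL  Server?",
]

term_counts = extract_terms(sample_posts)
for term, count in term_counts.most_common():
    print(term, count)
```

Run as a batch job against staged text, something like this does its work once per load, so the RegEx cost never lands on the OLTP side.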

If you are worried about performance then you'll blow a gasket when you see what a data profiler does to a database server. Horrible it may be, but it is entirely necessary in the DW space.

Steve Jones posted an editorial a while back on the evolution and possible future evolution of SQL Server that used the analogy of Grandpa's axe. It's had 4 handles and 5 heads but it is still Grandpa's axe. Similarly, SQL Server started out as an RDBMS and now touches all sorts of things outside of the RDBMS space, but it is still SQL Server.
