RE: SQL 2000 DBA Toolkit Part 2

SSC-Insane

Points: 23224

April 20, 2006 at 8:56 am

INTRODUCTION

Thanks for your feedback. Your quote is appreciated, and now I provide you with one of my personal favorites: "Those who can, do. Those who can't, become critics." First let me explain that I'm making no "arguments" in the article, so I can see why you would find the "argument" to be "totally disappointing." I am simply introducing tools that I have found useful in SQL Server at various times, and explaining how to use them. Feel free to quote me on the following:

"For those who don't find them useful, the simplest solution would probably be to not use them.

For those who do find them useful, the best solution is probably to use them."

BACKGROUND

Next a little background for those who might find the tools useful. (Feel free to skip to this paragraph, Mr. Wootton.) These Regular Expression tools were created in response to another method of implementing Regular Expressions in SQL 2000 - using the sp_OACreate method to invoke the VBA regular expression parser. Unfortunately that method caused me a lot of memory leaks, late-night server re-boots, and some difficult conversions of Perl regex's to Microsoft syntax.

UTILITY

I am not going to argue the utility of Regular Expressions in SQL. Like a debate over the usefulness of O₂ in the atmosphere, the debate over the utility of Regular Expressions in SQL was resolved a long time ago by large organizations full of very smart people. The proof of the utility of Regular Expressions in SQL is self-evident in SQL 2005 and Oracle, which both offer regular expression functionality. In SQL 2005 it is accessed via the .NET Framework. Oracle offers a more direct approach, with statements like "REGEXP_LIKE", "REGEXP_INSTR", "REGEXP_SUBSTR" and "REGEXP_REPLACE" which are similar to the XP/UDF methods I have implemented for SQL 2000.

Your question about "making SQL Server a Perl interpreter or a Unix utility" might best be addressed to Oracle and Microsoft., as they have a lot more people who can take the time to explain the intricate details of why and wherefore. Of course, they might offer you a similar solution: i.e., if you don't like those tools then simply don't use them. There's no reason to get rid of all the fishing nets in the world because you're a carpenter and so will never use one.

EXAMPLES: As You Wish...

Here are two examples of times that I have found regular expressions directly in SQL to be useful in my own experience:

When searching for a specific pattern via Query Analyzer. This is a one-off situation for a single SELECT statement. I would generally use this when debugging an application or otherwise verifying the format of data.

It is absolutely true that you could DTS your 10,000,000 row table out to a text file, copy it over to a UNIX box and run SED against it; or write a Java or .NET application that connects to the database, reads all the data into memory and performs comparisons. Or even just open it in NotePad and click Edit > Find in the main menu and search the flat file. You could conceivably hard-wire a dozen one-time-use disposable applications to perform these specific comparisons for each of the formats you are testing using character-level comparisons and For..Next loops and avoid regular expressions altogether! One application for credit card numbers, one application for email addresses, one application for testing password complexity, and so on and so forth.
Personally I prefer to do any of these tasks with a single SELECT statement and a Regular Expression; particularly since Regular Expressions have already been written for just about any data validation tasks you can imagine.

I have encountered situations in which data validation (of various degrees) at the database level was a functional requirement. I'll not argue the wrongness or rightness of placing validation functionality at that level; it is what it is: what the customer demanded. Those who think data validation at the database level is wrong should feel free to lobby ANSI, Microsoft and Oracle to remove TRIGGERs and CONSTRAINTS. Unfortunately I can do nothing about that.

That said, it's true that you could write a stored procedure or user-defined function for every variation of the data you need to validate, using a counter variable and WHILE loop, or you might even be able to use the extremely limited pattern-matching facilities of the SQL LIKE statement. I can picture a UDF with a WHILE loop and a few "flag" variables to validate email addresses, another UDF using a WHILE loop to validate phone numbers, etc. Alternatively, as you mentioned, you could write a program in JAVA to read all of the data into memory and perform the comparisons for you, although it would need to be invoked via xp_cmdshell. I can only imagine the performance of a system where xp_cmdshell loads command.com which invokes a command-line Java interpreter which in turn loads an application that tests a single piece of data on a table where 10,000,000 rows are being inserted. That's 10,000,000 invocations of xp_cmdshell, assuming your DBA didn't disable xp_cmdshell for security reasons.

CONCLUSION: Put Your Writing Where Your Mouth Is

As far as AWK, SED and the "DOS" capability which has "always been poor", I for one would definitely be interested in seeing any examples you might have of performing validation of SQL Server data using these facilities. Of particular use, to me anyway, would be a way to call this functionality from a Query Analyzer via SELECT statement without the need to export rows to an intermediate format like a flat file.

I think that would make an extremely useful article that many would benefit from; you should definitely consider sharing your experience with the rest of us in article form, particularly the methods you use to access SED and AWK from within SQL Server!

I look forward to further contributions from you, in your admittedly "unambitious style", to the SQL Server community at large!

I sincerely hope your day gets better,

Michael Coles