Why do you attempt to make SQL Server a Perl interpreter or a Unix utility?
I just cannot see any benefit this pattern matching brings when all the utilities provided in the products above (AWK, ED, SED, Bourne shell, BASH, KSH, Java) are excellent products solely geared toward the kinds of patterns you're attempting to parse, and are totally disjoint from database technologies.
Not only that, SQL Server is a database tool with limited extensions into the world of file and file-data streaming mechanisms, of which your toolkit is a poor extension.
If you can shed some light on its adopted usage within SQL Server as an addition or extension to the embedded toolkits shipped with SQL Server, then please share it with me; otherwise, time is better spent letting the right tool carry out the correct operation on data, or extending the DOS capability (which has always been poor).
A totally disappointing argument and article that can simply be replaced by one-shot pattern-matching utility scripts using simpler notation: more easily readable, more readily available, shareable, updateable, compatible, understandable, reusable... must I carry on like the article? The list of reasons is endless.
Incorrect assumptions above.
SQL Server can and does work logically on Unix by using linked servers, DTS or Windows Script Host, transparent gateways, and .NET (to name a few techniques); however, this is not relevant to this discussion.
I feel you've missed the whole point of this criticism, namely: what use is this extension (in the vaguest sense) to SQL Server functionality, as opposed to the operating system SQL Server resides on?
You opened Part 2 of your article with a quote from Alice in Wonderland.
Here's another quote for you:
"Ambition is the last refuge of the failure."
Thanks for your feedback. Your quote is appreciated, and now I provide you with one of my personal favorites: "Those who can, do. Those who can't, become critics." First, let me explain that I'm making no "arguments" in the article, so I can see why you would find the "argument" to be "totally disappointing." I am simply introducing tools that I have found useful in SQL Server at various times, and explaining how to use them. Feel free to quote me on the following:
"For those who don't find them useful, the simplest solution would probably be to not use them. For those who do find them useful, the best solution is probably to use them."
Next, a little background for those who might find the tools useful. (Feel free to skip this paragraph, Mr. Wootton.) These Regular Expression tools were created in response to another method of implementing Regular Expressions in SQL 2000: using sp_OACreate to invoke the VBScript regular expression parser. Unfortunately, that method caused me a lot of memory leaks, late-night server reboots, and some difficult conversions of Perl regexes to Microsoft syntax.
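For anyone curious what that older approach looked like, here is a minimal sketch. The COM class name VBScript.RegExp is real; the toy pattern and the omission of error handling are simplifications for illustration:

```sql
-- The sp_OACreate approach described above, reduced to its essentials.
DECLARE @obj INT, @hr INT, @match BIT
EXEC @hr = sp_OACreate 'VBScript.RegExp', @obj OUT
EXEC @hr = sp_OASetProperty @obj, 'Pattern', '^[0-9]+$'  -- digits only, as a toy example
EXEC @hr = sp_OAMethod @obj, 'Test', @match OUT, '12345'
-- If sp_OADestroy is ever skipped (say, after an error), the COM object
-- lingers in SQL Server's address space -- hence the memory leaks.
EXEC sp_OADestroy @obj
SELECT @match AS IsMatch
```

Every call here must be wrapped in @hr checks in real code; that boilerplate is exactly what made the XP/UDF approach attractive.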
I am not going to argue the utility of Regular Expressions in SQL. Like a debate over the usefulness of O2 in the atmosphere, the debate over the utility of Regular Expressions in SQL was resolved a long time ago by large organizations full of very smart people. The proof of the utility of Regular Expressions in SQL is self-evident in SQL 2005 and Oracle, which both offer regular expression functionality. In SQL 2005 it is accessed via the .NET Framework. Oracle offers a more direct approach, with statements like "REGEXP_LIKE", "REGEXP_INSTR", "REGEXP_SUBSTR" and "REGEXP_REPLACE" which are similar to the XP/UDF methods I have implemented for SQL 2000.
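For illustration, Oracle's flavor looks like this. The users table, email column, and the deliberately simplified pattern are hypothetical placeholders, not an RFC 2822 implementation:

```sql
-- Oracle: find rows whose email fails a simplified address pattern.
SELECT email
FROM   users
WHERE  NOT REGEXP_LIKE(email, '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}$');
```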
Your question about "making SQL Server a Perl interpreter or a Unix utility" might best be addressed to Oracle and Microsoft, as they have a lot more people who can take the time to explain the intricate whys and wherefores. Of course, they might offer you a similar solution: i.e., if you don't like those tools then simply don't use them. There's no reason to get rid of all the fishing nets in the world because you're a carpenter and will never use one.
EXAMPLES: As You Wish...
Here are two examples of times that I have found regular expressions directly in SQL to be useful in my own experience:
It is absolutely true that you could DTS your 10,000,000 row table out to a text file, copy it over to a UNIX box and run SED against it; or write a Java or .NET application that connects to the database, reads all the data into memory and performs comparisons. Or even just open it in NotePad and click Edit > Find in the main menu and search the flat file. You could conceivably hard-wire a dozen one-time-use disposable applications to perform these specific comparisons for each of the formats you are testing using character-level comparisons and For..Next loops and avoid regular expressions altogether! One application for credit card numbers, one application for email addresses, one application for testing password complexity, and so on and so forth.
Personally I prefer to do any of these tasks with a single SELECT statement and a Regular Expression; particularly since Regular Expressions have already been written for just about any data validation tasks you can imagine.
That said, it's true that you could write a stored procedure or user-defined function for every variation of the data you need to validate, using a counter variable and WHILE loop, or you might even be able to use the extremely limited pattern-matching facilities of the SQL LIKE operator. I can picture a UDF with a WHILE loop and a few "flag" variables to validate email addresses, another UDF using a WHILE loop to validate phone numbers, etc. Alternatively, as you mentioned, you could write a program in Java to read all of the data into memory and perform the comparisons for you, although it would need to be invoked via xp_cmdshell. I can only imagine the performance of a system where xp_cmdshell loads command.com, which invokes a command-line Java interpreter, which in turn loads an application that tests a single piece of data, on a table where 10,000,000 rows are being inserted. That's 10,000,000 invocations of xp_cmdshell, assuming your DBA didn't disable xp_cmdshell for security reasons.
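To make the contrast concrete, here is roughly the best that LIKE alone can do for an email check (the table and column names are hypothetical). It catches a missing '@' or dot, but happily accepts plenty of garbage, because LIKE has no quantifiers, anchors, or alternation:

```sql
-- LIKE can only approximate the test: at least one character, then '@',
-- then something containing a dot, with nothing outside a crude whitelist.
SELECT email
FROM   Customers
WHERE  email NOT LIKE '%_@_%._%'
   OR  email LIKE '%[^a-zA-Z0-9.@_+-]%'
```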
CONCLUSION: Put Your Writing Where Your Mouth Is
As far as AWK, SED and the "DOS" capability which has "always been poor" go, I for one would definitely be interested in seeing any examples you might have of performing validation of SQL Server data using these facilities. Of particular use, to me anyway, would be a way to call this functionality from Query Analyzer via a SELECT statement, without the need to export rows to an intermediate format like a flat file.
I think that would make an extremely useful article that many would benefit from; you should definitely consider sharing your experience with the rest of us in article form, particularly the methods you use to access SED and AWK from within SQL Server!
I look forward to further contributions from you, in your admittedly "unambitious style", to the SQL Server community at large!
I sincerely hope your day gets better,
Just for giggles, let's look at some situations and see what solutions you propose:
1. You have SQL Server 2000 on Windows 2003.
2. You have a Unix box on the same physical network as your SQL Server.
3. You have a table with 20,000,000 rows.
4. The table has a column for "email address".
TASK #1: One-time Validation
You have just been tasked with performing a one-time validation of all the email addresses in the table against RFC 2822.
One Solution: Use the fn_regex_match() function to write A SINGLE SELECT statement in Query Analyzer.
Wootton solution: ________________________________________________________________
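For instance, the whole job could look something like this sketch. It assumes fn_regex_match returns 1 on a match; the table name is a placeholder, and the RFC 2822 pattern itself is elided:

```sql
-- One statement: flag every address that fails the pattern.
DECLARE @rfc2822 VARCHAR(8000)
SET @rfc2822 = '...'  -- RFC 2822 address pattern, elided here

SELECT email
FROM   BigTable
WHERE  dbo.fn_regex_match(email, @rfc2822) = 0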
TASK #2: Trigger Validation
Once the one-time validation above has been done, the technical specs for the database are revised to include a requirement for validation of email addresses against RFC 2822 via a TRIGGER at INSERT or UPDATE time. It was decided that validation needed to occur at the database level so that admins and developers updating the rows directly wouldn't accidentally circumvent the validation and enter bad data.
One Solution: Use the fn_regex_match() UDF to create a trigger with a few lines of code in it.
Wootton solution: ________________________________________________________________
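A sketch of such a trigger, under the same assumptions (fn_regex_match returning 1 on a match; the table name is a placeholder and the pattern is elided):

```sql
-- Reject any INSERT or UPDATE that introduces an invalid address.
CREATE TRIGGER trg_ValidateEmail ON BigTable
FOR INSERT, UPDATE
AS
IF EXISTS (SELECT 1
           FROM   inserted
           WHERE  dbo.fn_regex_match(email, '...') = 0)  -- RFC 2822 pattern elided
BEGIN
    RAISERROR('Invalid email address.', 16, 1)
    ROLLBACK TRANSACTION
END
```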
TASK #3: Parsing
The boss has tasked you with creating a report of the domain names tied to these email addresses, ranking them from most common to least common. He wants only the registrable domain (i.e., "microsoft.com"), not the host prefix (i.e., the "mail." part of "mail.microsoft.com"). You also note that some of the domains are IP addresses, which should be returned in full.
One Solution: Use xp_regex_search to grab the domains out of the email addresses and insert them into a temporary table, then SELECT from the temporary table with a GROUP BY and ORDER BY. In total, about 3-4 lines of T-SQL code.
Wootton solution: _________________________________________________________________
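The reporting half of that solution, as a sketch. The capture step is summarized in a comment because the exact xp_regex_search calling convention is the article's toolkit, not reproduced here:

```sql
-- Step 1 (summarized): for each row, something along the lines of
--   EXEC master..xp_regex_search @email, @pattern, @domain OUTPUT
-- extracts the registrable domain (or the full IP) into #domains.
CREATE TABLE #domains (domain VARCHAR(255))

-- Step 2: the report itself is a single statement.
SELECT   domain, COUNT(*) AS occurrences
FROM     #domains
GROUP BY domain
ORDER BY occurrences DESC
```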
CONCLUSION: Show Me The Money!
As requested, I have supplied very specific examples of when and where these tools might be useful. Each is based loosely on real-world situations that I, and other DBAs, have encountered at various times.
I anxiously await your MS SQL Server/Unix-based solutions to these common DBA tasks, Mr. Wootton.
"He has a right to criticize, who has the heart to help." - A. Lincoln
You serious? MS SQL runs on UNIX? And how do you use BOURNE from inside T-SQL???
I use regex on my SQL box now, the PCRE one from Code Project (http://www.codeproject.com/database/xp_pcre.asp) in triggers. Plus I can copy and paste regexes from anywhere on the web. I think they're a lot easier than trying to shuttle data back and forth to a Unix box every time I want to check a simple pattern. Wouldn't that slow everything down? Can you even do that from a trigger? I tried T-SQL stored procs and UDFs at first, but they get complicated real fast. What are the other options you talked about that are endless?
Sorry guys ... can't read any of that ... I tend to tune out when there is a lot of bolding and uppercase typing.
PS: Enjoying the series Mike, I'm sure Steve can fill you in on what to do with bad reviews.
LOL. Thanks Phill. I usually try to work with critics to explain my position in a rational way and listen to their position in a patient way. Usually there's middle ground we can meet on once we understand each other... But that doesn't always work...
When a guy starts screaming about using Unix to validate the data in your SQL Server databases without any further explanation, he doesn't leave you much to work with.
As always, appreciate the feedback and hope you find the tools useful!