Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase 12»»

SQL 2000 DBA Toolkit Part 2 Expand / Collapse
Author
Message
Posted Thursday, April 6, 2006 12:01 PM


Ten Centuries

Ten CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen Centuries

Group: General Forum Members
Last Login: Monday, April 14, 2014 4:18 PM
Points: 1,276, Visits: 1,132
Comments posted to this topic are about the content posted at http://www.sqlservercentral.com/columnists/mcoles/sql2000dbatoolkitpart2.asp
Post #271683
Posted Thursday, April 20, 2006 7:37 AM
SSC Rookie

SSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC Rookie

Group: General Forum Members
Last Login: Thursday, May 11, 2006 1:30 AM
Points: 26, Visits: 1

Why do you attempt to make SQL Server a perl interpreter or a Unix utility?

I just cannot see any benefits this pattern matching brings when all the utilities prvovided in the products above (AWK, ED, SED, BOURNE, BASH, KSH,JAVA) are excelllent products solely geared toward the kind of patterns you're attempting to parse and are totally disjoint to database technologies.

Not only that SQL Server is a database tool with limited extensions into the world of file and file data streaming mechanisms, a poor extension of which is your toolkit.

If you can shed some light on its adopted usage within SQL Server as an addition or extension to the embedded toolkits shipped with SQL Server then please share them with me, otherwise time is better spent letting the right tool carry out the correct operation on data, or extending the DOS capability (which has always been poor).

A totally disappointing argument and article that can simply be replaced by one-shot pattern matching utility scripts using simpler notation, more easily readable and more readily available, shareable, updateable, compatible, readable, understandable, reusable .... must I carry on like the article, the list of reasons is endless.

David Wootton




Post #274172
Posted Thursday, April 20, 2006 7:58 AM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: Administrators
Last Login: Yesterday @ 5:13 PM
Points: 33,100, Visits: 15,210
I would venture the guess the best tool is the database since that's where most of the data is. SED, AWK, etc., apart from not being on Windows in general or inside SQL Server, is not accessable from within a query. Which is where you would want to match up data.

This isn't UNIX either, so those excellent products work in a great many, albeit minority, of the computers in the world.

The right tool is the one that fits the job and for a SQL Server DBA, these tools can work well. For a Unix admin, it's not an issue since you won't work with SQL Server anyway.







Follow me on Twitter: @way0utwest

Forum Etiquette: How to post data/code on a forum to get the best help
Post #274183
Posted Thursday, April 20, 2006 8:20 AM
SSC Rookie

SSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC Rookie

Group: General Forum Members
Last Login: Thursday, May 11, 2006 1:30 AM
Points: 26, Visits: 1

Incorrect assumptions above.

SQL Server can and does work logically on Unix by using linked servers, DTS or windows scripting host, transparent gateways, .NET (to name a few techniques) however this is not relevant to this discussion.

I feel you've missed the whole point of this criticism, that being what use is this extension (used in its vaguest terms) to SQL Server functionality, not the operating system SQL Server resides on.

You opened up your part 2 of your article with a quote about Alice in Wonderland.

Here's another quote for you

Ambition is the last refuge of the failure.

David Wootton

 




Post #274196
Posted Thursday, April 20, 2006 8:21 AM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: Administrators
Last Login: Yesterday @ 5:13 PM
Points: 33,100, Visits: 15,210
So you would advocate pulling all data from the database server, running it through a utility and then putting it back for further querying?






Follow me on Twitter: @way0utwest

Forum Etiquette: How to post data/code on a forum to get the best help
Post #274197
Posted Thursday, April 20, 2006 8:56 AM


Ten Centuries

Ten CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen Centuries

Group: General Forum Members
Last Login: Monday, April 14, 2014 4:18 PM
Points: 1,276, Visits: 1,132

INTRODUCTION

Thanks for your feedback.  Your quote is appreciated, and now I provide you with one of my personal favorites:  "Those who can, do.  Those who can't, become critics."  First let me explain that I'm making no "arguments" in the article, so I can see why you would find the "argument" to be "totally disappointing."  I am simply introducing tools that I have found useful in SQL Server at various times, and explaining how to use them.  Feel free to quote me on the following:

"For those who don't find them useful, the simplest solution would probably be to not use them.
 For those who do find them useful, the best solution is probably to use them."

BACKGROUND

Next a little background for those who might find the tools useful.  (Feel free to skip to this paragraph, Mr. Wootton.)  These Regular Expression tools were created in response to another method of implementing Regular Expressions in SQL 2000 - using the sp_OACreate method to invoke the VBA regular expression parser.  Unfortunately that method caused me a lot of memory leaks, late-night server re-boots, and some difficult conversions of Perl regex's to Microsoft syntax.

UTILITY

I am not going to argue the utility of Regular Expressions in SQL.  Like a debate over the usefulness of O2 in the atmosphere, the debate over the utility of Regular Expressions in SQL was resolved a long time ago by large organizations full of very smart people.  The proof of the utility of Regular Expressions in SQL is self-evident in SQL 2005 and Oracle, which both offer regular expression functionality.  In SQL 2005 it is accessed via the .NET Framework.  Oracle offers a more direct approach, with statements like "REGEXP_LIKE", "REGEXP_INSTR", "REGEXP_SUBSTR" and "REGEXP_REPLACE" which are similar to the XP/UDF methods I have implemented for SQL 2000.

Your question about "making SQL Server a Perl interpreter or a Unix utility" might best be addressed to Oracle and Microsoft., as they have a lot more people who can take the time to explain the intricate details of why and wherefore.  Of course, they might offer you a similar solution:  i.e., if you don't like those tools then simply don't use them.  There's no reason to get rid of all the fishing nets in the world because you're a carpenter and so will never use one.

EXAMPLES:  As You Wish...

Here are two examples of times that I have found regular expressions directly in SQL to be useful in my own experience: 

  • When searching for a specific pattern via Query Analyzer.  This is a one-off situation for a single SELECT statement.  I would generally use this when debugging an application or otherwise verifying the format of data.

It is absolutely true that you could DTS your 10,000,000 row table out to a text file, copy it over to a UNIX box and run SED against it; or write a Java or .NET application that connects to the database, reads all the data into memory and performs comparisons.  Or even just open it in NotePad and click Edit > Find in the main menu and search the flat file.  You could conceivably hard-wire a dozen one-time-use disposable applications to perform these specific comparisons for each of the formats you are testing using character-level comparisons and For..Next loops and avoid regular expressions altogether!  One application for credit card numbers, one application for email addresses, one application for testing password complexity, and so on and so forth.

Personally I prefer to do any of these tasks with a single SELECT statement and a Regular Expression; particularly since Regular Expressions have already been written for just about any data validation tasks you can imagine.

  • I have encountered situations in which data validation (of various degrees) at the database level was a functional requirement.  I'll not argue the wrongness or rightness of placing validation functionality at that level; it is what it is: what the customer demanded.  Those who think data validation at the database level is wrong should feel free to lobby ANSI, Microsoft and Oracle to remove TRIGGERs and CONSTRAINTS.  Unfortunately I can do nothing about that.

That said, it's true that you could write a stored procedure or user-defined function for every variation of the data you need to validate, using a counter variable and WHILE loop, or you might even be able to use the extremely limited pattern-matching facilities of the SQL LIKE statement.  I can picture a UDF with a WHILE loop and a few "flag" variables to validate email addresses, another UDF using a WHILE loop to validate phone numbers, etc.  Alternatively, as you mentioned, you could write a program in JAVA to read all of the data into memory and perform the comparisons for you, although it would need to be invoked via xp_cmdshell.  I can only imagine the performance of a system where xp_cmdshell loads command.com which invokes a command-line Java interpreter which in turn loads an application that tests a single piece of data on a table where 10,000,000 rows are being inserted.  That's 10,000,000 invocations of xp_cmdshell, assuming your DBA didn't disable xp_cmdshell for security reasons.

CONCLUSION:  Put Your Writing Where Your Mouth Is

As far as AWK, SED and the "DOS" capability which has "always been poor", I for one would definitely be interested in seeing any examples you might have of performing validation of SQL Server data using these facilities.  Of particular use, to me anyway, would be a way to call this functionality from a Query Analyzer via SELECT statement without the need to export rows to an intermediate format like a flat file.

I think that would make an extremely useful article that many would benefit from; you should definitely consider sharing your experience with the rest of us in article form, particularly the methods you use to access SED and AWK from within SQL Server!

I look forward to further contributions from you, in your admittedly "unambitious style", to the SQL Server community at large!

I sincerely hope your day gets better,

Michael Coles

Post #274209
Posted Thursday, April 20, 2006 9:15 AM


Ten Centuries

Ten CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen Centuries

Group: General Forum Members
Last Login: Monday, April 14, 2014 4:18 PM
Points: 1,276, Visits: 1,132

SOME EXAMPLES

Just for giggles, let's look at some situations and see what solutions you propose:

CONDITIONS

1.  You have SQL Server 2000 on Windows 2003.
2.  You have a Unix box which is on the same physical network as my SQL Server.
3.  You have a table with 20,000,000 rows.
4.  The table has a column for "email address"

TASK #1:  One-time Validation

You have just been tasked with performing a one-time validation of all the email addresses in the table against RFC 2822.

One Solution:  Use the fn_regex_match() function to write A SINGLE SELECT statement in Query Analyzer.

Wootton solution: ________________________________________________________________ 

TASK #2:  Trigger Validation

Once the one-time validation above has been done, the technical specs for the database are revised to include a requirement for validation of email addresses against RFC 2822 via a TRIGGER at INSERT or UPDATE time.  It was decided that validation needed to occur at the database level so that admins and developers updating the rows directly wouldn't accidentally circumvent the validation and enter bad data.

One Solution:  Use the fn_regex_match() UDF to create a trigger with a few lines of code in it.

Wootton solution: ________________________________________________________________

TASK #3:  Parsing

The boss has tasked you with creating a report of the domain names tied to these email addresses, ranking them from most common to least common.  He wants only the domain (i.e., "microsoft.com"), but not the local domain (i.e., "mail." part of "mail.microsoft.com").  You also note that some of the domains are IP addresses which should be returned in full.

One solution: Use xp_regex_search to grab the domains out of the email address and insert them into a temporary table.  Then SELECT on the temporary table with a GROUP BY and ORDER BY.  In total, about 3 - 4 lines of T-SQL code.

Wootton solution: _________________________________________________________________

CONCLUSION:  Show Me The Money!

As requested, I have supplied very specific examples of when and where these tools might be useful.  Each is based loosely on real-world situations that I, and other DBAs, have encountered at various times.

I anxiously await your MS SQL Server/Unix-based solutions to these common DBA tasks, Mr. Wootton.

Cheers,

Michael Coles

"He has a right to criticize, who has the heart to help." - A. Lincoln

Post #274217
Posted Thursday, April 20, 2006 11:48 AM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Thursday, April 20, 2006 11:31 AM
Points: 1, Visits: 1

You serious?  MS SQL runs on UNIX?  And how do you use BOURNE from inside T-SQL???

I use regex on my SQL box now, the pcre one from Code Project http://www.codeproject.com/database/xp_pcre.asp in triggers.  Plus I can copy and paste regex from anywhere on the web.  I think they're a lot easier than trying to shuttle data back and forth to a unix box every time I want to check a simple pattern.  Wouldn't that slow everything down?  Can you even do that from a trigger?  I tried tsql stroed procs and udfs at first, but they get complicated real fast.  What are the other options you talked about that are endless?

Thanks.

Post #274265
Posted Thursday, April 27, 2006 6:22 AM


SSCrazy

SSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazy

Group: General Forum Members
Last Login: Sunday, July 20, 2014 9:06 PM
Points: 2,693, Visits: 1,202

Sorry guys ... can't read any of that ... I tend to tune out when there is a lot of bolding and uppercase typing.

 

PS: Enjoying the series Mike, I'm sure Steve can fill you in on what to do with bad reviews

 



Hope this helps
Phill Carter
--------------------
Colt 45 - the original point and click interface

Australian SQL Server User Groups - My profile
Phills Philosophies
Murrumbeena Cricket Club
Post #275828
Posted Thursday, April 27, 2006 8:31 AM


Ten Centuries

Ten CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen Centuries

Group: General Forum Members
Last Login: Monday, April 14, 2014 4:18 PM
Points: 1,276, Visits: 1,132

LOL.  Thanks Phill.  I usually try to work with critics to explain my position in a rational way and listen to their position in a patient way.  Usually there's middle ground we can meet on once we understand each other...  But that doesn't always work...

When a guy starts screaming about using UNIX to validate the data in your SQL Server databases without any further explanation, he doesn't leave you much to work with

As always, appreciate the feedback and hope you find the tools useful!

Post #275893
« Prev Topic | Next Topic »

Add to briefcase 12»»

Permissions Expand / Collapse