Trading in xp_cmdshell for SQLCLR (Part 1) - List Directory Contents

Actually, .NET has very robust exception handling, and if you implement it along with T-SQL exception handling there shouldn't be problems.

    Right, a missing "catch" in an assembly is a pain!

Describe a stability problem that you have encountered in a SAFE/EXTERNAL_ACCESS assembly implementation. If you go UNSAFE all bets are off, but SAFE/EXTERNAL_ACCESS cannot cause you stability issues.

    You also have to run this kind of code with UNSAFE access, so it really is all about how you control what you put into your SQL Server.

I am not talking about UNSAFE. Not all developers have solid enough programming skills to write something that "scales". Putting bad code on the server is not a good thing. I do understand that "with power comes responsibility", but it is a lot harder to do it right on a busy server.

Personally I had to deal with (mis)use cases related to large memory allocations.

    SEE: http://blogs.msdn.com/sqlclrperf/archive/2007/06/01/scalable-memory-usage-in-sqlclr.aspx

It is NOT pretty when the GC runs and you have little or no control over it 🙁

    Describe a problem caused by a SAFE/EXTERNAL_ACCESS assembly in SQL Server and how it was harder to deal with.

It is harder, for example, to replace an under-performing assembly than a stored procedure across multiple servers.


    * Noel

  • noeld (2/5/2009)


I am not talking about UNSAFE. Not all developers have solid enough programming skills to write something that "scales". Putting bad code on the server is not a good thing. I do understand that "with power comes responsibility", but it is a lot harder to do it right on a busy server.

Personally I had to deal with (mis)use cases related to large memory allocations.

    SEE: http://blogs.msdn.com/sqlclrperf/archive/2007/06/01/scalable-memory-usage-in-sqlclr.aspx

It is NOT pretty when the GC runs and you have little or no control over it 🙁

You get the same performance problems in SQLCLR as you could in T-SQL from a developer not knowing how to do things efficiently. You might spot the T-SQL problem in code earlier, but only because you have more experience with T-SQL.

If your objects implement IDisposable, then you should be coding with "using" blocks, which call Dispose deterministically when the object falls out of scope. This is one of the recommended practices for coding in SQLCLR. Also, wherever the same object is consistently declared the same way, it is best to evaluate whether it could be better used as a static readonly object, where it is only created once and can be reused consistently. For example, common Regular Expression objects.

If you use expensive objects like DataSets in SQLCLR, replace them with DataReaders and work iteratively, which is not bad for performance in the CLR like it is in T-SQL.

It is harder, for example, to replace an under-performing assembly than a stored procedure across multiple servers.

How so? If you rely on Visual Studio's deploy process to deploy your assemblies, then stop and learn how to use manual scripting, which will allow you to ALTER ASSEMBLY just like you ALTER PROCEDURE. Changing an assembly out is no more difficult than changing a stored procedure out if you know the DDL required to perform the operations.
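For anyone who hasn't scripted it by hand before, a minimal sketch of what that looks like (the assembly name and DLL paths here are hypothetical; ALTER ASSEMBLY simply rebinds the new bits in place, just as ALTER PROCEDURE replaces a procedure body):

```sql
-- Initial deployment, scripted once per server (hypothetical name/path)
CREATE ASSEMBLY MyClrUtilities
FROM '\\deployserver\releases\MyClrUtilities.dll' -- or FROM 0x... binary literal
WITH PERMISSION_SET = SAFE
GO
-- Later, swap in the fixed build in place; functions and procedures
-- bound to the assembly keep working as long as the signatures match
ALTER ASSEMBLY MyClrUtilities
FROM '\\deployserver\releases\MyClrUtilities_v2.dll'
GO
```

Run the same ALTER ASSEMBLY batch against each server and the swap is no harder than deploying a changed stored procedure.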

    Jonathan Kehayias | Principal Consultant | MCM: SQL Server 2008
    My Blog | Twitter | MVP Profile
    Training | Consulting | Become a SQLskills Insider
Troubleshooting SQL Server: A Guide for Accidental DBAs

To me the SQLCLR vs. xp_cmdshell question seems like the old script vs. small application debate. SQLCLR is very similar to writing a small application: you're writing code, building an assembly, mapping it to the SQL Server. That seems like a lot of overhead to get a directory listing into a table. Granted, I do like the idea of passing table variables, and I can see the need for operations that are impossible within T-SQL. But I can see how doing all of this is like hitting a finishing nail with a sledgehammer. Again, to me it comes down to what am I really trying to do, and how much time I have to do that task. Let's face it, there are situations where time is of the essence and you don't have time to create a new assembly and "hook" it to your SQL Server.

My other question is how does this differ from creating a DLL that is mapped to an extended stored procedure?

    Sean

  • Don, here is a VERY good example on why CLR's sometimes ARE the way to go:

    SQL Server CLR function to improve performance of validating email addresses

    Lordy, here we go again... someone takes a really horrible example of some horribly written SQL and supposedly tests it against a CLR and supposedly comes up with some wonderful execution time and, as near as I can tell, none of you have actually tested it.

    I've said it before and I'll say it again, CLR's are for people who don't know how to write T-SQL. Just look at the example code in that link...

SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE FUNCTION [dbo].[udf_Txt_IsEmail] (
    @EmailAddr varchar(255) -- Email address to check
) RETURNS BIT -- 1 if @EmailAddr is a valid email address
/*
* Checks a text string to be sure it's a valid e-mail address.
* Returns 1 when it is, otherwise 0.
* Example:
SELECT CASE WHEN 1=dbo.udf_Txt_IsEmail('anovick@NovickSoftware.com')
            THEN 'Is an e-mail address' ELSE 'Not an e-mail address' END
*
* Test:
print case when 1=dbo.udf_txt_isEmail('anovick@novicksoftware.com')
           then 'Passes' else 'Fails' end + ' test for good addr'
print case when 0=dbo.udf_txt_isEmail('@novicksoftware.com')
           then 'Passes' else 'Fails' end + ' test for no user'
print case when 0=dbo.udf_txt_isEmail('anovick@n.com')
           then 'Passes' else 'Fails' end + ' test for 1 char domain'
print case when 1=dbo.udf_txt_isEmail('anovick@no.com')
           then 'Passes' else 'Fails' end + ' test for 2 char domain'
print case when 0=dbo.udf_txt_isEmail('anovick@.com')
           then 'Passes' else 'Fails' end + ' test for no domain'
print case when 0=dbo.udf_txt_isEmail('anov ick@novicksoftware.com')
           then 'Passes' else 'Fails' end + ' test for space in name'
print case when 0=dbo.udf_txt_isEmail('ano#vick@novicksoftware.com')
           then 'Passes' else 'Fails' end + ' test for # in user'
print case when 0=dbo.udf_txt_isEmail('anovick@novick*software.com')
           then 'Passes' else 'Fails' end + ' test for * asterisk in domain'
****************************************************************/
AS BEGIN
    DECLARE @AlphabetPlus VARCHAR(255)
          , @Max INT -- Length of the address
          , @Pos INT -- Position in @EmailAddr
          , @OK  BIT -- Is @EmailAddr OK
    -- Check basic conditions
    IF @EmailAddr IS NULL
       OR NOT @EmailAddr LIKE '_%@__%.__%'
       OR CHARINDEX(' ',LTRIM(RTRIM(@EmailAddr))) > 0
        RETURN(0)
    SELECT @AlphabetPlus = 'abcdefghijklmnopqrstuvwxyz01234567890_-.@'
         , @Max = LEN(@EmailAddr)
         , @Pos = 0
         , @OK  = 1
    WHILE @Pos < @Max AND @OK = 1 BEGIN
        SET @Pos = @Pos + 1
        IF NOT @AlphabetPlus LIKE '%' + SUBSTRING(@EmailAddr, @Pos, 1) + '%'
            SET @OK = 0
    END -- WHILE
    RETURN @OK
END
GO
GRANT EXEC on dbo.udf_txt_isEmail to PUBLIC
GO

    It's not the most efficient of code. That's why I have to update it.

    Even the author of the code admits it's not optimal. In fact, it's RBAR on steroids.

But, even at that, on a 10,000 row test to find all bad emails, it certainly does NOT take over 14 seconds to run. I'll prove THAT in a minute on my poor ol' 7-year-old, single 1.8 GHz CPU with only a gig of RAM.

    Second of all, this particular check can be done so easily that it's actually an insult to common sense and the resources of the machine to use a UDF. Yeah, yeah... I know... "Convenient encapsulation for developers". What you really mean is "Convenient encapsulation for developers that don't have a clue" and that's why people use some UDF's and it's certainly why they resort to a bloody CLR.

    I can't test the CLR from that horrible article, but I'll make it easy for someone who can to compare the actual code all at once...

    First, we need 10,000 rows of test data... that's easy... most of the rows in the following are "good" and there's some broken email addresses... read the comments in the code that follows...

SET NOCOUNT ON
DROP TABLE JBMTest
GO
--===== Create and populate a 10,000 row test table.
     -- Column "RowNum" has a range of 1 to 10,000 unique numbers
     -- Column "EMailAddr" has good email addresses. We'll break some later.
     -- Jeff Moden
 SELECT TOP 10000
        RowNum    = IDENTITY(INT,1,1),
        EMailAddr = CAST('simpletest@someemail.com' AS VARCHAR(255)),
        Status    = CAST('Good' AS VARCHAR(50))
   INTO dbo.JBMTest
   FROM Master.dbo.SysColumns t1
  CROSS JOIN Master.dbo.SysColumns t2
--===== A table is not properly formed unless a Primary Key has been assigned
  ALTER TABLE dbo.JBMTest
    ADD PRIMARY KEY CLUSTERED (RowNum)
--===== Break some known email addresses
 UPDATE dbo.JBMTest SET EMailAddr = 'anovick@novicksoftware.com', Status = 'Good' WHERE RowNum = 1000
 UPDATE dbo.JBMTest SET EMailAddr = '@novicksoftware.com', Status = 'Bad No User' WHERE RowNum = 2000
 UPDATE dbo.JBMTest SET EMailAddr = 'anovick@n.com', Status = 'Bad 1 Char Domain' WHERE RowNum = 3000
 UPDATE dbo.JBMTest SET EMailAddr = 'anovick@no.com', Status = 'Good 2 Char Domain' WHERE RowNum = 4000
 UPDATE dbo.JBMTest SET EMailAddr = 'anovick@.com', Status = 'Bad No Domain' WHERE RowNum = 5000
 UPDATE dbo.JBMTest SET EMailAddr = 'anov ick@novicksoftware.com', Status = 'Bad Space in Name' WHERE RowNum = 6000
 UPDATE dbo.JBMTest SET EMailAddr = 'ano#vick@novicksoftware.com', Status = 'Bad # in Name' WHERE RowNum = 7000
 UPDATE dbo.JBMTest SET EMailAddr = 'anovick@novick*software.com', Status = 'Bad * in Name' WHERE RowNum = 8000
 UPDATE dbo.JBMTest SET EMailAddr = 'anovick@novick_software.com', Status = 'Good _ in Name' WHERE RowNum = 9000
GO

    Now, let's re-write the function with some common sense... something that understands the power of NULLs and something that, in and of itself, has no explicit RBAR...

CREATE FUNCTION dbo.udf_Txt_IsEmail_Improved
-- Written by Jeff Moden
(
    @EmailAddr VARCHAR(255)
)
RETURNS BIT
AS
BEGIN
    --===== Declare obviously named variables
    DECLARE @OK BIT
        SET @OK = 0
    --===== Check the address for form and content
    SELECT @OK = 1
     WHERE @EmailAddr LIKE '_%@__%.__%'
       AND @EmailAddr NOT LIKE '%[^-.@abcdefghijklmnopqrstuvwxyz01234567890__]%' ESCAPE '_'
    RETURN @OK
END
GO
GRANT EXEC ON dbo.udf_txt_isEmail_Improved to PUBLIC
GO

    Let's go one step further, eh? Let's just do it right because this code is SO simple, that you don't even need a UDF for this...

--===== My test without a UDF because THE CODE IS THAT SIMPLE!!!!!!!
DECLARE @StartTime DATETIME
    SET @StartTime = GETDATE()
 SELECT *
   FROM dbo.JBMTest
  WHERE EmailAddr NOT LIKE '_%@__%.__%'
     OR EmailAddr LIKE '%[^-.@abcdefghijklmnopqrstuvwxyz01234567890__]%' ESCAPE '_'
  PRINT STR(DATEDIFF(ms,@StartTime,GETDATE())) + ' ms without a UDF'
GO

    Alright... we've got test data... we have the function from the article... we have my function... and we have some nice simple set-based code that blows all of the above away...

--===== The looping UDF test
DECLARE @StartTime DATETIME
    SET @StartTime = GETDATE()
 SELECT *
   FROM dbo.JBMTest
  WHERE dbo.udf_txt_isEmail(EMailAddr) = 0
  PRINT STR(DATEDIFF(ms,@StartTime,GETDATE())) + ' ms for Looping UDF'
GO
--===== My set-based UDF test
DECLARE @StartTime DATETIME
    SET @StartTime = GETDATE()
 SELECT *
   FROM dbo.JBMTest
  WHERE dbo.udf_txt_isEmail_Improved(EMailAddr) = 0
  PRINT STR(DATEDIFF(ms,@StartTime,GETDATE())) + ' ms for my UDF'
GO
--===== My test without a UDF because THE CODE IS THAT SIMPLE!!!!!!!
DECLARE @StartTime DATETIME
    SET @StartTime = GETDATE()
 SELECT *
   FROM dbo.JBMTest
  WHERE EmailAddr NOT LIKE '_%@__%.__%'
     OR EmailAddr LIKE '%[^-.@abcdefghijklmnopqrstuvwxyz01234567890__]%' ESCAPE '_'
  PRINT STR(DATEDIFF(ms,@StartTime,GETDATE())) + ' ms without a UDF'
GO

    When I run that bit of computational heaven against the test data I built above, here's the results I get...

    1263 ms for Looping UDF

    356 ms for my UDF

    233 ms without a UDF

Like I said... I don't know where the hell the author of that article gets off saying that his looping UDF takes over 14 seconds to run, but he was wrong even in that!!! :crazy:

I'll also plug set-based code here for a minute. Lots of folks keep saying they don't have the time to figure out set-based code and/or that it makes for much longer, more complicated code. I believe that, especially in this case, we just proved that's a huge load of hooie 😛 The set-based code is both easier to read and understand than the While loop UDF and it's also a heck of a lot shorter. In fact, it's so short and easy to do that anyone who puts it in a UDF needs to be shot in the head with a volley of high velocity, bone-in, frozen pork chops! 😀

Ok... now let's just say that the CLR that was included in the article actually comes in at 53 milliseconds like the article claims. First of all, I don't trust anything in that article because the time for the UDF was so very far off that I don't trust the author's methods of testing. Second, look at all the crap that can happen with a CLR. What crap? Look back at this thread! That's what! Look at all the things that can go wrong! Even Jonathan, as much of an expert on the subject as he is, missed something. Add the overhead of CLR's to that, and ya just gotta ask yourself, is it all worth it?

    Somebody, please! Test the CLR and the code I just posted all on one machine! Let's see what really gives!

    Now, about this directory stuff and using xp_CmdShell and parsing directory listings that could change... again, it all sounds like a good idea, but it's not. Someone already stated that the format of the DOS DIRectory listing could change and that would break the code, so I'm thinkin' that's a pretty bad idea to try and parse it. And, it's for good reason that a lot of DBA's don't want xp_CmdShell to be used. But, what harm is there if the server is an ETL computer with no public facing interfaces?

But, let's just assume the DBA stands firm (and they have the right to... they've been charged with protecting the data and the environment it's in!). Will the DBA allow the use of OLE automation? Even if not, if you explain that you can move a directory listing into a table with OLE automation instead of a CLR, I'll just bet that (s)he will get all cooperative really sudden like. Something like this should do...

-- CREATE PROCEDURE dbo.GetFileDetails
-- --===== Declare the I/O parameters
-- @piFullPath VARCHAR(128)
-- AS
DECLARE @piFullPath VARCHAR(128)
    SET @piFullPath = 'c:\temp\'

SET NOCOUNT ON
--=======================================================================================
-- Local variables
--=======================================================================================
--===== These are processing control and reporting variables
DECLARE @Counter          INT          --General purpose counter
DECLARE @CurrentName      VARCHAR(256) --Name of file currently being worked
DECLARE @DirTreeCount     INT          --Remembers number of rows for xp_DirTree
DECLARE @ErrorMessage     VARCHAR(255) --The potential error message
DECLARE @ErrorObject      INT          --The error object
DECLARE @HR               INT          --The HRESULT returned from the sp_OA calls
DECLARE @IsFile           BIT          --1 if Name is a file, 0 if not
--===== These are object "handle" variables
DECLARE @ObjFile          INT          --File object
DECLARE @ObjFileSystem    INT          --File System Object
--===== These variable names match the sp_OAGetProperty options
     -- Made names match so they're less confusing
DECLARE @Attributes       INT          --Read only, Hidden, Archived, etc
DECLARE @DateCreated      DATETIME     --Date file was created
DECLARE @DateLastAccessed DATETIME     --Date file was last read
DECLARE @DateLastModified DATETIME     --Date file was last written to
DECLARE @Name             VARCHAR(128) --File Name and Extension
DECLARE @Path             VARCHAR(128) --Full path including file name
DECLARE @ShortName        VARCHAR(12)  --8.3 file name
DECLARE @ShortPath        VARCHAR(100) --8.3 full path including file name
DECLARE @Size             INT          --File size in bytes
DECLARE @Type             VARCHAR(100) --Long Windows file type (eg.'Text Document',etc)
--=======================================================================================
-- Create temporary working tables
--=======================================================================================
--===== Create a place to store all file names derived from xp_DirTree
     IF OBJECT_ID('TempDB..#DirTree','U') IS NOT NULL
        DROP TABLE #DirTree
 CREATE TABLE #DirTree
        (
        RowNum INT IDENTITY(1,1),
        Name   VARCHAR(256) PRIMARY KEY CLUSTERED,
        Depth  BIT,
        IsFile BIT
        )
--===== Create a place to store the file details
     IF OBJECT_ID('TempDB..#FileDetails','U') IS NOT NULL
        DROP TABLE #FileDetails
 CREATE TABLE #FileDetails
        (
        RowNum           INT IDENTITY(1,1) PRIMARY KEY CLUSTERED,
        Name             VARCHAR(128), --File Name and Extension
        Path             VARCHAR(128), --Full path including file name
        ShortName        VARCHAR(12),  --8.3 file name
        ShortPath        VARCHAR(100), --8.3 full path including file name
        DateCreated      DATETIME,     --Date file was created
        DateLastAccessed DATETIME,     --Date file was last read
        DateLastModified DATETIME,     --Date file was last written to
        Attributes       INT,          --Read only, Hidden, Archived, etc
        Size             INT,          --File size in bytes
        Type             VARCHAR(100)  --Long Windows file type (eg.'Text Document',etc)
        )
--=======================================================================================
-- Get all the file names for the directory (includes directory names as IsFile = 0)
--=======================================================================================
--===== Get the file names for the desired path
 INSERT INTO #DirTree (Name, Depth, IsFile)
   EXEC Master.dbo.xp_DirTree @piFullPath,1,1 -- Current directory only, list file names
     -- Remember the row count
    SET @DirTreeCount = @@ROWCOUNT
--===== Update the file names with the path for ease of processing later on
 UPDATE #DirTree
    SET Name = @piFullPath + Name
 SELECT * FROM #DirTree
--=======================================================================================
-- Get the details for each file using OLE automation
--=======================================================================================
--===== Create a file system object and remember the "handle"
   EXEC @HR = dbo.sp_OACreate 'Scripting.FileSystemObject', @ObjFileSystem OUT
    SET @Counter = 1
  WHILE @Counter <= @DirTreeCount
  BEGIN
        --===== Get the current name and see if it's a file
         SELECT @CurrentName = Name,
                @IsFile = IsFile
           FROM #DirTree
          WHERE RowNum = @Counter
        --===== If it's a file, get the details for it
        IF @IsFile = 1 AND @CurrentName LIKE '%%'
        BEGIN
            --===== Create an object for the path/file and remember the "handle"
             EXEC dbo.sp_OAMethod @ObjFileSystem,'GetFile', @ObjFile OUT, @CurrentName
            --===== Get all the required attributes for the file itself
             EXEC dbo.sp_OAGetProperty @ObjFile, 'Path', @Path OUT
             EXEC dbo.sp_OAGetProperty @ObjFile, 'ShortPath', @ShortPath OUT
             EXEC dbo.sp_OAGetProperty @ObjFile, 'Name', @Name OUT
             EXEC dbo.sp_OAGetProperty @ObjFile, 'ShortName', @ShortName OUT
             EXEC dbo.sp_OAGetProperty @ObjFile, 'DateCreated', @DateCreated OUT
             EXEC dbo.sp_OAGetProperty @ObjFile, 'DateLastAccessed', @DateLastAccessed OUT
             EXEC dbo.sp_OAGetProperty @ObjFile, 'DateLastModified', @DateLastModified OUT
             EXEC dbo.sp_OAGetProperty @ObjFile, 'Attributes', @Attributes OUT
             EXEC dbo.sp_OAGetProperty @ObjFile, 'Size', @Size OUT
             EXEC dbo.sp_OAGetProperty @ObjFile, 'Type', @Type OUT
             INSERT INTO #FileDetails
                    (Path, ShortPath, Name, ShortName, DateCreated, DateLastAccessed, DateLastModified, Attributes, Size, Type)
             SELECT @Path,@ShortPath,@Name,@ShortName,@DateCreated,@DateLastAccessed,@DateLastModified,@Attributes,@Size,@Type
        END
        SELECT @Counter = @Counter + 1
  END
   EXEC sp_OADestroy @ObjFileSystem
   EXEC sp_OADestroy @ObjFile
 SELECT * FROM #FileDetails

    Major kudos to Robyn and Phil for posting the basis of that code on Simple-Talk.
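One caveat worth noting about the code above: like xp_CmdShell, the sp_OA procedures are disabled by default on SQL Server 2005 and later, so the DBA still has a switch to flip first. A minimal sketch of the standard sp_configure steps (requires sysadmin rights):

```sql
-- Show advanced options so the OLE Automation setting is visible
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
-- Enable the sp_OACreate / sp_OAMethod / sp_OAGetProperty family
EXEC sp_configure 'Ole Automation Procedures', 1;
RECONFIGURE;
```

On SQL Server 2000 the option doesn't exist; the sp_OA procedures are available there by default to sysadmin.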

I used an undocumented extended stored procedure because it was easier than setting up the loop to find the file names in a directory (Oh my! I have sinned! Twice! ;)) Lemme ask you... who amongst you hasn't used an undocumented stored procedure? Who amongst you has a DBA that hasn't? Who amongst you thinks that undocumented stored procedures are more subject to change than fully documented ones? The required privs on the very heavily documented sp_MakeWebTask changed from "dbo" in 2k sp3a to "SA" in 2k sp4 with no warning of the change until the change actually came out. Sure, there was some warning... but that doesn't change the fact that a lot of code suddenly broke and that simple proxy tricks needed to be incorporated to fix the now broken code. And, to top it all off, xp_DirTree is available and working fine in 2k, 2k5, and 2k8. If you really don't like it, take the time to get the file names using OLE automation, instead.

Just a little more and I'll get off the ol' soap box... the posts no longer seem to be available after the new forum move that SSC made a while back, but prior to that, a couple of folks (Matt Miller and Sergiy in particular) took on all comers. We even had contests between ourselves. The ONLY case where CLR's even came close to beating T-SQL was with RegEx... the idea of checking email addresses certainly qualifies as a good need for RegEx... but, ya gotta ask yourself... if you can check 10,000 email addresses in 356 milliseconds in T-SQL, is it worth going the CLR route? How about checking a million in "21813 ms without a UDF" without a CLR?

Don't get taken in by all the false glitter surrounding CLR's and their usage. If you're going to write code against databases, spend some time learning how to do it right. With the exception of RegEx, damned near all CLR's are simply not the right way to do it. And, consider this... isn't it a bit of an oxymoron to let someone who can't write database code use another tool to try to do it? Get a real database developer, instead. :)

By the way, in SQL Server 2005, my production DIRectory code doesn't have a While loop in it... send me a case of beer and I'll show you how. 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

Ok, what did I miss? If it was the exception from providing a bad path... I didn't miss that; like I said to Phil, I wouldn't provide a bad path in code that uses this, so it wasn't so much missed as it wasn't important to me.

Now what I find really funny is that you preach about RBAR and non-looping methods, and your alternative to a CLR solution here is to loop with an OLE Automation call? You might compare memory consumption between your solution and the CLR in this article before you posit that it is a better solution. Either way, you have to enable a disabled-by-default option on the SQL Server. You can leak memory with OLE automation much faster than with SQLCLR, and to the point that it can cause server instability. Don't believe me? Just forget an sp_OADestroy call.
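To make that failure mode concrete, here is a minimal sketch of the discipline OLE automation demands (the object and error handling here are illustrative; the point is the paired sp_OADestroy):

```sql
DECLARE @HR INT, @ObjFileSystem INT
-- Create the COM object and capture both the HRESULT and the handle
EXEC @HR = sp_OACreate 'Scripting.FileSystemObject', @ObjFileSystem OUT
IF @HR <> 0
    RAISERROR('sp_OACreate failed', 16, 1)
ELSE
BEGIN
    -- ... work with the object via sp_OAMethod / sp_OAGetProperty ...
    -- Every created handle MUST be destroyed, on every code path;
    -- a skipped sp_OADestroy leaks memory outside the buffer pool
    EXEC sp_OADestroy @ObjFileSystem
END
```

Skip that last call in a procedure that runs thousands of times a day and the leak accumulates until the server is restarted.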

The choice for CLR doesn't always have to be about performance. I am sure that you would argue that a 200ms difference in speed over 100000 rows between a CLR function and your TSQL function is worthy, and proves CLR is not as efficient. You leave the company and another developer has to touch that function and it takes them hours/days to figure it out, edit and test it; did you save something? If you are talking a ms difference and it simplifies code sufficiently to save cost overall in application management, then CLR definitely provides added benefit to cover the difference. Precious few would even notice the difference when you are in the ms range over large datasets.

    Jonathan Kehayias | Principal Consultant | MCM: SQL Server 2008
    My Blog | Twitter | MVP Profile
    Training | Consulting | Become a SQLskills Insider
Troubleshooting SQL Server: A Guide for Accidental DBAs

  • Phil Factor (2/5/2009)


    Oooh. This is getting very useful. I'd missed out on the advantages of passing string parameters as SQLStrings and the trick of passing back a NULL. It makes perfect sense. Now it has the behaviour that one would want.

    Phil,

    We can actually go a bit further with this if we want and use a custom struct to build a list off of and still get the Directory not Found result row in the set. Try the attached install script. To see the code for the struct and the modified output, use the select from sys.assembly_files listed previously in this thread.

    Jonathan Kehayias | Principal Consultant | MCM: SQL Server 2008
    My Blog | Twitter | MVP Profile
    Training | Consulting | Become a SQLskills Insider
Troubleshooting SQL Server: A Guide for Accidental DBAs

  • Jonathan Kehayias (2/6/2009)


Ok, what did I miss? If it was the exception from providing a bad path... I didn't miss that; like I said to Phil, I wouldn't provide a bad path in code that uses this, so it wasn't so much missed as it wasn't important to me.

    That was it and I agree... someone like you wouldn't pass a bad path.

Now what I find really funny is that you preach about RBAR and non-looping methods, and your alternative to a CLR solution here is to loop with an OLE Automation call? You might compare memory consumption between your solution and the CLR in this article before you posit that it is a better solution. Either way, you have to enable a disabled-by-default option on the SQL Server. You can leak memory with OLE automation much faster than with SQLCLR, and to the point that it can cause server instability. Don't believe me? Just forget an sp_OADestroy call.

Heh... listen to you... Anyone capable of writing either the CLR or the T-SQL isn't going to miss the sp_OADestroy. Anyone who doesn't understand enough about T-SQL will also likely not know enough about how to prevent memory leaks using a CLR.

The choice for CLR doesn't always have to be about performance. I am sure that you would argue that a 200ms difference in speed over 100000 rows between a CLR function and your TSQL function is worthy, and proves CLR is not as efficient. You leave the company and another developer has to touch that function and it takes them hours/days to figure it out, edit and test it; did you save something?

    Heh... you mean yet another Developer who can't read simple T-SQL? Look at the code again... even a complete newbie can figure it out in a couple of minutes if they use Books Online. And, if they can't, they really shouldn't be messing with a database by any means.

If you are talking a ms difference and it simplifies code sufficiently to save cost overall in application management, then CLR definitely provides added benefit to cover the difference. Precious few would even notice the difference when you are in the ms range over large datasets.

    So does putting my directory code in a stored proc. And it doesn't require the monster code you included in your article just to install the CLR... here's a reminder...

/*
USE [master]
GO
DROP FUNCTION [dbo].[os_directory_info]
DROP ASSEMBLY SQLCLRNet_DirectoryBrowser
DROP USER SQLCLRNet_ExampleLogin
DROP LOGIN SQLCLRNet_ExampleLogin
DROP ASYMMETRIC KEY SQLCLRNet_ExampleKey
GO
*/
USE [master]
GO
/****** Object: SqlAssembly [SQLCLRNet_DirectoryBrowser] Script Date: 01/23/2009 22:19:49 ******/
IF EXISTS (SELECT * FROM sys.assemblies asms WHERE asms.name = N'SQLCLRNet_DirectoryBrowser')
    DROP ASSEMBLY [SQLCLRNet_DirectoryBrowser]
GO
/****** Object: SqlAssembly [SQLCLRNet_DirectoryBrowser] Script Date: 01/23/2009 22:19:49 ******/
CREATE ASSEMBLY [SQLCLRNet_DirectoryBrowser]
AUTHORIZATION [dbo]
FROM 0x ... really big binary number here...
WITH PERMISSION_SET = SAFE
GO
-- Create the Asymmetric Key from the Assembly
CREATE ASYMMETRIC KEY SQLCLRNet_ExampleKey
FROM ASSEMBLY [SQLCLRNet_DirectoryBrowser]
GO
-- Create the Login from the Asymmetric Key
CREATE LOGIN SQLCLRNet_ExampleLogin
FROM ASYMMETRIC KEY SQLCLRNet_ExampleKey
GO
-- Grant the External Access Privilege to the Login
GRANT EXTERNAL ACCESS ASSEMBLY TO SQLCLRNet_ExampleLogin
GO
-- Create the database user for Authorization on the Assembly
CREATE USER SQLCLRNet_ExampleLogin FOR LOGIN SQLCLRNet_ExampleLogin
GO
-- Set Authorization to the Database User
ALTER AUTHORIZATION ON ASSEMBLY::[SQLCLRNet_DirectoryBrowser] TO SQLCLRNet_ExampleLogin
GO
-- Set the Assembly for External Access
ALTER ASSEMBLY [SQLCLRNet_DirectoryBrowser] WITH PERMISSION_SET = EXTERNAL_ACCESS
GO
-- Create the TSQL Function that maps to the Assembly
CREATE FUNCTION [dbo].[os_directory_info](@path [nvarchar](max), @filter [nvarchar](100) = null)
RETURNS TABLE (
    [name] [nvarchar](max) NULL,
    [is_directory] [bit] NULL,
    [size_in_bytes] [bigint] NULL,
    [create_date] [datetime] NULL,
    [last_written_to] [datetime] NULL,
    [last_accessed] [datetime] NULL,
    [attributes] [nvarchar](max) NULL
) WITH EXECUTE AS CALLER
AS
EXTERNAL NAME [SQLCLRNet_DirectoryBrowser].[UserDefinedFunctions].[os_directory_info]

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Jeff Moden (2/6/2009)


    When I run that bit of computational heaven against the test data I built above, here's the results I get...

    1263 ms for Looping UDF

    356 ms for my UDF

    233 ms without a UDF

Like I said... I don't know where the hell the author of that article gets off saying that his looping UDF takes over 14 seconds to run, but he was wrong even in that!!! :crazy:

    Add the overhead of CLR's to that, and ya just gotta ask yourself, is it all worth it?

    Somebody, please! Test the CLR and the code I just posted all on one machine! Let's see what really gives!

    I didn't use his CLR method directly because it has an inherent flaw in its usage of Regex in it that I wouldn't make in my own code. Instead I used a functionally identical function:

    using System;
    using System.Data;
    using System.Data.SqlClient;
    using System.Data.SqlTypes;
    using Microsoft.SqlServer.Server;
    using System.Text.RegularExpressions;

    public partial class UserDefinedFunctions
    {
        private static readonly Regex emailRegex = new Regex(@"^([\w-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([\w-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$");

        [Microsoft.SqlServer.Server.SqlFunction]
        public static SqlBoolean EmailRegex(SqlString email)
        {
            return new SqlBoolean(emailRegex.IsMatch(email.Value));
        }
    }

    Using the static readonly member means the Regex object is built once and reused, which is a bit more efficient memory-wise.
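The compile-once, reuse-many idiom that the static readonly field gives the CLR function looks much the same in Python. This is a hedged sketch for comparison only; Python's `re` is not the .NET Regex engine, so edge-case behavior of the same pattern may differ slightly.

```python
import re

# Compiled once at module load, just as the static readonly Regex field is
# built once per appdomain in the CLR function; reusing the compiled object
# avoids paying the pattern-compilation cost on every call.
EMAIL_REGEX = re.compile(
    r"^([\w\-.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([\w-]+\.)+))"
    r"([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$"
)

def email_regex(email):
    """Mirror of the CLR EmailRegex function: True when the string matches."""
    return bool(EMAIL_REGEX.match(email))
```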

    The results over 5 test runs using Jeff's code versus the above on my Dell Dual Core 2.0 Ghz Laptop with 4GB RAM:

    533 ms for Looping UDF

    186 ms for my UDF

    123 ms without a UDF

    63 ms for CLR UDF

    I must have a better computer than Jeff does; all of my numbers come in at roughly half of his times, so the comparison between the methods should still be fair.

    Jonathan Kehayias | Principal Consultant | MCM: SQL Server 2008
    My Blog | Twitter | MVP Profile
    Training | Consulting | Become a SQLskills Insider
    Troubleshooting SQL Server: A Guide for Accidental DBAs

  • Jeff Moden (2/6/2009)


    Heh... listen to you... Anyone capable of writing either the CLR or the T-SQL isn't going to miss the sp_OADestroy. Anyone who doesn't understand enough about T-SQL will also likely not know enough about how to prevent memory leaks using a CLR.

    Not true. I've seen this happen in consulting twice for OLE automation and a few times for XML documents that get created but not removed. It's not that difficult a mistake to make. Leaking in CLR is only possible with UNSAFE assemblies, as a part of the design of the hosting environment. The ability to leak memory requires Host Protection Attributes that are prevented from loading in anything but UNSAFE.

    So does putting my directory code in a stored proc. And it doesn't require the monster code you included in your article just to install the CLR.

    You could do everyone a favor and just reference them to the text file rather than posting the varbinary output string of a compiled assembly in here and making this page of the forums unusable, because it doesn't know how to handle a string that long. You aren't comparing similar code by looking at the compiled binary assembly object. The code is the C#, and should the following Connect item get added to SQL Server:

    http://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=265266

    the ability to write code similar to the following will exist in SQL:

    CREATE PROCEDURE my_proc @par1 int, @par2 int
    WITH EXTERNAL LANGUAGE C#, PERMISSION_SET = EXTERNAL_ACCESS
    AS
    using System;
    using System.Data;
    using System.Data.SqlClient;
    ...

    Jonathan Kehayias | Principal Consultant | MCM: SQL Server 2008

  • sean hawkes (2/6/2009)


    To me the SQLCLR vs. xp_cmdshell question seems like the old script vs. small application debate. SQLCLR is very similar to writing a small application: you're writing code, building an assembly, and mapping it to the SQL Server. That seems like a lot of overhead to get a directory listing into a table. Granted, I do like the idea of passing table variables and the like, and I can see the need for operations that are impossible within T-SQL. But I can also see how doing all of this is like hitting a finishing nail with a sledgehammer. Again, to me it comes down to what I am really trying to do, and how much time I have to do that task. Let's face it, there are situations where time is of the essence and you don't have time to create a new assembly and "hook" it to your SQL Server.

    My other question is: how does this differ from creating a DLL that is mapped to an extended stored procedure?

    Sean

    This solution is based on the premise that security is important enough to take the time to implement. If time is more important than preventing unrestricted access to the OS as the SQL Service account with xp_cmdshell, then this solution probably isn't for you.

    In the words of Mr. Miyagi, "Different, but same". This is different from an extended stored procedure because it is in managed code, which is more commonly known, easier to develop, and doesn't require the level of knowledge that C++ unmanaged extended procedures did. It also doesn't have the risks of memory leaks that the unmanaged code did, given the CAS level it is created with. It is the same because you are using an external language to write the code, but that is about the only similarity that exists between the two.

    Jonathan Kehayias | Principal Consultant | MCM: SQL Server 2008

  • Jonathan Kehayias (2/6/2009)


    I didn't use his CLR method directly because it has an inherent flaw in its usage of Regex in it

    that I wouldn't make in my own code. Instead I used a functionally identical function:

    Outstanding! Thanks for running the tests! Like I said, the one place that Matt Miller, Sergiy, and I proved that CLR's were definitely better for performance was in the area of RegEx, and I was also absolutely sure that you could make a major improvement there, as well. Well done! And, I'm not being a bit ironic there... I mean it.

    Because of the crud you have to go through with OLE automation and the slowness of the associated sp_OA* routines, I'd be willing to bet that a CLR would also beat the code I posted for performance. We didn't test that in any of our deep dives, but I'm sure it's true, and a good CLR for that would be great.

    But, like you said, performance isn't the only thing to consider when making the decision to use a CLR. Look at the installation code you had to write. Can a "newbie" actually do that as you suggested? I submit, probably not. And then, there are those hardcore DBA's that say "Not on my watch" when it comes to CLR's. 😉 Better to know how to do it several different ways... something a "newbie" wouldn't actually know, anyway.

    As you've just shown, there's a 50% difference in performance between that given CLR and the fastest code I wrote in T-SQL to do the same thing. But consider for a moment that it's only 60 milliseconds difference... extrapolating your run times on 10K rows out to a million rows, that's only 6 seconds difference... and I have two lines in a WHERE clause... with the CLR, there's source code to check in, a DLL to test and install, and an installation procedure to make the code useful as a CLR function. Lord help us if we migrate servers.

    Just to be clear, I'm not saying that all CLR's should be avoided. You said it quite correctly when you said that some folks abuse them, and you picked a very good example for what a CLR should probably be used for. In fact, I have no problem with your good article at all. One of my major points was that someone posted a link to some pretty bad T-SQL, a race between that and a CLR, and a test that probably wasn't even done correctly. CLR's get glamourized because of such mistakes, much like set-based code gets a bad name because someone writes some really bad set-based code to compare to. I want to make sure that, because of that miserable article that someone referred us all to, people don't go CLR crazy like they did when cursors were introduced in early versions (6.5, I believe) of SQL Server.

    --Jeff Moden



  • Jonathan Kehayias (2/6/2009)


    Jeff Moden (2/6/2009)


    Heh... listen to you... Anyone capable of writing either the CLR or the T-SQL isn't going to miss the sp_OADestroy. Anyone who doesn't understand enough about T-SQL will also likely not know enough about how to prevent memory leaks using a CLR.

    Not true. I've seen this happen in consulting twice for OLE automation and a few times for XML documents that get created but not removed. It's not that difficult a mistake to make. Leaking in CLR is only possible with UNSAFE assemblies, as a part of the design of the hosting environment. The ability to leak memory requires Host Protection Attributes that are prevented from loading in anything but UNSAFE.

    So does putting my directory code in a stored proc. And it doesn't require the monster code you included in your article just to install the CLR.

    You could do everyone a favor and just reference them to the text file rather than posting the varbinary output string of a compiled assembly in here and making this page of the forums unusable, because it doesn't know how to handle a string that long. You aren't comparing similar code by looking at the compiled binary assembly object. The code is the C#, and should the following Connect item get added to SQL Server:

    http://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=265266

    the ability to write code similar to the following will exist in SQL:

    CREATE PROCEDURE my_proc @par1 int, @par2 int
    WITH EXTERNAL LANGUAGE C#, PERMISSION_SET = EXTERNAL_ACCESS
    AS
    using System;
    using System.Data;
    using System.Data.SqlClient;
    ...

    That's what I mean... what's the difference between someone forgetting to do an sp_OADestroy and someone forgetting to mark a CLR as something other than UNSAFE, especially if they're a newbie? Even if it gives you a warning, is a newbie gonna know what to do about it?

    But, point well taken, hopefully on both parts... you do have a caveat for both methods.

    Ya beat me to it on the other thing... I didn't realize the binary was quite so wide in your code and started fixing it as you were typing. It's fixed now. Thanks for the heads up, though.

    --Jeff Moden



  • It's taken me close to a year to create a series of articles that have sufficient utility to be worthwhile, and are a logically good use of SQLCLR, so as to not be chewed alive by the community. Replacing less secure methodologies is my favorite place to make use of SQLCLR, though generally speaking, I wouldn't be reading directories from my T-SQL code either, so I don't really have these examples in use in my own environment.

    The EmailRegex I do have in use in a data warehouse I built that loads data from all different kinds of sources, so it is very useful in the data cleansing process post-load. I didn't do that in CLR for performance so much as because I already had the code from answering a forums post, and it was easily reused. The difference in performance between the T-SQL and the CLR isn't enough to make it matter to me. Like I said, if you are talking milliseconds, what did you really save, and do you really notice it?

    As for the installation script I provided, could anybody do that? Sure, if they were willing to research the "best practice" implementation for an EXTERNAL_ACCESS assembly. Most aren't willing to go that route and instead just flip TRUSTWORTHY ON and move on with deploying under dbo. In fact, I think I have one of the few articles on the web that actually uses keys over TRUSTWORTHY for non-SAFE assemblies. Since my purpose was to demonstrate how to be more "secure" with SQLCLR, setting TRUSTWORTHY wouldn't have really been good to do, especially in master, so I did it the right way, rather than the easy way. The installer script here provides a simple framework that can be used for any assembly; just a little copy, paste, and rename kind of stuff to do.

    Jonathan Kehayias | Principal Consultant | MCM: SQL Server 2008

  • Jeff Moden (2/6/2009)


    Ya beat me to it on the other thing... I didn't realize the binary was quite so wide in your code and started fixing it as you were typing. It's fixed now. Thanks for the heads up, though.

    I wish there were a better way to provide that binary string in forums. At least here you can attach files to posts as text. Other places it is completely impossible to send someone a sample compiled CLR assembly for even a simple Hello World example; it is just as long as the assembly I posted for reading the directory information.

    Jonathan Kehayias | Principal Consultant | MCM: SQL Server 2008

  • Jonathan Kehayias (2/6/2009)


    so I did it the right way, rather than the easy way

    BWAA-HAA!!! Man, I'm right there with you on that one! Now, all we have to do is convince the rest of the world, especially the managers that want it real bad, to give people just a little more time so they don't get it that way... real bad.

    Good talking with you again, Jonathan. Kudos on the article.

    --Jeff Moden



Viewing 15 posts - 31 through 45 (of 169 total)
