Trading in xp_cmdshell for SQLCLR (Part 1) - List Directory Contents

  • Actually .NET has very robust exception handling abilities, and if you implement them along with TSQL exception handling there shouldn't be problems.

    Right, a missing "catch" in an assembly is a pain!

    Can you describe a stability problem that you have encountered in a SAFE/EXTERNAL_ACCESS assembly implementation? If you go UNSAFE, all bets are off, but SAFE/EXTERNAL_ACCESS cannot cause you stability issues.

    You also have to run this kind of code with UNSAFE access, so it really is all about how you control what you put into your SQL Server.

    I am not talking about UNSAFE. Not all developers have solid programming skills to write something that "scales". Putting bad code on the server is not a good thing. I do understand that "with power comes responsibility", but it is a lot harder to do it right on a busy server.

    Personally I had to deal with (mis)use cases related to large memory allocations.

    SEE: http://blogs.msdn.com/sqlclrperf/archive/2007/06/01/scalable-memory-usage-in-sqlclr.aspx

    It is NOT pretty when the GC kicks in and you have little or no control over it 🙁

    Describe a problem caused by a SAFE/EXTERNAL_ACCESS assembly in SQL Server and how it was harder to deal with.

    It is harder, for example, to replace an under-performing assembly than a stored procedure across multiple servers.


    * Noel

  • noeld (2/5/2009)


    I am not talking about UNSAFE. Not all developers have solid programming skills to write something that "scales". Putting bad code on the server is not a good thing. I do understand that "with power comes responsibility", but it is a lot harder to do it right on a busy server.

    Personally I had to deal with (mis)use cases related to large memory allocations.

    SEE: http://blogs.msdn.com/sqlclrperf/archive/2007/06/01/scalable-memory-usage-in-sqlclr.aspx

    It is NOT pretty when the GC kicks in and you have little or no control over it 🙁

    You get the same performance problems in SQLCLR as you can in TSQL from a developer not having the knowledge of how to do things efficiently. You might spot the TSQL problem in the code earlier, but only because you have more experience with TSQL.

    If your objects implement IDisposable, then you should be coding with "using" blocks, which ensure Dispose() is called as soon as the object falls out of scope. This is one of the recommended practices for coding in SQLCLR. Also, wherever the same object is consistently declared the same way, it is best to evaluate whether it would be better as a static readonly object, so that it is created only once and reused consistently, for example common Regular Expression objects.
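
    For example, a scalar SQLCLR function that keeps its Regex as a static readonly member might look roughly like this (an illustrative sketch, not code from the article; the pattern and names are made up):

    using System.Data.SqlTypes;
    using System.Text.RegularExpressions;
    using Microsoft.SqlServer.Server;

    public partial class UserDefinedFunctions
    {
        // static readonly: compiled once per AppDomain and reused by every call,
        // instead of allocating a new Regex (and feeding the GC) on each row.
        private static readonly Regex CollapseSpaces = new Regex(@"\s+");

        [Microsoft.SqlServer.Server.SqlFunction(IsDeterministic = true)]
        public static SqlString CleanWhitespace(SqlString value)
        {
            if (value.IsNull) return SqlString.Null;
            return new SqlString(CollapseSpaces.Replace(value.Value.Trim(), " "));
        }
    }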

    If you use expensive objects like DataSets in SQLCLR, replace them with DataReaders and work iteratively, which, unlike in TSQL, is not bad for performance in the CLR.
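
    A sketch of that pattern (illustrative only; the query and procedure name are invented): stream rows through a SqlDataReader on the context connection inside "using" blocks rather than buffering everything into a DataSet.

    using System.Data.SqlClient;
    using Microsoft.SqlServer.Server;

    public partial class StoredProcedures
    {
        [Microsoft.SqlServer.Server.SqlProcedure]
        public static void SendObjectNames()
        {
            // "using" disposes the connection, command, and reader deterministically.
            using (SqlConnection conn = new SqlConnection("context connection=true"))
            using (SqlCommand cmd = new SqlCommand(
                       "SELECT name FROM sys.objects ORDER BY name", conn))
            {
                conn.Open();
                using (SqlDataReader reader = cmd.ExecuteReader())
                {
                    // Stream the result set straight back to the caller row by row;
                    // nothing is buffered the way a filled DataSet would be.
                    SqlContext.Pipe.Send(reader);
                }
            }
        }
    }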

    It is harder, for example, to replace an under-performing assembly than a stored procedure across multiple servers.

    How so? If you rely on Visual Studio's deploy process to deploy your assemblies, then stop and learn how to use manual scripting, which will allow you to ALTER ASSEMBLY just as you ALTER PROCEDURE. Changing an assembly out is no more difficult than changing a stored procedure out if you know the DDL required to perform the operations.
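
    A minimal sketch of that kind of manual redeploy (the file path is hypothetical; the assembly name is the one used later in this thread):

    -- Swap a rebuilt DLL into an existing assembly, much as ALTER PROCEDURE swaps a proc body.
    ALTER ASSEMBLY [SQLCLRNet_DirectoryBrowser]
        FROM 'C:\Deploy\SQLCLRNet_DirectoryBrowser.dll';  -- hypothetical path; a 0x... binary literal also works
    GO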

    Jonathan Kehayias | Principal Consultant | MCM: SQL Server 2008
    My Blog | Twitter | MVP Profile
    Training | Consulting | Become a SQLskills Insider
    Troubleshooting SQL Server: A Guide for Accidental DBAs

  • To me the SQLCLR vs. xp_cmdshell question seems like the old script vs. small application debate. It seems that SQLCLR is very similar to writing a small application. I mean, you're writing code, building an assembly, and mapping it to the SQL Server. That seems like a lot of overhead to get a directory listing into a table. Granted, I do like the idea of passing table variables and such, and I can see the need for operations that are impossible within TSQL. But I can also see how doing all of this is like hitting a finishing nail with a sledge hammer. Again, to me it comes down to what I am really trying to do and how much time I have to do that task. Let's face it, there are situations where time is of the essence and you don't have time to create a new assembly and "hook" it to your SQL Server.

    My other question is how does this differ from creating a DLL that is mapped to an extended stored procedure?

    Sean

  • Don, here is a VERY good example of why CLR's sometimes ARE the way to go:

    SQL Server CLR function to improve performance of validating email addresses

    Lordy, here we go again... someone takes a really horrible example of some horribly written SQL and supposedly tests it against a CLR and supposedly comes up with some wonderful execution time and, as near as I can tell, none of you have actually tested it.

    I've said it before and I'll say it again, CLR's are for people who don't know how to write T-SQL. Just look at the example code in that link...

    SET ANSI_NULLS ON
    GO
    SET QUOTED_IDENTIFIER ON
    GO
    CREATE FUNCTION [dbo].[udf_Txt_IsEmail] (
        @EmailAddr varchar(255) -- Email address to check
    ) RETURNS BIT -- 1 if @EmailAddr is a valid email address
    /*
    * Checks a text string to be sure it's a valid e-mail address.
    * Returns 1 when it is, otherwise 0.
    * Example:
        SELECT CASE WHEN 1=dbo.udf_Txt_IsEmail('anovick@NovickSoftware.com')
               THEN 'Is an e-mail address' ELSE 'Not an e-mail address' END
    *
    * Test:
        print case when 1=dbo.udf_txt_isEmail('anovick@novicksoftware.com')
              then 'Passes' else 'Fails' end + ' test for good addr'
        print case when 0=dbo.udf_txt_isEmail('@novicksoftware.com')
              then 'Passes' else 'Fails' end + ' test for no user'
        print case when 0=dbo.udf_txt_isEmail('anovick@n.com')
              then 'Passes' else 'Fails' end + ' test for 1 char domain'
        print case when 1=dbo.udf_txt_isEmail('anovick@no.com')
              then 'Passes' else 'Fails' end + ' test for 2 char domain'
        print case when 0=dbo.udf_txt_isEmail('anovick@.com')
              then 'Passes' else 'Fails' end + ' test for no domain'
        print case when 0=dbo.udf_txt_isEmail('anov ick@novicksoftware.com')
              then 'Passes' else 'Fails' end + ' test for space in name'
        print case when 0=dbo.udf_txt_isEmail('ano#vick@novicksoftware.com')
              then 'Passes' else 'Fails' end + ' test for # in user'
        print case when 0=dbo.udf_txt_isEmail('anovick@novick*software.com')
              then 'Passes' else 'Fails' end + ' test for * asterisk in domain'
    ****************************************************************/
    AS BEGIN
        DECLARE @AlphabetPlus VARCHAR(255)
              , @Max INT -- Length of the address
              , @Pos INT -- Position in @EmailAddr
              , @OK  BIT -- Is @EmailAddr OK
        -- Check basic conditions
        IF @EmailAddr IS NULL
           OR NOT @EmailAddr LIKE '_%@__%.__%'
           OR CHARINDEX(' ',LTRIM(RTRIM(@EmailAddr))) > 0
            RETURN(0)
        SELECT @AlphabetPlus = 'abcdefghijklmnopqrstuvwxyz01234567890_-.@'
             , @Max = LEN(@EmailAddr)
             , @Pos = 0
             , @OK  = 1
        WHILE @Pos < @Max AND @OK = 1 BEGIN
            SET @Pos = @Pos + 1
            IF NOT @AlphabetPlus LIKE '%'
                   + SUBSTRING(@EmailAddr, @Pos, 1)
                   + '%'
                SET @OK = 0
        END -- WHILE
        RETURN @OK
    END
    go
    GRANT EXEC on dbo.udf_txt_isEmail to PUBLIC
    go

    It's not the most efficient of code. That's why I have to update it.

    Even the author of the code admits it's not optimal. In fact, it's RBAR on steroids.

    But, even at that, on a 10,000 row test to find all bad emails, it certainly does NOT take over 14 seconds to run. I'll prove THAT in a minute on my poor ol' 7 year old, single 1.8 Ghz CPU with only a Gig of RAM.

    Second of all, this particular check can be done so easily that it's actually an insult to common sense and the resources of the machine to use a UDF. Yeah, yeah... I know... "Convenient encapsulation for developers". What you really mean is "Convenient encapsulation for developers that don't have a clue" and that's why people use some UDF's and it's certainly why they resort to a bloody CLR.

    I can't test the CLR from that horrible article, but I'll make it easy for someone who can to compare the actual code all at once...

    First, we need 10,000 rows of test data... that's easy... most of the rows in the following are "good" and there's some broken email addresses... read the comments in the code that follows...

    SET NOCOUNT ON
    DROP TABLE JBMTest
    GO
    --===== Create and populate a 10,000 row test table.
         -- Column "RowNum" has a range of 1 to 10,000 unique numbers
         -- Column "EMailAddr" has good email addresses. We'll break some later.
         -- Jeff Moden
    SELECT TOP 10000
           RowNum    = IDENTITY(INT,1,1),
           EMailAddr = CAST('simpletest@someemail.com' AS VARCHAR(255)),
           Status    = CAST('Good' AS VARCHAR(50))
      INTO dbo.JBMTest
      FROM Master.dbo.SysColumns t1
     CROSS JOIN Master.dbo.SysColumns t2
    --===== A table is not properly formed unless a Primary Key has been assigned
    ALTER TABLE dbo.JBMTest
      ADD PRIMARY KEY CLUSTERED (RowNum)
    --===== Break some known email addresses
    UPDATE dbo.JBMTest SET EMailAddr = 'anovick@novicksoftware.com',  Status = 'Good'               WHERE RowNum = 1000
    UPDATE dbo.JBMTest SET EMailAddr = '@novicksoftware.com',         Status = 'Bad No User'        WHERE RowNum = 2000
    UPDATE dbo.JBMTest SET EMailAddr = 'anovick@n.com',               Status = 'Bad 1 Char Domain'  WHERE RowNum = 3000
    UPDATE dbo.JBMTest SET EMailAddr = 'anovick@no.com',              Status = 'Good 2 Char Domain' WHERE RowNum = 4000
    UPDATE dbo.JBMTest SET EMailAddr = 'anovick@.com',                Status = 'Bad No Domain'      WHERE RowNum = 5000
    UPDATE dbo.JBMTest SET EMailAddr = 'anov ick@novicksoftware.com', Status = 'Bad Space in Name'  WHERE RowNum = 6000
    UPDATE dbo.JBMTest SET EMailAddr = 'ano#vick@novicksoftware.com', Status = 'Bad # in Name'      WHERE RowNum = 7000
    UPDATE dbo.JBMTest SET EMailAddr = 'anovick@novick*software.com', Status = 'Bad * in Name'      WHERE RowNum = 8000
    UPDATE dbo.JBMTest SET EMailAddr = 'anovick@novick_software.com', Status = 'Good _ in Name'     WHERE RowNum = 9000
    GO

    Now, let's re-write the function with some common sense... something that understands the power of NULLs and something that, in and of itself, has no explicit RBAR...

    CREATE FUNCTION dbo.udf_Txt_IsEmail_Improved
    -- Written by Jeff Moden
    (
        @EmailAddr VARCHAR(255)
    )
    RETURNS BIT
    AS
    BEGIN
        --===== Declare obviously named variables
        DECLARE @OK BIT
        SET @OK = 0
        --===== Check the address for form and content
        SELECT @OK = 1
         WHERE @EmailAddr LIKE '_%@__%.__%'
           AND @EmailAddr NOT LIKE '%[^-.@abcdefghijklmnopqrstuvwxyz01234567890__]%' ESCAPE '_'
        RETURN @OK
    END
    GO
    GRANT EXEC ON dbo.udf_txt_isEmail_Improved to PUBLIC
    GO

    Let's go one step further, eh? Let's just do it right because this code is SO simple, that you don't even need a UDF for this...

    --===== My test without a UDF because THE CODE IS THAT SIMPLE!!!!!!!
    DECLARE @StartTime DATETIME
    SET @StartTime = GETDATE()
    SELECT *
      FROM dbo.JBMTest
     WHERE EmailAddr NOT LIKE '_%@__%.__%'
        OR EmailAddr LIKE '%[^-.@abcdefghijklmnopqrstuvwxyz01234567890__]%' ESCAPE '_'
    PRINT STR(DATEDIFF(ms,@StartTime,GETDATE())) + ' ms without a UDF'
    GO

    Alright... we've got test data... we have the function from the article... we have my function... and we have some nice simple set-based code that blows all of the above away...

    --===== The looping UDF test
    DECLARE @StartTime DATETIME
    SET @StartTime = GETDATE()
    SELECT *
      FROM dbo.JBMTest
     WHERE dbo.udf_txt_isEmail(EMailAddr) = 0
    PRINT STR(DATEDIFF(ms,@StartTime,GETDATE())) + ' ms for Looping UDF'
    GO
    --===== My set-based UDF test
    DECLARE @StartTime DATETIME
    SET @StartTime = GETDATE()
    SELECT *
      FROM dbo.JBMTest
     WHERE dbo.udf_txt_isEmail_Improved(EMailAddr) = 0
    PRINT STR(DATEDIFF(ms,@StartTime,GETDATE())) + ' ms for my UDF'
    GO
    --===== My test without a UDF because THE CODE IS THAT SIMPLE!!!!!!!
    DECLARE @StartTime DATETIME
    SET @StartTime = GETDATE()
    SELECT *
      FROM dbo.JBMTest
     WHERE EmailAddr NOT LIKE '_%@__%.__%'
        OR EmailAddr LIKE '%[^-.@abcdefghijklmnopqrstuvwxyz01234567890__]%' ESCAPE '_'
    PRINT STR(DATEDIFF(ms,@StartTime,GETDATE())) + ' ms without a UDF'
    GO

    When I run that bit of computational heaven against the test data I built above, here's the results I get...

    1263 ms for Looping UDF
    356 ms for my UDF
    233 ms without a UDF

    Like I said... I don't know where the hell the author of that article gets off saying that his looping UDF takes over 14 seconds to run, but he was wrong even in that!!!

    I'll also plug set-based code here for a minute. Lots of folks keep saying they don't have the time to figure out set-based code and/or that it makes for much longer, more complicated code. I believe that, especially in this case, we just proved that's a huge load of hooey 😛 The set-based code is both easier to read and understand than the While loop UDF and it's also a heck of a lot shorter. In fact, it's so short and easy to do that anyone who puts it in a UDF needs to be shot in the head with a volley of high velocity, bone-in, frozen pork chops! 😀

    Ok... now let's just say that the CLR that was included in the article actually comes in at 53 milliseconds like the article claims. First of all, I don't trust anything in that article because the time for the UDF was so very far off that I don't trust the author's methods of testing. Second, look at all the crap that can happen with a CLR. What crap? Look back at this thread! That's what! Look at all the things that can go wrong! Even Jonathan, as much of an expert on the subject as he is, missed something. Add the overhead of CLR's to that, and ya just gotta ask yourself, is it all worth it?

    Somebody, please! Test the CLR and the code I just posted all on one machine! Let's see what really gives!

    Now, about this directory stuff and using xp_CmdShell and parsing directory listings that could change... again, it all sounds like a good idea, but it's not. Someone already stated that the format of the DOS DIRectory listing could change and that would break the code, so I'm thinkin' that's a pretty bad idea to try and parse it. And, it's for good reason that a lot of DBA's don't want xp_CmdShell to be used. But, what harm is there if the server is an ETL computer with no public facing interfaces?

    But, let's just assume the DBA stands firm (and they have the right to... they've been charged with protecting the data and the environment it's in!). Will the DBA allow the use of OLE automation? Maybe not at first, but if you explain that you can move a directory listing into a table with OLE automation instead of a CLR, I'll just bet that (s)he will get all cooperative really sudden like. Something like this should do...

    -- CREATE PROCEDURE dbo.GetFileDetails
    -- --===== Declare the I/O parameters
    -- @piFullPath VARCHAR(128)
    -- AS
    DECLARE @piFullPath VARCHAR(128)
    SET @piFullPath = 'c:\temp\'

    SET NOCOUNT ON
    --=======================================================================================
    -- Local variables
    --=======================================================================================
    --===== These are processing control and reporting variables
    DECLARE @Counter          INT          --General purpose counter
    DECLARE @CurrentName      VARCHAR(256) --Name of file currently being worked
    DECLARE @DirTreeCount     INT          --Remembers number of rows from xp_DirTree
    DECLARE @ErrorMessage     VARCHAR(255) --The potential error message
    DECLARE @ErrorObject      INT          --The error object
    DECLARE @HR               INT          --The HRESULT returned
    DECLARE @IsFile           BIT          --1 if Name is a file, 0 if not
    --===== These are object "handle" variables
    DECLARE @ObjFile          INT          --File object
    DECLARE @ObjFileSystem    INT          --File System Object
    --===== These variable names match the sp_OAGetProperty options
         -- Made names match so they're less confusing
    DECLARE @Attributes       INT          --Read only, Hidden, Archived, etc
    DECLARE @DateCreated      DATETIME     --Date file was created
    DECLARE @DateLastAccessed DATETIME     --Date file was last read
    DECLARE @DateLastModified DATETIME     --Date file was last written to
    DECLARE @Name             VARCHAR(128) --File Name and Extension
    DECLARE @Path             VARCHAR(128) --Full path including file name
    DECLARE @ShortName        VARCHAR(12)  --8.3 file name
    DECLARE @ShortPath        VARCHAR(100) --8.3 full path including file name
    DECLARE @Size             INT          --File size in bytes
    DECLARE @Type             VARCHAR(100) --Long Windows file type (eg. 'Text Document', etc)
    --=======================================================================================
    -- Create temporary working tables
    --=======================================================================================
    --===== Create a place to store all file names derived from xp_DirTree
    IF OBJECT_ID('TempDB..#DirTree','U') IS NOT NULL
        DROP TABLE #DirTree
    CREATE TABLE #DirTree
    (
        RowNum INT IDENTITY(1,1),
        Name   VARCHAR(256) PRIMARY KEY CLUSTERED,
        Depth  BIT,
        IsFile BIT
    )
    --===== Create a place to store the file details
    IF OBJECT_ID('TempDB..#FileDetails','U') IS NOT NULL
        DROP TABLE #FileDetails
    CREATE TABLE #FileDetails
    (
        RowNum           INT IDENTITY(1,1) PRIMARY KEY CLUSTERED,
        Name             VARCHAR(128), --File Name and Extension
        Path             VARCHAR(128), --Full path including file name
        ShortName        VARCHAR(12),  --8.3 file name
        ShortPath        VARCHAR(100), --8.3 full path including file name
        DateCreated      DATETIME,     --Date file was created
        DateLastAccessed DATETIME,     --Date file was last read
        DateLastModified DATETIME,     --Date file was last written to
        Attributes       INT,          --Read only, Hidden, Archived, etc
        Size             INT,          --File size in bytes
        Type             VARCHAR(100)  --Long Windows file type (eg. 'Text Document', etc)
    )
    --=======================================================================================
    -- Get all the file names for the directory (includes directory names as IsFile = 0)
    --=======================================================================================
    --===== Get the file names for the desired path
    INSERT INTO #DirTree (Name, Depth, IsFile)
    EXEC Master.dbo.xp_DirTree @piFullPath,1,1 -- Current directory only, list file names
    -- Remember the row count
    SET @DirTreeCount = @@ROWCOUNT
    --===== Update the file names with the path for ease of processing later on
    UPDATE #DirTree
       SET Name = @piFullPath + Name
    SELECT * FROM #DirTree
    --=======================================================================================
    -- Get the details for each file using OLE automation
    --=======================================================================================
    --===== Create a file system object and remember the "handle"
    EXEC @HR = dbo.sp_OACreate 'Scripting.FileSystemObject', @ObjFileSystem OUT
    SET @Counter = 1
    WHILE @Counter <= @DirTreeCount
    BEGIN
        --===== Get the current name and see if it's a file
        SELECT @CurrentName = Name,
               @IsFile      = IsFile
          FROM #DirTree
         WHERE RowNum = @Counter
        --===== If it's a file, get the details for it
        IF @IsFile = 1 AND @CurrentName LIKE '%%'
        BEGIN
            --===== Create an object for the path/file and remember the "handle"
            EXEC dbo.sp_OAMethod @ObjFileSystem,'GetFile', @ObjFile OUT, @CurrentName
            --===== Get all the required attributes for the file itself
            EXEC dbo.sp_OAGetProperty @ObjFile, 'Path',             @Path             OUT
            EXEC dbo.sp_OAGetProperty @ObjFile, 'ShortPath',        @ShortPath        OUT
            EXEC dbo.sp_OAGetProperty @ObjFile, 'Name',             @Name             OUT
            EXEC dbo.sp_OAGetProperty @ObjFile, 'ShortName',        @ShortName        OUT
            EXEC dbo.sp_OAGetProperty @ObjFile, 'DateCreated',      @DateCreated      OUT
            EXEC dbo.sp_OAGetProperty @ObjFile, 'DateLastAccessed', @DateLastAccessed OUT
            EXEC dbo.sp_OAGetProperty @ObjFile, 'DateLastModified', @DateLastModified OUT
            EXEC dbo.sp_OAGetProperty @ObjFile, 'Attributes',       @Attributes       OUT
            EXEC dbo.sp_OAGetProperty @ObjFile, 'Size',             @Size             OUT
            EXEC dbo.sp_OAGetProperty @ObjFile, 'Type',             @Type             OUT
            INSERT INTO #FileDetails
                   (Path, ShortPath, Name, ShortName, DateCreated, DateLastAccessed, DateLastModified, Attributes, Size, Type)
            SELECT @Path,@ShortPath,@Name,@ShortName,@DateCreated,@DateLastAccessed,@DateLastModified,@Attributes,@Size,@Type
        END
        SELECT @Counter = @Counter + 1
    END
    EXEC sp_OADestroy @ObjFileSystem
    EXEC sp_OADestroy @ObjFile
    SELECT * FROM #FileDetails

    Major kudos to Robyn and Phil for posting the basis of that code on Simple-Talk.

    I used an undocumented extended stored procedure because it was easier than setting up the loop to find the file names in a directory (Oh my! I have sinned! Twice! ;)) Lemme ask you... who amongst you hasn't used an undocumented stored procedure? Who amongst you has a DBA that hasn't? Who amongst you thinks that undocumented stored procedures are more subject to change than fully documented ones? The required privs on the very heavily documented sp_MakeWebTask changed from "dbo" in 2k sp3a to "SA" in 2k sp4 with no warning of the change until the change actually came out. Sure, there was some warning... but that doesn't change the fact that a lot of code suddenly broke and that simple proxy tricks needed to be incorporated to fix the now broken code. And, to top it all off, xp_Dirtree is available and working fine in 2k, 2k5, and 2k8. If you really don't like it, take the time to get the file names using OLE automation, instead.

    Just a little more and I'll get off the ol' soap box... the posts no longer seem to be available after the new forum move that SSC made a while back, but prior to that, a couple of folks (Matt Miller and Sergiy in particular) took on all comers. We even had contests between ourselves. The ONLY case where CLR's even came close to beating T-SQL was with RegEx... the idea of checking email addresses certainly qualifies as a good need for RegEx... but, ya gotta ask yourself... if you can check 10,000 email addresses in 356 milliseconds in T-SQL, is it worth going the CLR route? How about checking a million in "21813 ms without a UDF" without a CLR?

    Don't get taken in with all the false glitter surrounding CLR's and their usage. If you're going to write code against databases, spend some time learning how to do it right. With the exception of RegEx, damned near all CLR's are simply not the right way to do it. And, consider this... isn't it a bit of an oxymoron to let someone who can't write database code use another tool to try to do it? Get a real database developer, instead. :)

    By the way, in SQL Server 2005, my production DIRectory code doesn't have a While loop in it... send me a case of beer and I'll show you how. 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Ok, what did I miss? I don't quite know what I missed. If it was the exception from providing a bad path... I didn't miss that. Like I said to Phil, I wouldn't provide a bad path in code that uses this, so it wasn't so much missed as it wasn't important to me.

    Now what I find really funny is that you preach about RBAR and non-looping methods, and your alternative to a CLR solution here is to loop with an OLE Automation call? You might compare memory consumption between your solution and the CLR in this article before you posit that it is a better solution. Either way, you have to enable a disabled-by-default option on the SQL Server. You can leak memory with OLE automation much faster than with SQLCLR, and to the point that it can cause server instability. Don't believe me? Just forget an sp_OADestroy call.
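
    A defensive sketch of that cleanup (assuming OLE Automation Procedures are enabled; the object and method here are just placeholders):

    DECLARE @hr INT, @fso INT, @exists INT;
    EXEC @hr = sp_OACreate 'Scripting.FileSystemObject', @fso OUT;
    BEGIN TRY
        -- ... the real work: sp_OAMethod / sp_OAGetProperty calls against @fso ...
        EXEC @hr = sp_OAMethod @fso, 'FolderExists', @exists OUT, 'C:\Temp';
    END TRY
    BEGIN CATCH
        PRINT ERROR_MESSAGE();   -- report the failure, but keep going so cleanup still runs
    END CATCH;
    IF @fso IS NOT NULL
        EXEC sp_OADestroy @fso;  -- skip this and the COM object (and its memory) is leaked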

    The choice for CLR doesn't always have to be about performance. I am sure that you would argue that a 200ms difference in speed over 100,000 rows between a CLR function and your TSQL function is meaningful, and proves CLR is not as efficient. But if you leave the company and another developer has to touch that function and it takes them hours or days to figure it out, edit it, and test it, did you really save anything? If you are talking ms differences and it simplifies code sufficiently to save cost overall in application management, then CLR definitely provides enough added benefit to cover the difference. Precious few would even notice the difference when you are in the ms ranges over large datasets.

    Jonathan Kehayias | Principal Consultant | MCM: SQL Server 2008
    My Blog | Twitter | MVP Profile
    Training | Consulting | Become a SQLskills Insider
    Troubleshooting SQL Server: A Guide for Accidental DBAs

  • Phil Factor (2/5/2009)


    Oooh. This is getting very useful. I'd missed out on the advantages of passing string parameters as SQLStrings and the trick of passing back a NULL. It makes perfect sense. Now it has the behaviour that one would want.

    Phil,

    We can actually go a bit further with this if we want and use a custom struct to build the list from, and still get the "Directory not Found" result row in the set. Try the attached install script. To see the code for the struct and the modified output, use the SELECT from sys.assembly_files listed previously in this thread.
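
    The attached script isn't reproduced here, but the general shape of a struct-backed streaming function is roughly the following (a simplified sketch with a trimmed column list and invented names, not the code from the attachment; it would need EXTERNAL_ACCESS like the article's assembly):

    using System;
    using System.Collections;
    using System.Collections.Generic;
    using System.Data.SqlTypes;
    using System.IO;
    using Microsoft.SqlServer.Server;

    public partial class UserDefinedFunctions
    {
        private struct FileRow
        {
            public string Name;
            public bool IsDirectory;
            public long SizeInBytes;

            public FileRow(string name, bool isDirectory, long sizeInBytes)
            {
                Name = name;
                IsDirectory = isDirectory;
                SizeInBytes = sizeInBytes;
            }
        }

        [SqlFunction(FillRowMethodName = "FillFileRow",
                     TableDefinition = "name nvarchar(max), is_directory bit, size_in_bytes bigint")]
        public static IEnumerable DirectoryListSketch(SqlString path)
        {
            List<FileRow> rows = new List<FileRow>();

            // The struct lets us hand back a marker row instead of throwing when the path is bad.
            if (path.IsNull || !Directory.Exists(path.Value))
            {
                rows.Add(new FileRow("Directory not Found", false, 0));
                return rows;
            }

            foreach (string dir in Directory.GetDirectories(path.Value))
                rows.Add(new FileRow(Path.GetFileName(dir), true, 0));

            foreach (string file in Directory.GetFiles(path.Value))
                rows.Add(new FileRow(Path.GetFileName(file), false, new FileInfo(file).Length));

            return rows;
        }

        public static void FillFileRow(object row, out SqlString name, out SqlBoolean isDirectory, out SqlInt64 sizeInBytes)
        {
            FileRow file = (FileRow)row;
            name = new SqlString(file.Name);
            isDirectory = new SqlBoolean(file.IsDirectory);
            sizeInBytes = new SqlInt64(file.SizeInBytes);
        }
    }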

    Jonathan Kehayias | Principal Consultant | MCM: SQL Server 2008
    My Blog | Twitter | MVP Profile
    Training | Consulting | Become a SQLskills Insider
    Troubleshooting SQL Server: A Guide for Accidental DBAs

  • Jonathan Kehayias (2/6/2009)


    Ok, what did I miss? I don't quite know what I missed. If it was the exception from providing a bad path... I didn't miss that. Like I said to Phil, I wouldn't provide a bad path in code that uses this, so it wasn't so much missed as it wasn't important to me.

    That was it and I agree... someone like you wouldn't pass a bad path.

    Now what I find really funny is that you preach about RBAR and non-looping methods, and your alternative to a CLR solution here is to loop with an OLE Automation call? You might compare memory consumption between your solution and the CLR in this article before you posit that it is a better solution. Either way, you have to enable a disabled-by-default option on the SQL Server. You can leak memory with OLE automation much faster than with SQLCLR, and to the point that it can cause server instability. Don't believe me? Just forget an sp_OADestroy call.

    Heh... listen to you... Anyone capable of writing either the CLR or the T-SQL isn't going to miss the sp_OADestroy. Anyone who doesn't understand enough about T-SQL will also likely not know enough about how to prevent memory leaks using a CLR.

    The choice for CLR doesn't always have to be about performance. I am sure that you would argue that a 200ms difference in speed over 100,000 rows between a CLR function and your TSQL function is meaningful, and proves CLR is not as efficient. But if you leave the company and another developer has to touch that function and it takes them hours or days to figure it out, edit it, and test it, did you really save anything?

    Heh... you mean yet another Developer who can't read simple T-SQL? Look at the code again... even a complete newbie can figure it out in a couple of minutes if they use Books Online. And, if they can't, they really shouldn't be messing with a database by any means.

    If you are talking ms differences and it simplifies code sufficiently to save cost overall in application management, then CLR definitely provides enough added benefit to cover the difference. Precious few would even notice the difference when you are in the ms ranges over large datasets.

    So does putting my directory code in a stored proc. And it doesn't require the monster code you included in your article just to install the CLR... here's a reminder...

    /*
    USE [master]
    GO
    DROP FUNCTION [dbo].[os_directory_info]
    DROP ASSEMBLY SQLCLRNet_DirectoryBrowser
    DROP USER SQLCLRNet_ExampleLogin
    DROP LOGIN SQLCLRNet_ExampleLogin
    DROP ASYMMETRIC KEY SQLCLRNet_ExampleKey
    GO
    */
    USE [master]
    GO
    /****** Object: SqlAssembly [SQLCLRNet_DirectoryBrowser] Script Date: 01/23/2009 22:19:49 ******/
    IF EXISTS (SELECT * FROM sys.assemblies asms WHERE asms.name = N'SQLCLRNet_DirectoryBrowser')
        DROP ASSEMBLY [SQLCLRNet_DirectoryBrowser]
    GO
    /****** Object: SqlAssembly [SQLCLRNet_DirectoryBrowser] Script Date: 01/23/2009 22:19:49 ******/
    CREATE ASSEMBLY [SQLCLRNet_DirectoryBrowser]
    AUTHORIZATION [dbo]
    FROM 0x ... really big binary number here...
    WITH PERMISSION_SET = SAFE
    GO
    -- Create the Asymmetric Key from the Assembly
    CREATE ASYMMETRIC KEY SQLCLRNet_ExampleKey
    FROM ASSEMBLY [SQLCLRNet_DirectoryBrowser]
    GO
    -- Create the Login from the Asymmetric Key
    CREATE LOGIN SQLCLRNet_ExampleLogin
    FROM ASYMMETRIC KEY SQLCLRNet_ExampleKey
    GO
    -- Grant the External Access Privilege to the Login
    GRANT EXTERNAL ACCESS ASSEMBLY TO SQLCLRNet_ExampleLogin
    GO
    -- Create the database user for Authorization on the Assembly
    CREATE USER SQLCLRNet_ExampleLogin FOR LOGIN SQLCLRNet_ExampleLogin
    GO
    -- Set Authorization to the Database User
    ALTER AUTHORIZATION ON ASSEMBLY::[SQLCLRNet_DirectoryBrowser] TO SQLCLRNet_ExampleLogin
    GO
    -- Set the Assembly for External Access
    ALTER ASSEMBLY [SQLCLRNet_DirectoryBrowser] WITH PERMISSION_SET = EXTERNAL_ACCESS
    GO
    -- Create the TSQL Function that maps to the Assembly
    CREATE FUNCTION [dbo].[os_directory_info](@path [nvarchar](max), @filter [nvarchar](100) = null)
    RETURNS TABLE (
        [name]            [nvarchar](max) NULL,
        [is_directory]    [bit] NULL,
        [size_in_bytes]   [bigint] NULL,
        [create_date]     [datetime] NULL,
        [last_written_to] [datetime] NULL,
        [last_accessed]   [datetime] NULL,
        [attributes]      [nvarchar](max) NULL
    ) WITH EXECUTE AS CALLER
    AS
    EXTERNAL NAME [SQLCLRNet_DirectoryBrowser].[UserDefinedFunctions].[os_directory_info]

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (2/6/2009)


    When I run that bit of computational heaven against the test data I built above, here's the results I get...

    1263 ms for Looping UDF
    356 ms for my UDF
    233 ms without a UDF

    Like I said... I don't know where the hell the author of that article gets off saying that his looping UDF takes over 14 seconds to run, but he was wrong even in that!!!

    Add the overhead of CLR's to that, and ya just gotta ask yourself, is it all worth it?

    Somebody, please! Test the CLR and the code I just posted all on one machine! Let's see what really gives!

    I didn't use his CLR method directly because it has an inherent flaw in its usage of Regex that I wouldn't make in my own code. Instead I used a functionally identical function:

    using System;
    using System.Data;
    using System.Data.SqlClient;
    using System.Data.SqlTypes;
    using Microsoft.SqlServer.Server;
    using System.Text.RegularExpressions;

    public partial class UserDefinedFunctions
    {
        private static readonly Regex emailRegex = new Regex(@"^([\w-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([\w-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$");

        [Microsoft.SqlServer.Server.SqlFunction]
        public static SqlBoolean EmailRegex(SqlString email)
        {
            return new SqlBoolean(emailRegex.IsMatch(email.Value));
        }
    };

    Using the static readonly member means the Regex object is built once and reused, which is a bit more memory-efficient.

    The results over 5 test runs using Jeff's code versus the above on my Dell Dual Core 2.0 Ghz Laptop with 4GB RAM:

    533 ms for Looping UDF
    186 ms for my UDF
    123 ms without a UDF
    63 ms for CLR UDF

    I must have a better computer than Jeff does, but all of the numbers are roughly half of his times, so the comparison should still be fair.

    Jonathan Kehayias | Principal Consultant | MCM: SQL Server 2008
    My Blog | Twitter | MVP Profile
    Training | Consulting | Become a SQLskills Insider
    Troubleshooting SQL Server: A Guide for Accidental DBAs

  • Jeff Moden (2/6/2009)


    Heh... listen to you... Anyone capable of writing either the CLR or the T-SQL isn't going to miss the sp_OADestroy. Anyone who doesn't understand enough about T-SQL will also likely not know enough about how to prevent memory leaks using a CLR.

    Not true. I've seen this happen in consulting twice for OLE automation and a few times for XML documents that get created but not removed. It's not that difficult of a mistake to make. Leaking in CLR is only possible with UNSAFE assemblies, as a part of the design of the hosting environment. The ability to leak memory requires Host Protection Attributes that are prevented from loading in anything but UNSAFE.

    So does putting my directory code in a stored proc. And it doesn't require the monster code you included in your article just to install the CLR.

    You could do everyone a favor and just point them to the text file rather than posting the varbinary output string of a compiled assembly in here and making this page of the forums unusable because it doesn't know how to handle a string that long. You aren't comparing similar code by looking at the compiled assembly binary. The code is the C#, and should the following Connect item get added to SQL Server:

    http://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=265266

    the ability to write code similar to the following will exist in SQL:

    CREATE PROCEDURE my_proc @par1 int, @par2 int
    WITH EXTERNAL LANGUAGE C#, PERMISSION_SET = EXTERNAL_ACCESS
    AS
    using System;
    using System.Data;
    using System.Data.SqlClient;
    ...

    Jonathan Kehayias | Principal Consultant | MCM: SQL Server 2008
    My Blog | Twitter | MVP Profile
    Training | Consulting | Become a SQLskills Insider
    Troubleshooting SQL Server: A Guide for Accidental DBAs

  • sean hawkes (2/6/2009)


    To me the SQLCLR vs. xp_cmdshell question seems like the old script vs. small application debate. It seems that SQLCLR is very similar to writing a small application. I mean, you're writing code, building an assembly, and mapping it to the SQL Server. That seems like a lot of overhead to get a directory listing into a table. Granted, I do like the idea of passing table variables and such, and I can see the need for operations that are impossible within TSQL. But I can also see how doing all of this is like hitting a finishing nail with a sledge hammer. Again, to me it comes down to what I am really trying to do and how much time I have to do that task. Let's face it, there are situations where time is of the essence and you don't have time to create a new assembly and "hook" it to your SQL Server.

    My other question is how does this differ from creating a DLL that is mapped to an extended stored procedure?

    Sean

    This solution is based on the premise that security is important enough to take the time to implement. If time is more important than preventing unrestricted access to the OS as the SQL Service account with xp_cmdshell, then this solution probably isn't for you.

    In the words of Mr. Miyagi, "Different, but same". This is different from an extended stored procedure because it is in managed code, which is more commonly known, easier to develop, and doesn't require the level of knowledge that C++ unmanaged extended procedures did. It also doesn't have the risks of memory leaks that the unmanaged code did, given the CAS level that it is created with. It is the same because you are using an external language to write the code, but that is about the only similarity that exists between the two.

    Jonathan Kehayias | Principal Consultant | MCM: SQL Server 2008
    My Blog | Twitter | MVP Profile
    Training | Consulting | Become a SQLskills Insider
    Troubleshooting SQL Server: A Guide for Accidental DBAs

  • Jonathan Kehayias (2/6/2009)


    I didn't use his CLR method directly because it has an inherent flaw in its usage of Regex that I wouldn't make in my own code. Instead I used a functionally identical function:

    Outstanding! Thanks for running the tests! Like I said, the one place that Matt Miller, Sergiy, and I proved that CLR's were definitely better for performance was in the area of RegEx, and I was also absolutely sure that you could make a major improvement there, as well. Well done! And, I'm not being a bit ironic there... I mean it.

    Because of the crud you have to go through with OLE automation and the slowness of the associated sp_OA* routines, I'd be willing to bet that a CLR would also beat the code I posted for performance. We didn't test that in any of our deep dives, but I'm sure it's true and a good CLR for that would be great.

    But, like you said, performance isn't the only thing to consider when making the decision to use a CLR. Look at the installation code you had to write. Can a "newbie" actually do that as you suggested? I submit, probably not. And then, there are those hardcore DBA's that say "Not on my watch" when it comes to CLR's. 😉 Better know how to do it several different ways... something a "newbie" wouldn't actually know, anyway.

    As you've just shown, there's a 50% difference in performance between that given CLR and the fastest code I wrote in T-SQL to do the same thing, but consider for a moment that it's only 60 milliseconds difference... across a million rows, extrapolating your run times on 10k rows to a million rows, that's only 6 seconds difference... and I have two lines in a WHERE clause... with the CLR, there's source code to check in, a DLL to test and install, and an installation procedure to make the code useful as a CLR function. Lord help us if we migrate servers.

    Just to be clear, I'm not saying that all CLR's should be avoided. You said it quite correctly when you said that some folks abuse them, and you picked a very good example for what a CLR should probably be used for. In fact, I have no problem with your good article at all. One of my major points was that someone posted a link to some pretty bad T-SQL and the race between that and a CLR, and a test that probably wasn't even done correctly. CLR's get glamourized because of such mistakes, much like set-based code gets a bad name because someone writes some really bad set-based code to compare to. I want to make sure that, because of that miserable article that someone referred us all to, people don't go CLR crazy like they did when cursors were introduced in early versions (6.5, I believe) of SQL Server.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jonathan Kehayias (2/6/2009)


    Jeff Moden (2/6/2009)


    Heh... listen to you... Anyone capable of writing either the CLR or the T-SQL isn't going to miss the sp_OADestroy. Anyone who doesn't understand enough about T-SQL will also likely not know enough about how to prevent memory leaks using a CLR.

    Not true. I've seen this happen in consulting twice for OLE automation and a few times for XML documents that get created but not removed. It's not that difficult of a mistake to make. Leaking in CLR is only possible with UNSAFE assemblies, as a part of the design of the hosting environment. The ability to leak memory requires Host Protection Attributes that are prevented from loading in anything but UNSAFE.

    So does putting my directory code in a stored proc. And it doesn't require the monster code you included in your article just to install the CLR.

    You could do everyone a favor and just point them to the text file rather than posting the varbinary output string of a compiled assembly in here and making this page of the forums unusable because it doesn't know how to handle a string that long. You aren't comparing similar code by looking at the compiled assembly binary. The code is the C#, and should the following Connect item get added to SQL Server:

    http://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=265266

    the ability to write code similar to the following will exist in SQL:

    CREATE PROCEDURE my_proc @par1 int, @par2 int
    WITH EXTERNAL LANGUAGE C#, PERMISSION_SET = EXTERNAL_ACCESS
    AS
    using System;
    using System.Data;
    using System.Data.SqlClient;
    ...

    That's what I mean... what's the difference between someone forgetting to do an sp_OADestroy and someone forgetting to mark a CLR as something other than UNSAFE, especially if they're a newbie? Even if it gives you a warning, is a newbie gonna know what to do about it?

    But, point well taken, hopefully on both parts... you do have a caveat for both methods.

    Ya beat me to it on the other thing... I didn't realize the binary was quite so wide in your code and started fixing it as you were typing. It's fixed now. Thanks for the heads up, though.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • It's taken me close to a year to create a series of articles that have sufficient utility to be worthwhile, and are a logically good use of SQLCLR, so as not to be chewed alive by the community. Replacing less secure methodologies is my favorite place to make use of SQLCLR, though generally speaking, I wouldn't be reading directories from my TSQL code either, so I don't really have these examples in use in my own environment.

    The EmailRegex function I do have in use in a data warehouse that I built that loads data from all different kinds of sources, so it is very useful in the data cleansing process post load. I didn't do that in CLR for performance so much as I already had the code from answering a forums post, and it was easily reused. The difference in performance between the TSQL and the CLR isn't enough to make it matter to me. Like I said, if you are talking ms, what did you really save, and do you really notice it?
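
    In that kind of post-load cleansing step, the function from earlier in the thread gets used from plain set-based TSQL; the staging table and column names below are hypothetical:

    -- Flag rows whose e-mail address fails the CLR regex so they can be reviewed or quarantined.
    UPDATE stg
       SET IsValidEmail = dbo.EmailRegex(stg.EMailAddr)
      FROM dbo.Staging_Customer AS stg;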

    As for the installation script I provided, could anybody do that? Sure, if they were willing to research the "best practice" implementation for an EXTERNAL_ACCESS assembly. Most aren't willing to go that route and instead just flip TRUSTWORTHY ON and move on with deploying under dbo. In fact, I think I have one of the few articles on the web that actually uses keys over TRUSTWORTHY for non-SAFE assemblies. Since my purpose was to demonstrate how to be more "secure" with SQLCLR, setting TRUSTWORTHY wouldn't really have been good to do, especially in master, so I did it the right way rather than the easy way. The installer script here does provide a simple framework that can be used for any assembly; just a little copy/paste/rename kind of stuff to do.
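
    For contrast, the two approaches look roughly like this (the database name is hypothetical; the key, login, and assembly names are the ones from the install script above):

    -- The "easy way" most people take: mark the whole database trusted and deploy under dbo.
    ALTER DATABASE SomeUserDatabase SET TRUSTWORTHY ON;

    -- The "right way" used in the article: leave TRUSTWORTHY OFF and trust only the signed assembly.
    USE master;
    CREATE ASYMMETRIC KEY SQLCLRNet_ExampleKey FROM ASSEMBLY [SQLCLRNet_DirectoryBrowser];
    CREATE LOGIN SQLCLRNet_ExampleLogin FROM ASYMMETRIC KEY SQLCLRNet_ExampleKey;
    GRANT EXTERNAL ACCESS ASSEMBLY TO SQLCLRNet_ExampleLogin;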

    Jonathan Kehayias | Principal Consultant | MCM: SQL Server 2008
    My Blog | Twitter | MVP Profile
    Training | Consulting | Become a SQLskills Insider
    Troubleshooting SQL Server: A Guide for Accidental DBAs

  • Jeff Moden (2/6/2009)


    Ya beat me to it on the other thing... I didn't realize the binary was quite so wide in your code and started fixing it as you were typing. It's fixed now. Thanks for the heads up, though.

    I wish there was a better way to provide that binary string in forums. At least here you can attach files to forum posts as text; in other places it is completely impossible to send someone a sample compiled CLR assembly for even a simple Hello World example. It is just as long as the assembly I posted for reading the directory information.
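
    One workaround, for what it's worth: on SQL Server 2008 and later you can script the hex back out of the catalog rather than keeping the original string around (a sketch using the assembly name from this thread):

    -- Regenerate the 0x... literal for CREATE ASSEMBLY ... FROM by reading the stored bits.
    SELECT af.name,
           CONVERT(varchar(max), af.content, 1) AS assembly_hex   -- style 1 returns '0x...'-formatted hex
      FROM sys.assembly_files AS af
      JOIN sys.assemblies     AS a ON a.assembly_id = af.assembly_id
     WHERE a.name = N'SQLCLRNet_DirectoryBrowser';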

    Jonathan Kehayias | Principal Consultant | MCM: SQL Server 2008
    My Blog | Twitter | MVP Profile
    Training | Consulting | Become a SQLskills Insider
    Troubleshooting SQL Server: A Guide for Accidental DBAs

  • Jonathan Kehayias (2/6/2009)


    so I did it the right way, rather than the easy way

    BWAA-HAA!!! Man, I'm right there with you on that one! Now, all we have to do is convince the rest of the world, especially the managers that want it real bad, to give people just a little more time so they don't get it that way... real bad.

    Good talking with you again, Jonathan. Kudos on the article.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
