Starting in SQL Server 2005, Microsoft integrated the .NET (pronounced Dot Net) runtime into the database engine. The runtime engine is called the Common Language Runtime, most commonly referred to as the CLR. This integration allows for code to be executed within the context of a query that is not possible with the current offering of T-SQL built-in functions. This integration is commonly referred to as SQLCLR and in this series we will explore the various aspects of using SQLCLR code. The topics will include basic concepts, security, development, the individual object types that can be created, administration, and some advanced concepts.
Programming is all about controlling the behavior of the computer. I find it best to learn new technologies by learning what they can do and how they do it before jumping into implementing a project. Providing a context for the information about to be learned well help you make better decisions as to when you should make use of this feature; knowing how to create a function to calculate a square root won’t help you if you are not aware that it is much more efficient to do that in T-SQL.
Let’s start by defining what SQLCLR programming is so we can all be on the same page and have a context for applying the more specific information in the rest of the articles in this series. We will get a high-level description and then discuss the capabilities of the CLR integration and/or .Net in general that go beyond what T-SQL can do. In the next article we will take a look at some example code to see many of those concepts in action and how the various pieces relate to each other.
What SQLCLR Is
For this first part I will try to dispel some misconceptions so that it will be easier to focus on what SQLCLR really is.
Oddly enough, Microsoft does not officially use the term “SQLCLR” (officially meaning in the SQL Server product documentation / MSDN, though you will find it in TechNet articles and in a few places in Visual Studio as both "SQLCLR" and "SQL CLR"). However, much like the word “performant”, it is so frequently used that the technicality likely doesn’t matter. Besides, it is a perfectly cromulent word ;-). The preferred terminology seems to be "CLR integration" when used in the general sense, and simply "CLR" when talking about code, objects, etc. For our purposes I will use either "SQLCLR" or "CLR-based".
SQLCLR is not:
A replacement of T-SQL: SQLCLR code does not interact directly with SQL Server; it has no native means of handling INSERT / UPDATE / DELETE / SELECT and SMO is not available. Unlike a fully integrated, functional language such as PL/pgSQL for PostgreSQL, SQLCLR is entirely separate and requires connecting to the database and issuing SQL statements for any interaction. SQLCLR code is invoked via wrapper objects (Stored Procedures, Functions, Aggregates, Types, and Triggers) that are referenced from T-SQL.
All or None: Using SQLCLR does not mean that all of the logic of an operation needs to be performed in the CLR code. Meaning, if there is a multi-step Stored Procedure that needs to be created and part of it either requires, or would at least benefit from, .NET functionality, then usually only the logic that needs the .NET functionality would be accomplished via CLR-based code. The rest of the Stored Procedure would still be in T-SQL and that would simply call the CLR-based object.
Always faster or Always slower: SQLCLR code can be either faster or slower than doing things in straight T-SQL, depending on how it is written and how it is used. There are good uses and bad uses of this tool, just as there are with cursors and triggers. More on this later.
Necessarily a security risk: While it is possible to expose SQL Server to the Operating System and/or Network, which is potentially dangerous to both sides, this is all within the control of the DBA. More on this later.
a framework that bridges the environment of the SQL Server database engine with the rich programming environment of .NET. This allows for:
Extending capabilities of queries beyond T-SQL built-in functions
Performing certain operations faster than they can be done in T-SQL
Better interaction with external resources than can be done via
SQLCLR allows you to create Stored Procedures, Functions, Aggregates, Types, and Triggers to do things that either cannot be done, or cannot be done as efficiently, in T-SQL. Examples of things that cannot be done in T-SQL are accessing external resources, multi-threading, and impersonating the current user when accessing external resources. Yes, within T-SQL it is possible to use the OLE Automation Stored Procedures (i.e.
SP_OA*) (here) to access external resources, but these are not nearly as flexible and might have other memory and/or security issues. And Linked Servers can impersonate when connecting to other instances, but they do not handle general network and file system functionality. Examples of things that cannot be done as efficiently are calculations and text / string manipulation. When used correctly, SQLCLR can be of great help in getting past some otherwise insurmountable problems.
xp_cmdshell and several other features, the "CLR Integration" feature that allows for running custom SQLCLR code is disabled by default. When this feature is disabled you can create Assemblies and T-SQL wrapper objects that references methods within Assemblies, but those T-SQL wrapper objects won't be usable. To enable "CLR Integration", run the following two statements:
EXEC sp_configure 'CLR Enabled', 1;
What SQLCLR Can Do
In order to better convey the usefullness of SQLCLR, the capabilities can be broken down into three categories. The first category is functionality that simply cannot be done either in T-SQL User-Defined Functions or in T-SQL Stored Procedures. The second category is functionality that, at least to a degree, could be done in T-SQL UDF’s, but only via
OPENROWSET, or in two cases, also through a view. The third category is performance.
Can Only Be Done In SQLCLR
The following items can only be handled using SQLCLR.
Streaming Table-Valued Functions (sTVF): While T-SQL Inline Functions are generally efficient, they are mainly just parameterized Views and do not allow for complex code. T-SQL Multi-line Table-Valued Functions do allow for more complex code, but there is a downside in that the result set needs to be collected fully before the Function can return and hence release any of that data to the calling process. This can take quite a heavy toll on tempdb which is where those results are collected. SQLCLR Functions, however, have the capability (if programmed to do this) of streaming their results out, as each row is determined. This greatly reduces the hit on tempdb and allows the calling process to start working with the Function’s results without having to wait until the Function is entirely done with its processing. You can find an example of the full streaming option in this related article: CLR Table-Valued Function Example with Full Streaming (STVF / TVF) ( http://www.sqlservercentral.com/articles/SQLCLR/68850/ )
Dynamic SQL: One capability that can be quite handy, even if not needed that often, is the ability to execute dynamic SQL within a Scalar or Table-Value Function. In T-SQL this is not allowed, but within the .NET code of the SQLCLR Function, you can manipulate the SQL that will eventually be executed (assuming the function executes a query, of course). The only real drawback with doing dynamic SQL in a TVF is that you cannot stream out the results while using the in-process Context Connection, and hence either the result set needs to be materialized fully in memory before it can be released as the function ends, or you need to use a regular external connection which requires the Assembly to be set to
Access External Resources / Replace xp_cmdshell: Here are just a few aspects of how SQLCLR is nearly always a better choice for most anything you would use
Passing data in is easier: Depending on what parameters you need to send to the external command, using xp_cmdshell can be quite cumbersome. xp_cmdshell takes a single parameter which is the command you are wanting to run plus all its parameters. Hence you need to format a single command line string and will face the same issue as when creating dynamic SQL, issues such as escaping embedded quotes and converting non-string variables. The MSDN page on xp_cmdshell states:
With SQLCLR, you call either a Stored Procedure or User-Defined Function, both of which have separate parameters that can be of whatever datatype is needed.
command_string is varchar(8000) or nvarchar(4000), with no default. command_string cannot contain more than one set of double quotation marks. A single pair of quotation marks is required if any spaces are present in the file paths or program names referenced in command_string. If you have trouble with embedded spaces, consider using FAT 8.3 file names as a workaround.
Getting multiple columns of a result set back is easier: xp_cmdshell results are a single column and would need to be parsed / split if containing multiple columns of data, which is both an added layer of complication and error prone. Another option is to have the external process save its results to a delimited file that can be imported via BULK INSERT or OPENROWSET(BULK...), which is less error-prone than splitting a string but can be likewise complicated in addition to possibly needing additional permissions granted (e.g. OPENROWSET(BULK...) requires the “ADMINISTER BULK OPERATIONS” permission).
No external dependencies: There is no guarantee that whatever program or script you are running via xp_cmdshell exist as SQL Server doesn’t know anything about it. Meaning, you could have ProrgamA.exe installed on one server and it works just fine via xp_cmdshell but then you run that same command line on another server and for whatever reason ProgramA.exe is not there. External programs and scripts are not backed up, at least as part of the SQL backup, so they might not be there when restoring a crashed server to a new machine. Assemblies, on the other hand, are part of a DB backup as are the wrapper objects.
Impersonation: xp_cmdshell runs in the security context of either the Log On account of the SQL Server service / process for anyone in the
sysadmin server role, or the xp_cmdshell Proxy Account for everyone else (if one has been set up). By default, SQLCLR code runs in the security context of the Log On account of the SQL Server service / process for everyone. However, you can code your SQLCLR object to access these external resources (i.e. OS, File System, and Network) with the security credentials of the Login executing the SQLCLR object (for Windows logins, not SQL Server logins). There are restrictions on when this can be done and what exactly can be done with impersonated credentials, but it is a very nice option to have, especially since it is not limited to a single security context like the xp_cmdshell Proxy Account.
Multi-threading: SQLCLR code can make use of multi-threading to spawn several processes at the same time. This ability is only available if the Assembly is set to
UNSAFE, which is typically frowned upon, but sometimes it is really nice to have that option. If this ability is used, it should be done with much caution and testing.
Error Handling: T-SQL User-Defined Functions cannot make use of either
THROW. This makes it impossible to either trap and handle errors or stop execution with an error that will in turn stop the query. On the other hand, trapping errors and forcing exceptions are quite easy in .NET code. However, to be fair, I did find this question on StackOverflow showing that it is possible to force an error in T-SQL Scalar UDFs and Multistatement TVFs, even if in a very limited and clunky way: http://stackoverflow.com/questions/1485034/how-to-report-an-error-from-a-sql-server-user-defined-function
Custom Aggregates: Sometimes we have a group of values that we want to operate on in the same way that we use
AVG, etc. SQLCLR gives us that capability with User-Defined Aggregates (UDA). You can now add statistical operations that aren’t one of the built-in Aggregates, such as Median (http://www.sqlservercentral.com/articles/SQL+CLR/3208/ ). And starting in SQL Server 2008, you can pass in multiple parameters to a UDA (though technically it can even be done in SQL Server 2005 if using a scalar UDF to first combine multiple values into a single UDT that is in turn accepted by the UDA).
Custom Types: You can create your own User-Defined Type (UDT) to handle complex data and include methods and properties that make use of the data you store in them. Properties and methods of UDTs are accessed in the same way that the "
.value()" and other functions are accessed on XML fields and variables (e.g. @XMLvar.value()). For example, you could create an Address Type that lets you enter in the Street, City, State/Region, PostalCode, and Country. You can then have a validation method that verifies the PostalCode format for the specified Country and returns a
BIT as well as a property that returns the full Country Name (assuming you are only storing the 2-character ISO code). Lastly, SQLCLR UDTs can be sent in as parameters to SQLCLR Stored Procedures and Functions.
Capture Messages: Sometimes messages are sent via the
PRINT command or
RAISERROR (using a Severity of 0 - 10) that you might want to capture. If you are using SSMS, these messages display in the "Messages" tab but there is no way for a T-SQL Stored Procedure to capture the messages sent by any Stored Procedure that it in turn calls. SQLCLR Functions and Stored Procedures, on the other hand, can capture these messages.
Participate in Parallel Plans: T-SQL UDFs cannot be used in parallel plans and so will force non-parallel plans to be chosen. But SQLCLR UDFs (as long as they do not do any data access and are marked
IsDeterministic = true, and they might also need to be in an Assembly marked as
SAFE) can participate in parallel plans, which can help improve performance.
Intercept Result Sets: Not only do you have access to the result set metadata (something you can do in T-SQL as of SQL Server 2012 via sys.dm_exec_describe_first_result_set, whereas this can be done in SQLCLR starting in SQL Server 2005), but you can also access individual result sets. This means that if you have a Stored Procedure that returns multiple result sets and you only want one of them, you can do that rather easily in SQLCLR. In fact, you can even manipulate the result set if you want. You can add or remove fields, or rename fields, or even combine 2 or more result sets into a single result set.
Share / Cache Memory Across Sessions: In T-SQL you can use
CONTEXT_INFO to store a value in memory that is available to anything else running in that session and is not affected by GO statements or transactions, etc. But the value is limited in both size (it is a
VARBINARY(128)) and scope (only available to the current session). There are not many scenarios in which one would want to share data across sessions (usually to cache data between connections), but if it is needed, the only way to do this in T-SQL is via a global temporary table or a real table. But with SQLCLR you can do this in memory using a static class variable. If it is an updateable variable the Assembly will need to be marked as
UNSAFE. Or, if the variable is a collection, you can make it "readonly" while still being able to Add and Remove items from it, in which case the Assembly can be marked as
SAFE (assuming no other functionality requires
UNSAFE). Just like with multi-threading, this should only be done with much caution and testing as whatever memory is used for caching is taken away from query processing.
Set Persistent Environment Variables: If you set an environment variable using xp_cmdshell via "SET VariableName=Value", then that variable is only available to that particular call to xp_cmdshell and will disappear once that function ends. But in SQLCLR, you can call
Environment.SetEnvironmentVariable and that variable will become part of the SQL Server service process. This means that the value can be accessed in T-SQL (using either
xp_cmdshell 'echo %variable_name%' OR SQLCLR with
Environment.GetEnvironmentVariable), any script called from xp_cmdshell, CmdExec job steps of SQL Agent jobs (at least sometimes), etc.
Easier In SQLCLR
The next four items cannot be done in T-SQL functions, at least not explicitly. They can all be done via OPENQUERY, though the server definition has to be enabled for DATA ACCESS, and OPENROWSET, though the server-level configuration option for "Ad Hoc Distributed Queries" needs to be enabled. Also, depending on what you are doing, I believe there could be some query plan issues with OPENQUERY / OPENROWSET. However, each of them can be done more easily—to varying degrees—in .NET, especially considering that whatever command is submitted to OPENQUERY / OPENROWSET must be a literal string and cannot be dynamic via a variable.
SQLCLR allows you to do the following things when using a regular / external DB connection (i.e. not the current session), but that requires granting at least EXTERNAL_ACCESS permissions to the assembly. On the other hand, several of the following items can be done, even if to a limited degree, via the Context Connection (current session / in-process) which works even when the assembly's permissions are set to SAFE. Please note that the first two items -- using NEWID and RAND -- can be accomplished in T-SQL by creating a view to simply SELECT the function and then SELECT from the view in the T-SQL UDF. Hence, if you need to generate those types of values but don't need any other CLR-specific functionality, then stick with the view work-around.
Generate a GUID: Using
NEWID is simply not allowed in T-SQL functions. It is also disallowed in SQLCLR functions when using the Context Connection (i.e. the current session; more on this later). But you don’t need
NEWID since the .NET languages can natively create a GUID, which works when using the Context Connection.
Generate a random number:
RAND, just like
NEWID, is not allowed in either T-SQL functions or SQLCLR functions when using the context connection. But .NET languages can also natively create random numbers. Again, native generation works when using the context connection.
Modify State: Neither T-SQL nor SQLCLR functions are allowed to modify the state of the Server or Database. Hence, modifying state requires either OPENQUERY / OPENROWSET in T-SQL or making a standard DB connection in .NET code.
Execute a Stored Procedure: T-SQL functions can only call Extended Stored Procedures. SQLCLR functions, on the other hand, can call Stored Procedures via the context connection, albeit in a restricted fashion. When using the context connection, a Stored Procedure can pass back data as a result set, output parameter, or even the return value. The restrictions are that the code within the Stored Procedure cannot do anything already forbidden to T-SQL functions (e.g. SET commands like SET NOCOUNT, creating temp tables, using NEWID(), DML statements, change state of the database, etc). A key benefit of being able to call a read-only Stored Procedure using the context connection is that it is in-process and hence shares the same session resources, such as temp tables and CONTEXT_INFO. Using OPENQUERY / OPENROWSET, to do anything, will not be in-process.
Comparing performance between T-SQL and CLR objects is a more complicated topic. To begin with, you can really only compare functionality that is the same between them. Then you have to consider the type of logic being done, is the logic being done efficiently in both types of code, if comparing functions is it being run via a SET statement or a multi-row query, and so on.
Microsoft has some guidelines (here) as to what situations are better suited to CLR-based objects. If you will be working with SQLCLR objects, then you should at least be aware of that information. However, it all comes down to testing and this is all testable. I did some research on this topic in July of 2011 and published my findings here: https://www.simple-talk.com/sql/t-sql-programming/clr-performance-testing/
Before we finish, let’s take a brief look at some of the security concepts as this is an area of major concern for people as well as a large part of SQLCLR programming. We will take a more detailed look at security later in this series.
SQLCLR objects (Stored Procedures, User-Defined Functions, etc) are governed by the same permissions as regular objects. You need to GRANT permissions and you can also DENY permissions. If you don’t want someone running a particular Stored Procedure, don’t give them access to it. The fact that it is SQLCLR is irrelevant.
The .NET code itself can only see the types of data that you allow it to see. Functions can explicitly allow or deny the ability to access both user data and system. This permission cannot be overridden nor can it bypassed.
The SQLCLR code, even if written to access the file system, network, or do something like multi-threading, cannot do any of those things without the Assembly first being granted the appropriate permission. By default, Assemblies are created with the
SAFE permission. If you want the code to be able to access the file system and/or network but not spawn other processes or threads, then the assembly can be set to
EXTERNAL_ACCESS. If the desire is for the Assembly to be able to do anything it wants to, including sharing memory across SPIDs or multi-threading, then it needs to be granted the
UNSAFE permission set.
Good to Know
The "CLR Integration" feature isn't the only use of CLR within SQL Server. It just refers to the ability to add custom CLR code. So, even with the configuration option of "CLR Enabled" set to 0 (i.e. off / disabled), the CLR is most likely still being used by other features of SQL Server. The following list is at least some, but possibly not all, of the features that make use of CLR within SQL Server and are not affected by the "CLR Integration" feature being enabled or disabled:
Change Data Capture
Dynamic Management Framework
Please note that, according to the MSDN page for the "clr enabled" Server Configuration Option:
Common language runtime (CLR) execution is not supported under lightweight pooling. Disable one of two options: "clr enabled" or "lightweight pooling". Features that rely upon CLR and that do not work properly in fiber mode include the hierarchy data type, replication, and Policy-Based Management.
The introduction of CLR integration, starting with SQL Server 2005, greatly expanded the possibilities of what can be done within the context of queries. Some operations that could otherwise be done in T-SQL can be done more efficiently and/or more easily in .NET code. And some operations that cannot possibly be done in T-SQL are now possible. In the next article we will take a look at an example Stored Procedure and Scalar Function. You will see how the CLR code, the Assembly, and the wrapper objects relate to each other while testing out many of the new capabilities mentioned in this article.
SQL Server concepts:
About the Author
Solomon Rutzky has been working with databases for `
YEAR(GETDATE()) - 1996` years and specifically with SQL Server since 2002. He has written several articles for SQL Server Central and Simple-Talk. Solomon is owner / founder of Sql Quantum Leap ( http://SqlQuantumLeap.com/ ), and previously worked for companies such as PLATINUM technology and ChannelAdvisor. Solomon is also the author of the popular SQLCLR library of functions: SQL# ( http://SQLsharp.com/ ).