Note: I've updated this article slightly after nearly two years of working with SQL Server, reading lost of forum posts and other articles, and learning more and more about how SQL Server 2005 should work.
I've written an editorial (3/28/2005) and someone quite a bit more knowledgeable than I, Joe Celko, added his two cents on the CLR in the database for SQL Server 2005. I understand that nothing it going to change, especially as Oracle and DB2 load Java code and who knows what else inside the db in an effort to generate a few more sales, but I did think it was worth more than a few words in the editorial.
Since its release, I've not really seen too many places where the CLR has been integrated and there haven't been all that many places where people have found it makes sense to use. I'm sure I'll have lots of comments on this point and I'd really welcome them. Please let us know where you've found it useful to integrate with the CLR.
First of all, if you run through the comments in the editorial, you'll see some very good arguments and thoughts on having the CLR in the database. I also stumbled upon a great article that specifically talks about the CLR integration and examines some of the goals of the development team. It's on MSDN and is titled SQL Server 2005: The CLR Enters the Relational Stage. It's great reading and you should take 30 minutes and go through it.
There are a couple things in the article and the comments that I found interesting and are worth examining a bit more. And worth spending a bit more time on while I control my knee jerk reaction to toss all developers as far away from my database as I can.
The Business Layer
The article talks about moving the business layer into the database with the CLR, getting logic closer to data for performance reasons. A good idea, but as with most articles on the subject, it asks more questions than it answers. Trying to decide and determine how much business logic to put in the database vs. in some middle layer of code, it a tough task. Andy and I argue about this all the time because there is no clear line as to where the business logic should go. Putting it all in code sometimes leads to result sets that are too large and require lots of CPU, memory, etc. to process. Putting too much in the database leads to poor performance sometimes as the single server is asked to take on too much of a load. It also has its own issues of version control, code transparency for developers, DBAs slowing the development process, etc.
So what do you do? If you figure it out, let me know because I'm still holding up my thumb and making WAGs.
In many ways, however, the business layer is in the database with SQL Server 2000 and the tremendously complicated stored procedures and functions that I see implemented. Many of the complaints that developers have with T-SQL, many of which may have led to the shoving of the CLR into SQL Server, are born from the idea that some business logic in the database makes sense. Only time and some load on your application will show you if your decisions were for the best.
Or maybe this is the way Microsoft competes with the other application servers out in the world. By putting more business logic in the database, they've almost combined the db server and app server together. I like that approach in some ways, but I think some work on larger clustered systems that can load balance is needed.
T-SQL Is Here To Stay
Boy, I hope so. It's probably my strongest computer language, so it sticking around should keep me employable for a little longer. But this is one of those things that was pointed out in the article and mentioned by a few people commenting on my editorial, MVP Adam Machanic among them.
T-SQL, or just SQL, is the basis for all relational languages. I'm sure that we'll continue to see more extensions to the language, especially in SQL Server. It is the basis for working with relational data, especially in large sets. Even if the majority of stored procedures and functions were moved to .NET code, there would still be some T-SQL needed to query large data sets and perform the complex joining of relational data.
Unless SQL Server migrates to an object database, or the basis of SQL Server's sets moves to a completely cursor based solution, I expect that T-SQL will exist for a long time. And with the new enhancements being added, it becomes a more flexible and powerful language. Error handling and CTEs being two of the strongest new things I've seen.
I'm not completely sold, however, on the idea that the CLR will be a good thing, overall. It's probably my cynicism having worked with hundreds of developers over the years, but I still think that it will cause more PSS calls than it will prevent.
In reading some of the comments and design goals, it does seem like the idea of integrating the CLR was thought through, but in a naive, trusting way. One thing mentioned is that code in the database cannot call assemblies on the server (or on any other server for that matter) in the file system. Assemblies must be explicitly loaded into SQL Server before they can be used. An admirable design and one that mimics other decisions made in software.
But this ignores two practical things. First, we are mostly practical professionals, both in the corporate world and the shrink wrap world. We need to get things done, so we'll press for code to be loaded to solve a problem, even if it is not 100% tested. And if the DBA's object, we'll get a manager or someone else to "take responsibility", but they won't be the one getting the calls when things break. And even if they get the blame and their sorry tail removed from the company, it will still be the DBA and developer that have to clean up the mess and will absorb some of the blame.
The second thing is that who will certify or verify the code? Most DBAs aren't qualified to full examine and sign off on VB.NET, C#, or some other .NET language, your truly included. I can make a good guess, but my eye isn't experienced or practiced enough to tell good code from average to bad code. And certainly I can't tell if there will be a serious problem in many cases. This means that the DBAs will just load assemblies blindly in many cases.
Yes, I know it's their fault, but it ends up wasting time and causing lots of issues and it's a place where I think Microsoft is giving us a tool that we haven't been instructed to use. It's not their place to do it, but it's almost like allowing everyone to fly their own plane if they can buy one. Some people can handle it, but most cannot.
I've also seen reports from a few people that under high loads, especially when marshalling strings, that the CLR can slow down. That's certainly a huge concern for me. If I really needed something to perform well and I'd counted on the CLR, it might cause tremendous problems later on to try and rebuild a system without it. I hope that this is a bug of some sort and these issues will be addresses in SQL Server 2008.
Another thing that I see is that there are these various levels of safety for assemblies. SAFE, EXTERNAL-ACCESS, and UNSAFE. Knowing how most things work it the IT world, people will load assemblies as SAFE with the best intentions. Then they won't work or they need something else, so they'll be changed to EXTERNAL-ACCESS. And if there's any problem, right after we give administrator rights to the developer, we'll alter the assemblies to be UNSAFE so they can get to whatever cool feature the developer wants.
Wow, I just had a shiver go up my spine.
The type safety, thread safety, application domains, and many other safeguards are nice. And necessary. And for well trained, competent and experienced professionals, they'll do great things. I'm just not sure it's overall a good idea.
It's Not All Bad
I do think there are some places where the CLR will come in very handy. I'm not really a BI guy, but I can see that having user defined aggregates would be useful, especially as people to more complicated analysis of data. Someone may want intersecting 2 quartiles below the mean of something equally statistically meaningless and now they'll be able to write code to get it.
And like MDX, there will be 7 people in their country that understand what they did and maybe a few more that care. But they'll be able to create code to perform complicated analyses that is impossible in SQL Server 2000 and when they have to explain the meaning to a business user, they'll wish they hadn't.
And there's the possibility for a user defined datatype and various operations against it. Like you might create a Passport_Number type and then have operations on this like Passport_Number * Passport_Number when a foreign national gives birth in your country. Or something else equally useful.
I'm getting a little silly, but there are probably people that will find these things really useful. Some cumbersome and complex problems will probably get much easier to solve in the database server with these extensions. I just think that many DBAs will get stuck muddling around through this stuff because someone thinks the new feature is "cool" and not necessarily more useful. Or it makes the developer's job easier, but the DBA's life a nightmare. That sounds more like it.
The examples presented in the article aren't that applicable in my mind. You want to find out the top 3 countries by person count for a series of categories. A complicated T-SQL approach, but not any less complicated than what I see presented in the .NET solutions. And I'm not sure that there isn't a better way to handle this in T-SQL, especially with CTEs.
I did see one really cool idea for a .NET integration. A check constraint that uses regular expressions to validate data inserted into a table. Bill Wunder, of the Boulder SQL Server Users Group and SQL Server MVP, showed it as a way to validate IPs being inserted. To me a much better way than a user defined datatype, but that's me. However it could be useful for names, addresses, etc., allowing a touch more validation.
And annoying more users. Take their data, store it, then tell them something's wrong and let them fix it or not. Or validate on the front end. Do we really want to spend valuable database CPU cycles validating stuff that could be handled elsewhere?
Ok, enough. I think you get my feelings on the CLR integration. It will be good in some ways, and it will help some people. I'm just not sure it's a good idea.
Return to Steve Jones' home