Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase ««12345»»»

CLR Table-Valued Function Example with Full Streaming (STVF) Expand / Collapse
Author
Message
Posted Wednesday, December 23, 2009 6:46 PM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Yesterday @ 11:11 AM
Points: 7,122, Visits: 15,031
I had attempted to use something like this to bring in data from an external file. One of those "ugly delimited" files, kind of delimited and kind of not, that I needed to bring in. This would have offered a way to parse it and clean it up in one single shot, and I would simply need to insert the results from the TVF into my destination table, skipping the whole staging table, clean up the data, etc...

I ended up abandoning it because it was trying to build the entire table in mem before returning it, so the CLR function would crash before getting anything out. So - I ended up going in another direction and got it done that way.

I had also looked at building the old running totals this way, running into the same problem (order is in fact guaranteed this way so no controversy over documented vs not). I got around that using the CLR SP (since you can stream the results without having to build the entire return set first), but still use the "other way" until I am formally forbidden from doing so , since it is still faster than the CLR SP (although not by a whole lot).



----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
Post #838830
Posted Wednesday, December 23, 2009 7:39 PM


Old Hand

Old HandOld HandOld HandOld HandOld HandOld HandOld HandOld Hand

Group: General Forum Members
Last Login: Tuesday, July 29, 2014 12:58 AM
Points: 354, Visits: 1,869
Jeff Moden (12/23/2009)
Next question, please... What application would have the need for such a thing where it could not be done using T-SQL? Not trying to be a smart guy here... I really want to know where folks have used or will use this type of thing and why they couldn't do it in T-SQL.


Hey Jeff. If by "such a thing" you are referring to CLR TVFs (and maybe Procs as well) in general, then there are some places that T-SQL does not cut it: Regular Expressions (especially doing a Replace), File System functions, and Internet related functions (e.g. FTP, Web GET / POST, Twitter, Ping, etc.). There are also some hashing functions (SHA256 and SHA512) that are not available in T-SQL.

Also there are places where T-SQL can work but I am not sure it is worth it. For example, in my SQL# project I have a BusinessDays function that works like DATEDIFF but you can configure it to use any number of "holidays". This way you do not need a "holiday" table that cannot detect Thanksgiving or Easter each year without having new entries added. And yes, that can most likely be done in a T-SQL TVF but given some of the calculations needed to come up with Easter (for both Eastern and Western churches as Easter is not the same day across the planet) it does not seem prudent to put non-set-based math in T-SQL. Not to mention how ugly and unreadable it would be. Another example is the URI class in .Net. I have a customer who was trying to parse the various parts of a URI using complex Regular Expressions but it wasn't very reliable. But the URI class in .Net is reliable and so I exposed that and it works great for him (and possibly many others now).


Matt Miller (#4) (12/23/2009)
I had attempted to use something like this to bring in data from an external file. One of those "ugly delimited" files, kind of delimited and kind of not, that I needed to bring in. This would have offered a way to parse it and clean it up in one single shot, and I would simply need to insert the results from the TVF into my destination table, skipping the whole staging table, clean up the data, etc...

I ended up abandoning it because it was trying to build the entire table in mem before returning it, so the CLR function would crash before getting anything out. So - I ended up going in another direction and got it done that way.


Hey Matt. I would have suggested that for this problem you could have done a CLR Stored Procedure since, as you mentioned, they effectively do stream by returning each row as it is generated. The one thing that I have found that CLR Stored Procs can do that CLR TVFs cannot is return a dynamic result set where the columns aren't known ahead of time and can change between executions. In fact, I have a Stored Proc in my SQL# project that reads a file from disk and splits it into columns using a RegEx delimiter. It is called File_SplitIntoFields. It is not in the Free version but you can see a description of it in the User Manual which is on the Downloads page.

Also, another option if you are using SQL Server 2008 is to create a stand-alone .Net app (or include this in part of an existing application) to stream the data from disk into a T-SQL Stored Proc that accepts a Table-Valued Parameter (TVP). My previous article (http://www.sqlservercentral.com/articles/SQL+Server+2008/66554/) shows how to do this .

Take care,
Solomon...





SQL# - http://www.SQLsharp.com/
Post #838834
Posted Thursday, December 24, 2009 1:00 AM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Today @ 12:17 AM
Points: 36,800, Visits: 31,261
Thanks for the feedback, Solomon. Glad to see folks using it for smart stuff.

As a side bar, Matt Miller and I had some friendly races a couple/three years ago on Regex and the results came out pretty close in a lot of the cases.

So far as FTP goes, that's not so difficult from a sproc if you can make a trip to the batch world. Same goes with file handling. Heh... yeah, I know... lot's of people can't go there. Never understood that with the likes of proxies, etc. Seems like there's a general paranoia about the Command Prompt these days. Done correctly, it's no worse than "safe" CLR's.

Anyway, nice article and thanks for the feedback.


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #838888
Posted Thursday, December 24, 2009 6:02 AM


Old Hand

Old HandOld HandOld HandOld HandOld HandOld HandOld HandOld Hand

Group: General Forum Members
Last Login: Tuesday, July 29, 2014 12:58 AM
Points: 354, Visits: 1,869
As a side bar, Matt Miller and I had some friendly races a couple/three years ago on Regex and the results came out pretty close in a lot of the cases.


Hey Jeff. So, when you say that you implemented RegEx in T-SQL, you were able to emulate macros (such as \w and \d) or do things like capture groups and look-ahead operators? I know that the LIKE clause can do a very basic RegEx with single-character [a-z,0-9] syntax but not much else. I have seen that .Net RegEx is slightly slower than a LIKE clause but it seems to be 1000 times more powerful so I kinda like having the option to use the full RegEx functionality.


So far as FTP goes, that's not so difficult from a sproc if you can make a trip to the batch world. Same goes with file handling. Heh... yeah, I know... lot's of people can't go there. Never understood that with the likes of proxies, etc. Seems like there's a general paranoia about the Command Prompt these days. Done correctly, it's no worse than "safe" CLR's.


Regarding FTP and even GZip, I took a slightly different approach. In fact, I took a dual approach in which I did the standard FTP a file and GZip a file but then I also have functions that deal with the data directly. By "directly" I mean that you can FTP the contents of a local variable or column from a table. Same with GZip: you can GZip the contents of a variable or column such as:

UPDATE  tab
SET tab.PDFCompressed = SQL#.Util_GZip(tab.PDFData)
FROM MyTable tab
WHERE tab.PDFCompressed IS NULL

Or that could have been a Stored Proc that got the PDF (or JPG / GIF) data passed in as a VARBINARY and that input variable was passed into Util_GZip so that the compressed version was actually stored:

INSERT INTO MyTable (PDFCompressed)
VALUES (SQL#.Util_GZip(@IncomingPDFData))

In this case, you can save room in the DB if you are storing binary data and still mask that it is compressed since any query (via proc, Reporting Services, ad-hoc query, etc.) can simply call:

SELECT   SQL#.Util_GUnzip(tab.PDFCompressed) AS [PDFData]
FROM MyTable tab
WHERE tab.SomeField = @SomeValue

So, I think this way of looking at FTP and GZip offers some interesting options to people that xp_cmdshell implementations cannot. And yes, the new FileStream option in SQL Server 2008 might be even better, but again still good to have options, especially because you can interact with those as VARBINARY columns so these functions still might work the same way against those.

Another thing to consider is that by opening up xp_cmdshell you are giving access to anything that can be called from a command line, whereas in these File System CLR functions the user only has access to the few functions that have been implemented, not everything on the server. To safe guard in xp_cmdshell you would have to restrict the login associated with the SQL Server service to not even be able to enter / read certain directories, but that might then exclude calling "ftp" unless even more fine-grained permissions are applied. And while that is possible, it seems easier to just restrict xp_cmdshell altogether and then selectively GRANT / DENY access to some CLR procs/functions to whatever Logins and/or Roles should be using them.


Anyway, nice article and thanks for the feedback.


Thanks and yer welcome


Take care,
Solomon...





SQL# - http://www.SQLsharp.com/
Post #838950
Posted Thursday, December 24, 2009 6:24 AM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Yesterday @ 11:11 AM
Points: 7,122, Visits: 15,031
Solomon Rutzky (12/23/2009)

Hey Matt. I would have suggested that for this problem you could have done a CLR Stored Procedure since, as you mentioned, they effectively do stream by returning each row as it is generated. The one thing that I have found that CLR Stored Procs can do that CLR TVFs cannot is return a dynamic result set where the columns aren't known ahead of time and can change between executions. In fact, I have a Stored Proc in my SQL# project that reads a file from disk and splits it into columns using a RegEx delimiter. It is called File_SplitIntoFields. It is not in the Free version but you can see a description of it in the User Manual which is on the Downloads page.

Also, another option if you are using SQL Server 2008 is to create a stand-alone .Net app (or include this in part of an existing application) to stream the data from disk into a T-SQL Stored Proc that accepts a Table-Valued Parameter (TVP). My previous article (http://www.sqlservercentral.com/articles/SQL+Server+2008/66554/) shows how to do this .

Take care,
Solomon...


The standalone app was in fact the other method. I was more curious "if I could do it" using SQLCLR (was hoping at the time that it might be faster - this was a few years back). The console app ended up being the fastest way to get what I needed done.



----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
Post #838954
Posted Thursday, December 24, 2009 10:15 AM


Right there with Babe

Right there with BabeRight there with BabeRight there with BabeRight there with BabeRight there with BabeRight there with BabeRight there with BabeRight there with Babe

Group: General Forum Members
Last Login: Friday, April 4, 2014 4:40 PM
Points: 751, Visits: 917
Jeff Moden (12/23/2009)
I'm not picking on the author here... it's a general statement that I bring up to anyone who may be considering the use of CLRS... Don't use the excuse of a lack of knowledge of T-SQL functionality as a justification for CLR usage. You'll frequently pay (yes, there are exceptions) for such a mistake in the form of performance and resource usage. There are very few things that can't actually be done in T-SQL and there are very few things where a CLR will actually win the performance foot race if the T-SQL is properly written for performance. Even when T-SQL does lose the performance footrace, it's many times so close to the performance of CLR's that it's just not worth the time and effort to maintain a separate code base outside of T-SQL.


A good point and well taken. Still there are some times where CLR can make sense, particularly if you want to do things with Regex for instance.

Also, Itzik Ben-Gan showed that at least for some types of large data sets CLR outperforms T-SQL in running aggregates. There are some details here: http://www.sqlmag.com/articles/index.cfm?articleid=102336 (subscription required.)


---
Timothy A Wiseman
SQL Blog: http://timothyawiseman.wordpress.com/
Post #839010
Posted Thursday, December 24, 2009 12:02 PM


SSCrazy Eights

SSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy Eights

Group: General Forum Members
Last Login: Thursday, June 5, 2014 10:54 AM
Points: 9,902, Visits: 9,480
Gift Peddie (12/23/2009)
RBarryYoung (12/23/2009)
C# "yield break"? Yikes!

I don't suppose there's a way to implement your example in VB.net, is there? (I don't know of any way to directly emulate yield break in VB.)


No yield is one those C# only definition because per Microsoft there is no valid reason to define it in VB.

I already indicated that I knew that, Gift. Despite that fact that there is no deterministic general conversion for C# yield break, it is quite often possible to work out a case-specific method that can enable the same functionality in VB.net. Again, entirely dependent on the specific case, in this case, Solomon's clever method for streaming CLR-TVF data returns. It would be extremely valuable to me to be able to duplicate this performance in VB.net.


-- RBarryYoung, (302)375-0451 blog: MovingSQL.com, Twitter: @RBarryYoung
Proactive Performance Solutions, Inc.
"Performance is our middle name."
Post #839044
Posted Thursday, December 24, 2009 12:04 PM


SSCrazy Eights

SSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy Eights

Group: General Forum Members
Last Login: Thursday, June 5, 2014 10:54 AM
Points: 9,902, Visits: 9,480
Jeff Moden (12/23/2009)
Next question, please... What application would have the need for such a thing where it could not be done using T-SQL? Not trying to be a smart guy here... I really want to know where folks have used or will use this type of thing and why they couldn't do it in T-SQL.

Going back to my previous post... no, I wouldn't use a stored proc if a native TVF would do either (of course, "it depends"). I was just providing a direct method for how to use the output of an existing stored proc.

Actually, Jeff, the line-splitting functions and (less often)parsing functions are usually best done as TVF's, whether SQL or CLR.


-- RBarryYoung, (302)375-0451 blog: MovingSQL.com, Twitter: @RBarryYoung
Proactive Performance Solutions, Inc.
"Performance is our middle name."
Post #839045
Posted Thursday, December 24, 2009 12:23 PM


SSCrazy Eights

SSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy Eights

Group: General Forum Members
Last Login: Thursday, June 5, 2014 10:54 AM
Points: 9,902, Visits: 9,480
RBarryYoung (12/24/2009)
Jeff Moden (12/23/2009)
Next question, please... What application would have the need for such a thing where it could not be done using T-SQL? Not trying to be a smart guy here... I really want to know where folks have used or will use this type of thing and why they couldn't do it in T-SQL.

Going back to my previous post... no, I wouldn't use a stored proc if a native TVF would do either (of course, "it depends"). I was just providing a direct method for how to use the output of an existing stored proc.

Actually, Jeff, the line-splitting functions and (less often)parsing functions are usually best done as TVF's, whether SQL or CLR. Unfortunately, as Matt pointed out, CLR TVF's by default do not want to do a streaming return. Solomon's "trick" here is really quite significant. Up till now, I have not even heard any of the "Big Guns" in CLR's say anything about it other than to complain about the situation.


-- RBarryYoung, (302)375-0451 blog: MovingSQL.com, Twitter: @RBarryYoung
Proactive Performance Solutions, Inc.
"Performance is our middle name."
Post #839046
Posted Thursday, December 24, 2009 12:32 PM
Hall of Fame

Hall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of Fame

Group: General Forum Members
Last Login: Sunday, July 27, 2014 12:14 PM
Points: 3,433, Visits: 14,423
RBarryYoung (12/24/2009)
Gift Peddie (12/23/2009)
RBarryYoung (12/23/2009)
C# "yield break"? Yikes!

I don't suppose there's a way to implement your example in VB.net, is there? (I don't know of any way to directly emulate yield break in VB.)


No yield is one those C# only definition because per Microsoft there is no valid reason to define it in VB.

I already indicated that I knew that, Gift. Despite that fact that there is no deterministic general conversion for C# yield break, it is quite often possible to work out a case-specific method that can enable the same functionality in VB.net. Again, entirely dependent on the specific case, in this case, Solomon's clever method for streaming CLR-TVF data returns. It would be extremely valuable to me to be able to duplicate this performance in VB.net.


I did not say you don't know that, I actually wanted to point out the yield being an iterator dependence on C# before your post.


Kind regards,
Gift Peddie
Post #839048
« Prev Topic | Next Topic »

Add to briefcase ««12345»»»

Permissions Expand / Collapse