Click here to monitor SSC
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


CLR Table-Valued Function Example with Full Streaming (STVF)


CLR Table-Valued Function Example with Full Streaming (STVF)

Author
Message
Matt Miller (#4)
Matt Miller (#4)
SSCrazy Eights
SSCrazy Eights (8.3K reputation)SSCrazy Eights (8.3K reputation)SSCrazy Eights (8.3K reputation)SSCrazy Eights (8.3K reputation)SSCrazy Eights (8.3K reputation)SSCrazy Eights (8.3K reputation)SSCrazy Eights (8.3K reputation)SSCrazy Eights (8.3K reputation)

Group: General Forum Members
Points: 8291 Visits: 18261
I had attempted to use something like this to bring in data from an external file. One of those "ugly delimited" files, kind of delimited and kind of not, that I needed to bring in. This would have offered a way to parse it and clean it up in one single shot, and I would simply need to insert the results from the TVF into my destination table, skipping the whole staging table, clean up the data, etc...

I ended up abandoning it because it was trying to build the entire table in mem before returning it, so the CLR function would crash before getting anything out. So - I ended up going in another direction and got it done that way.

I had also looked at building the old running totals this way, running into the same problem (order is in fact guaranteed this way so no controversy over documented vs not). I got around that using the CLR SP (since you can stream the results without having to build the entire return set first), but still use the "other way" until I am formally forbidden from doing so , since it is still faster than the CLR SP (although not by a whole lot).

----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
Solomon Rutzky
Solomon Rutzky
SSC Eights!
SSC Eights! (902 reputation)SSC Eights! (902 reputation)SSC Eights! (902 reputation)SSC Eights! (902 reputation)SSC Eights! (902 reputation)SSC Eights! (902 reputation)SSC Eights! (902 reputation)SSC Eights! (902 reputation)

Group: General Forum Members
Points: 902 Visits: 2946
Jeff Moden (12/23/2009)
Next question, please... What application would have the need for such a thing where it could not be done using T-SQL? Not trying to be a smart guy here... I really want to know where folks have used or will use this type of thing and why they couldn't do it in T-SQL.


Hey Jeff. If by "such a thing" you are referring to CLR TVFs (and maybe Procs as well) in general, then there are some places that T-SQL does not cut it: Regular Expressions (especially doing a Replace), File System functions, and Internet related functions (e.g. FTP, Web GET / POST, Twitter, Ping, etc.). There are also some hashing functions (SHA256 and SHA512) that are not available in T-SQL.

Also there are places where T-SQL can work but I am not sure it is worth it. For example, in my SQL# project I have a BusinessDays function that works like DATEDIFF but you can configure it to use any number of "holidays". This way you do not need a "holiday" table that cannot detect Thanksgiving or Easter each year without having new entries added. And yes, that can most likely be done in a T-SQL TVF but given some of the calculations needed to come up with Easter (for both Eastern and Western churches as Easter is not the same day across the planet) it does not seem prudent to put non-set-based math in T-SQL. Not to mention how ugly and unreadable it would be. Another example is the URI class in .Net. I have a customer who was trying to parse the various parts of a URI using complex Regular Expressions but it wasn't very reliable. But the URI class in .Net is reliable and so I exposed that and it works great for him (and possibly many others now).


Matt Miller (#4) (12/23/2009)
I had attempted to use something like this to bring in data from an external file. One of those "ugly delimited" files, kind of delimited and kind of not, that I needed to bring in. This would have offered a way to parse it and clean it up in one single shot, and I would simply need to insert the results from the TVF into my destination table, skipping the whole staging table, clean up the data, etc...

I ended up abandoning it because it was trying to build the entire table in mem before returning it, so the CLR function would crash before getting anything out. So - I ended up going in another direction and got it done that way.


Hey Matt. I would have suggested that for this problem you could have done a CLR Stored Procedure since, as you mentioned, they effectively do stream by returning each row as it is generated. The one thing that I have found that CLR Stored Procs can do that CLR TVFs cannot is return a dynamic result set where the columns aren't known ahead of time and can change between executions. In fact, I have a Stored Proc in my SQL# project that reads a file from disk and splits it into columns using a RegEx delimiter. It is called File_SplitIntoFields. It is not in the Free version but you can see a description of it in the User Manual which is on the Downloads page.

Also, another option if you are using SQL Server 2008 is to create a stand-alone .Net app (or include this in part of an existing application) to stream the data from disk into a T-SQL Stored Proc that accepts a Table-Valued Parameter (TVP). My previous article (http://www.sqlservercentral.com/articles/SQL+Server+2008/66554/) shows how to do this :-).

Take care,
Solomon...

SQL# - http://www.SQLsharp.com/
Jeff Moden
Jeff Moden
SSC Guru
SSC Guru (51K reputation)SSC Guru (51K reputation)SSC Guru (51K reputation)SSC Guru (51K reputation)SSC Guru (51K reputation)SSC Guru (51K reputation)SSC Guru (51K reputation)SSC Guru (51K reputation)

Group: General Forum Members
Points: 51957 Visits: 40309
Thanks for the feedback, Solomon. Glad to see folks using it for smart stuff.

As a side bar, Matt Miller and I had some friendly races a couple/three years ago on Regex and the results came out pretty close in a lot of the cases.

So far as FTP goes, that's not so difficult from a sproc if you can make a trip to the batch world. Same goes with file handling. Heh... yeah, I know... lot's of people can't go there. Never understood that with the likes of proxies, etc. Seems like there's a general paranoia about the Command Prompt these days. Done correctly, it's no worse than "safe" CLR's.

Anyway, nice article and thanks for the feedback.

--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
Although they tell us that they want it real bad, our primary goal is to ensure that we dont actually give it to them that way.
Although change is inevitable, change for the better is not.
Just because you can do something in PowerShell, doesnt mean you should. Wink

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
Solomon Rutzky
Solomon Rutzky
SSC Eights!
SSC Eights! (902 reputation)SSC Eights! (902 reputation)SSC Eights! (902 reputation)SSC Eights! (902 reputation)SSC Eights! (902 reputation)SSC Eights! (902 reputation)SSC Eights! (902 reputation)SSC Eights! (902 reputation)

Group: General Forum Members
Points: 902 Visits: 2946
As a side bar, Matt Miller and I had some friendly races a couple/three years ago on Regex and the results came out pretty close in a lot of the cases.


Hey Jeff. So, when you say that you implemented RegEx in T-SQL, you were able to emulate macros (such as \w and \d) or do things like capture groups and look-ahead operators? I know that the LIKE clause can do a very basic RegEx with single-character [a-z,0-9] syntax but not much else. I have seen that .Net RegEx is slightly slower than a LIKE clause but it seems to be 1000 times more powerful so I kinda like having the option to use the full RegEx functionality.


So far as FTP goes, that's not so difficult from a sproc if you can make a trip to the batch world. Same goes with file handling. Heh... yeah, I know... lot's of people can't go there. Never understood that with the likes of proxies, etc. Seems like there's a general paranoia about the Command Prompt these days. Done correctly, it's no worse than "safe" CLR's.


Regarding FTP and even GZip, I took a slightly different approach. In fact, I took a dual approach in which I did the standard FTP a file and GZip a file but then I also have functions that deal with the data directly. By "directly" I mean that you can FTP the contents of a local variable or column from a table. Same with GZip: you can GZip the contents of a variable or column such as:


UPDATE tab
SET tab.PDFCompressed = SQL#.Util_GZip(tab.PDFData)
FROM MyTable tab
WHERE tab.PDFCompressed IS NULL



Or that could have been a Stored Proc that got the PDF (or JPG / GIF) data passed in as a VARBINARY and that input variable was passed into Util_GZip so that the compressed version was actually stored:


INSERT INTO MyTable (PDFCompressed)
VALUES (SQL#.Util_GZip(@IncomingPDFData))



In this case, you can save room in the DB if you are storing binary data and still mask that it is compressed since any query (via proc, Reporting Services, ad-hoc query, etc.) can simply call:


SELECT SQL#.Util_GUnzip(tab.PDFCompressed) AS [PDFData]
FROM MyTable tab
WHERE tab.SomeField = @SomeValue



So, I think this way of looking at FTP and GZip offers some interesting options to people that xp_cmdshell implementations cannot. And yes, the new FileStream option in SQL Server 2008 might be even better, but again still good to have options, especially because you can interact with those as VARBINARY columns so these functions still might work the same way against those.

Another thing to consider is that by opening up xp_cmdshell you are giving access to anything that can be called from a command line, whereas in these File System CLR functions the user only has access to the few functions that have been implemented, not everything on the server. To safe guard in xp_cmdshell you would have to restrict the login associated with the SQL Server service to not even be able to enter / read certain directories, but that might then exclude calling "ftp" unless even more fine-grained permissions are applied. And while that is possible, it seems easier to just restrict xp_cmdshell altogether and then selectively GRANT / DENY access to some CLR procs/functions to whatever Logins and/or Roles should be using them.


Anyway, nice article and thanks for the feedback.


Thanks and yer welcome :-)


Take care,
Solomon...

SQL# - http://www.SQLsharp.com/
Matt Miller (#4)
Matt Miller (#4)
SSCrazy Eights
SSCrazy Eights (8.3K reputation)SSCrazy Eights (8.3K reputation)SSCrazy Eights (8.3K reputation)SSCrazy Eights (8.3K reputation)SSCrazy Eights (8.3K reputation)SSCrazy Eights (8.3K reputation)SSCrazy Eights (8.3K reputation)SSCrazy Eights (8.3K reputation)

Group: General Forum Members
Points: 8291 Visits: 18261
Solomon Rutzky (12/23/2009)

Hey Matt. I would have suggested that for this problem you could have done a CLR Stored Procedure since, as you mentioned, they effectively do stream by returning each row as it is generated. The one thing that I have found that CLR Stored Procs can do that CLR TVFs cannot is return a dynamic result set where the columns aren't known ahead of time and can change between executions. In fact, I have a Stored Proc in my SQL# project that reads a file from disk and splits it into columns using a RegEx delimiter. It is called File_SplitIntoFields. It is not in the Free version but you can see a description of it in the User Manual which is on the Downloads page.

Also, another option if you are using SQL Server 2008 is to create a stand-alone .Net app (or include this in part of an existing application) to stream the data from disk into a T-SQL Stored Proc that accepts a Table-Valued Parameter (TVP). My previous article (http://www.sqlservercentral.com/articles/SQL+Server+2008/66554/) shows how to do this :-).

Take care,
Solomon...


The standalone app was in fact the other method. I was more curious "if I could do it" using SQLCLR (was hoping at the time that it might be faster - this was a few years back). The console app ended up being the fastest way to get what I needed done.

----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
timothyawiseman
timothyawiseman
SSC Eights!
SSC Eights! (872 reputation)SSC Eights! (872 reputation)SSC Eights! (872 reputation)SSC Eights! (872 reputation)SSC Eights! (872 reputation)SSC Eights! (872 reputation)SSC Eights! (872 reputation)SSC Eights! (872 reputation)

Group: General Forum Members
Points: 872 Visits: 920
Jeff Moden (12/23/2009)
I'm not picking on the author here... it's a general statement that I bring up to anyone who may be considering the use of CLRS... Don't use the excuse of a lack of knowledge of T-SQL functionality as a justification for CLR usage. You'll frequently pay (yes, there are exceptions) for such a mistake in the form of performance and resource usage. There are very few things that can't actually be done in T-SQL and there are very few things where a CLR will actually win the performance foot race if the T-SQL is properly written for performance. Even when T-SQL does lose the performance footrace, it's many times so close to the performance of CLR's that it's just not worth the time and effort to maintain a separate code base outside of T-SQL.


A good point and well taken. Still there are some times where CLR can make sense, particularly if you want to do things with Regex for instance.

Also, Itzik Ben-Gan showed that at least for some types of large data sets CLR outperforms T-SQL in running aggregates. There are some details here: http://www.sqlmag.com/articles/index.cfm?articleid=102336 (subscription required.)

---
Timothy A Wiseman
SQL Blog: http://timothyawiseman.wordpress.com/
RBarryYoung
RBarryYoung
SSChampion
SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)

Group: General Forum Members
Points: 10120 Visits: 9517
Gift Peddie (12/23/2009)
RBarryYoung (12/23/2009)
C# "yield break"? Yikes!

I don't suppose there's a way to implement your example in VB.net, is there? (I don't know of any way to directly emulate yield break in VB.)


No yield is one those C# only definition because per Microsoft there is no valid reason to define it in VB.

I already indicated that I knew that, Gift. Despite that fact that there is no deterministic general conversion for C# yield break, it is quite often possible to work out a case-specific method that can enable the same functionality in VB.net. Again, entirely dependent on the specific case, in this case, Solomon's clever method for streaming CLR-TVF data returns. It would be extremely valuable to me to be able to duplicate this performance in VB.net.

-- RBarryYoung, (302)375-0451 blog: MovingSQL.com, Twitter: @RBarryYoung
Proactive Performance Solutions, Inc.
"Performance is our middle name."
RBarryYoung
RBarryYoung
SSChampion
SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)

Group: General Forum Members
Points: 10120 Visits: 9517
Jeff Moden (12/23/2009)
Next question, please... What application would have the need for such a thing where it could not be done using T-SQL? Not trying to be a smart guy here... I really want to know where folks have used or will use this type of thing and why they couldn't do it in T-SQL.

Going back to my previous post... no, I wouldn't use a stored proc if a native TVF would do either (of course, "it depends"). I was just providing a direct method for how to use the output of an existing stored proc.

Actually, Jeff, the line-splitting functions and (less often)parsing functions are usually best done as TVF's, whether SQL or CLR.

-- RBarryYoung, (302)375-0451 blog: MovingSQL.com, Twitter: @RBarryYoung
Proactive Performance Solutions, Inc.
"Performance is our middle name."
RBarryYoung
RBarryYoung
SSChampion
SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)

Group: General Forum Members
Points: 10120 Visits: 9517
RBarryYoung (12/24/2009)
Jeff Moden (12/23/2009)
Next question, please... What application would have the need for such a thing where it could not be done using T-SQL? Not trying to be a smart guy here... I really want to know where folks have used or will use this type of thing and why they couldn't do it in T-SQL.

Going back to my previous post... no, I wouldn't use a stored proc if a native TVF would do either (of course, "it depends"). I was just providing a direct method for how to use the output of an existing stored proc.

Actually, Jeff, the line-splitting functions and (less often)parsing functions are usually best done as TVF's, whether SQL or CLR. Unfortunately, as Matt pointed out, CLR TVF's by default do not want to do a streaming return. Solomon's "trick" here is really quite significant. Up till now, I have not even heard any of the "Big Guns" in CLR's say anything about it other than to complain about the situation.


-- RBarryYoung, (302)375-0451 blog: MovingSQL.com, Twitter: @RBarryYoung
Proactive Performance Solutions, Inc.
"Performance is our middle name."
Gift Peddie
Gift Peddie
Hall of Fame
Hall of Fame (3.9K reputation)Hall of Fame (3.9K reputation)Hall of Fame (3.9K reputation)Hall of Fame (3.9K reputation)Hall of Fame (3.9K reputation)Hall of Fame (3.9K reputation)Hall of Fame (3.9K reputation)Hall of Fame (3.9K reputation)

Group: General Forum Members
Points: 3890 Visits: 14456
RBarryYoung (12/24/2009)
Gift Peddie (12/23/2009)
RBarryYoung (12/23/2009)
C# "yield break"? Yikes!

I don't suppose there's a way to implement your example in VB.net, is there? (I don't know of any way to directly emulate yield break in VB.)


No yield is one those C# only definition because per Microsoft there is no valid reason to define it in VB.

I already indicated that I knew that, Gift. Despite that fact that there is no deterministic general conversion for C# yield break, it is quite often possible to work out a case-specific method that can enable the same functionality in VB.net. Again, entirely dependent on the specific case, in this case, Solomon's clever method for streaming CLR-TVF data returns. It would be extremely valuable to me to be able to duplicate this performance in VB.net.


I did not say you don't know that, I actually wanted to point out the yield being an iterator dependence on C# before your post.

Kind regards,
Gift Peddie
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search