CLR Table-Valued Function Example with Full Streaming (STVF)

  • Comments posted to this topic are about the item CLR Table-Valued Function Example with Full Streaming (STVF)

    SQL#https://SQLsharp.com/ ( SQLCLR library ofover 340 Functions and Procedures)
    Sql Quantum Lifthttps://SqlQuantumLift.com/ ( company )
    Sql Quantum Leaphttps://SqlQuantumLeap.com/ ( blog )
    Info sitesCollations     •     Module Signing     •     SQLCLR

  • Interesting idea. Is there a place where this makes sense or where you want to stream a result set back in your applications?

    Does SSMS do this? It seems that I'll get data on some larger queries while it's still returning other data, or is that just a network/rendering delay as things are transferring?

  • Hey Steve. This is not really about streaming data back to the application; it is really about streaming the results of a TVF back to the calling query, which is typically a Stored Proc or maybe a Job Step or maybe even a SSRS query.

    I really can't think of any reason to not always do this since it keeps memory usage low and performs the same as passing back the entire collection all at once. I am not sure what SSMS does, but I think the delay you are seeing is just part of SQL Server as I have also seen that delay. I think this is where the query hint of "FAST number_rows" comes in as well as the WITH NOWAIT option of RAISERROR in terms of trying to get around that delay.

    Take care,

    Solomon...

    SQL#https://SQLsharp.com/ ( SQLCLR library ofover 340 Functions and Procedures)
    Sql Quantum Lifthttps://SqlQuantumLift.com/ ( company )
    Sql Quantum Leaphttps://SqlQuantumLeap.com/ ( blog )
    Info sitesCollations     •     Module Signing     •     SQLCLR

  • ...but I wanted to create TVFs so that I could more easily interact with the output by doing WHERE conditions, GROUP BYs, ORDER BYs, etc. which are not possible with Stored Procedures unless you trap the output into a Table Variable or Temp Table (which I did not want to do).

    Well, not quite true. First, as you probably already know, you can interact with TVF's with WHERE, GROUP BY, ORDER BY, etc. Second, you can use the output of a Stored Procedure rather directly using OPENROWSET:

    SELECT *

    FROM OPENROWSET('SQLOLEDB',

    'server=(local);trusted_connection=yes',

    'set fmtonly off exec sp_who2')

    WHERE SPID >= 50

    ORDER BY Status ASC, CPUTime DESC

    Note that on SQL Server 2005 and up, you must enable 'Ad Hoc Distributed Queries' either through sp_Configure or the "Surface Area Configuration". If necessary, that functionality can be enabled and disabled on the fly in the code when a user with the correct level of permissions uses the code while it rejects those that do not.

    I was quite interested to see what wonderful things could be done that either: a) couldn't be done in regular T-SQL or b) would at least be more efficient than being done in T-SQL.

    I'm not picking on the author here... it's a general statement that I bring up to anyone who may be considering the use of CLRS... Don't use the excuse of a lack of knowledge of T-SQL functionality as a justification for CLR usage. You'll frequently pay (yes, there are exceptions) for such a mistake in the form of performance and resource usage. There are very few things that can't actually be done in T-SQL and there are very few things where a CLR will actually win the performance foot race if the T-SQL is properly written for performance. Even when T-SQL does lose the performance footrace, it's many times so close to the performance of CLR's that it's just not worth the time and effort to maintain a separate code base outside of T-SQL.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • ...but I wanted to create TVFs so that I could more easily interact with the output by doing WHERE conditions, GROUP BYs, ORDER BYs, etc.

    Well, not quite true. First, as you probably already know, you can interact with TVF's with WHERE, GROUP BY, ORDER BY, etc.

    Hi Jeff. I believe we are saying the same thing here. My point was expressing a preference for interacting with TVFs over Stored Procs. Maybe it would have been clearer if I said CLR Stored Procs. The point of what I wrote was that I feel many / most people are aware that CLR Stored Procs can send each row back as it is created but not everyone is aware of how to do that via TVFs.

    which are not possible with Stored Procedures unless you trap the output into a Table Variable or Temp Table (which I did not want to do).

    Second, you can use the output of a Stored Procedure rather directly using OPENROWSET:

    Thank you for this information. I was not aware of using OPENROWSET like this so it is definitely good to know. Although personally, using a TVF seems like a cleaner / less clunky interface. Again, the point is that if one is already doing something in CLR, it is good to have the option of creating a Function over a Procedure.

    I'm not picking on the author here... it's a general statement that I bring up to anyone who may be considering the use of CLRS... Don't use the excuse of a lack of knowledge of T-SQL functionality as a justification for CLR usage.

    I generally agree with this statement and was not advocating that people use SQLCLR simply for the sake of using it or in doing something that is natively available in T-SQL. But, when a good use for SQLCLR is found it is good to not be forced into the Stored Procedure interface.

    Take care,

    Solomon...

    SQL#https://SQLsharp.com/ ( SQLCLR library ofover 340 Functions and Procedures)
    Sql Quantum Lifthttps://SqlQuantumLift.com/ ( company )
    Sql Quantum Leaphttps://SqlQuantumLeap.com/ ( blog )
    Info sitesCollations     •     Module Signing     •     SQLCLR

  • Hi Solomon -

    This is definitely an interesting solution. I had noticed that ugliness of having to wait for the whole recordset to render before you could send it downrange with CLR TVF's, so I will take a look at this one for sure next time it comes up.

    Besides the speed - I'd anticipate this also reduces the memory footprint as well - have you had any exposure to this (it should be smaller since the entire recordset doesn't need to live in memory beofre it starts being sent)?

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Besides the speed - I'd anticipate this also reduces the memory footprint as well - have you had any exposure to this (it should be smaller since the entire recordset doesn't need to live in memory beofre it starts being sent)?

    Hey Matt #4 ;-), thanks. Actually, the main benefit of using "yield return" -- or even the full IEnumerator syntax -- is in reducing the memory footprint. In fact, in my testing I did not see much of a performance gain at all. When I tried the example code shown in the article with a value of 8,000,000 -- which did work in the "standard" method and hence I could compare the timing of each -- both solutions returned in 4 seconds no matter how many times I ran it. I would have thought that there would be some performance gain but even if the end result is that performance is the same at least we have a very-low memory usage gain.

    Take care,

    Solomon...

    SQL#https://SQLsharp.com/ ( SQLCLR library ofover 340 Functions and Procedures)
    Sql Quantum Lifthttps://SqlQuantumLift.com/ ( company )
    Sql Quantum Leaphttps://SqlQuantumLeap.com/ ( blog )
    Info sitesCollations     •     Module Signing     •     SQLCLR

  • C# "[font="Courier New"]yield break[/font]"? Yikes!

    I don't suppose there's a way to implement your example in VB.net, is there? (I don't know of any way to directly emulate [font="Courier New"]yield break[/font] in VB.)

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • RBarryYoung (12/23/2009)


    C# "[font="Courier New"]yield break[/font]"? Yikes!

    I don't suppose there's a way to implement your example in VB.net, is there? (I don't know of any way to directly emulate [font="Courier New"]yield break[/font] in VB.)

    No yield is one those C# only definition because per Microsoft there is no valid reason to define it in VB.

    Kind regards,
    Gift Peddie

  • Next question, please... What application would have the need for such a thing where it could not be done using T-SQL? Not trying to be a smart guy here... I really want to know where folks have used or will use this type of thing and why they couldn't do it in T-SQL.

    Going back to my previous post... no, I wouldn't use a stored proc if a native TVF would do either (of course, "it depends"). I was just providing a direct method for how to use the output of an existing stored proc.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I had attempted to use something like this to bring in data from an external file. One of those "ugly delimited" files, kind of delimited and kind of not, that I needed to bring in. This would have offered a way to parse it and clean it up in one single shot, and I would simply need to insert the results from the TVF into my destination table, skipping the whole staging table, clean up the data, etc...

    I ended up abandoning it because it was trying to build the entire table in mem before returning it, so the CLR function would crash before getting anything out. So - I ended up going in another direction and got it done that way.

    I had also looked at building the old running totals this way, running into the same problem (order is in fact guaranteed this way so no controversy over documented vs not). I got around that using the CLR SP (since you can stream the results without having to build the entire return set first), but still use the "other way" until I am formally forbidden from doing so , since it is still faster than the CLR SP (although not by a whole lot).

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Jeff Moden (12/23/2009)


    Next question, please... What application would have the need for such a thing where it could not be done using T-SQL? Not trying to be a smart guy here... I really want to know where folks have used or will use this type of thing and why they couldn't do it in T-SQL.

    Hey Jeff. If by "such a thing" you are referring to CLR TVFs (and maybe Procs as well) in general, then there are some places that T-SQL does not cut it: Regular Expressions (especially doing a Replace), File System functions, and Internet related functions (e.g. FTP, Web GET / POST, Twitter, Ping, etc.). There are also some hashing functions (SHA256 and SHA512) that are not available in T-SQL.

    Also there are places where T-SQL can work but I am not sure it is worth it. For example, in my SQL# project I have a BusinessDays function that works like DATEDIFF but you can configure it to use any number of "holidays". This way you do not need a "holiday" table that cannot detect Thanksgiving or Easter each year without having new entries added. And yes, that can most likely be done in a T-SQL TVF but given some of the calculations needed to come up with Easter (for both Eastern and Western churches as Easter is not the same day across the planet) it does not seem prudent to put non-set-based math in T-SQL. Not to mention how ugly and unreadable it would be. Another example is the URI class in .Net. I have a customer who was trying to parse the various parts of a URI using complex Regular Expressions but it wasn't very reliable. But the URI class in .Net is reliable and so I exposed that and it works great for him (and possibly many others now).

    Matt Miller (#4) (12/23/2009)


    I had attempted to use something like this to bring in data from an external file. One of those "ugly delimited" files, kind of delimited and kind of not, that I needed to bring in. This would have offered a way to parse it and clean it up in one single shot, and I would simply need to insert the results from the TVF into my destination table, skipping the whole staging table, clean up the data, etc...

    I ended up abandoning it because it was trying to build the entire table in mem before returning it, so the CLR function would crash before getting anything out. So - I ended up going in another direction and got it done that way.

    Hey Matt. I would have suggested that for this problem you could have done a CLR Stored Procedure since, as you mentioned, they effectively do stream by returning each row as it is generated. The one thing that I have found that CLR Stored Procs can do that CLR TVFs cannot is return a dynamic result set where the columns aren't known ahead of time and can change between executions. In fact, I have a Stored Proc in my SQL# project that reads a file from disk and splits it into columns using a RegEx delimiter. It is called File_SplitIntoFields. It is not in the Free version but you can see a description of it in the User Manual which is on the Downloads page.

    Also, another option if you are using SQL Server 2008 is to create a stand-alone .Net app (or include this in part of an existing application) to stream the data from disk into a T-SQL Stored Proc that accepts a Table-Valued Parameter (TVP). My previous article (http://www.sqlservercentral.com/articles/SQL+Server+2008/66554/) shows how to do this :-).

    Take care,

    Solomon...

    SQL#https://SQLsharp.com/ ( SQLCLR library ofover 340 Functions and Procedures)
    Sql Quantum Lifthttps://SqlQuantumLift.com/ ( company )
    Sql Quantum Leaphttps://SqlQuantumLeap.com/ ( blog )
    Info sitesCollations     •     Module Signing     •     SQLCLR

  • Thanks for the feedback, Solomon. Glad to see folks using it for smart stuff.

    As a side bar, Matt Miller and I had some friendly races a couple/three years ago on Regex and the results came out pretty close in a lot of the cases.

    So far as FTP goes, that's not so difficult from a sproc if you can make a trip to the batch world. Same goes with file handling. Heh... yeah, I know... lot's of people can't go there. Never understood that with the likes of proxies, etc. Seems like there's a general paranoia about the Command Prompt these days. Done correctly, it's no worse than "safe" CLR's.

    Anyway, nice article and thanks for the feedback.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • As a side bar, Matt Miller and I had some friendly races a couple/three years ago on Regex and the results came out pretty close in a lot of the cases.

    Hey Jeff. So, when you say that you implemented RegEx in T-SQL, you were able to emulate macros (such as \w and \d) or do things like capture groups and look-ahead operators? I know that the LIKE clause can do a very basic RegEx with single-character [a-z,0-9] syntax but not much else. I have seen that .Net RegEx is slightly slower than a LIKE clause but it seems to be 1000 times more powerful so I kinda like having the option to use the full RegEx functionality.

    So far as FTP goes, that's not so difficult from a sproc if you can make a trip to the batch world. Same goes with file handling. Heh... yeah, I know... lot's of people can't go there. Never understood that with the likes of proxies, etc. Seems like there's a general paranoia about the Command Prompt these days. Done correctly, it's no worse than "safe" CLR's.

    Regarding FTP and even GZip, I took a slightly different approach. In fact, I took a dual approach in which I did the standard FTP a file and GZip a file but then I also have functions that deal with the data directly. By "directly" I mean that you can FTP the contents of a local variable or column from a table. Same with GZip: you can GZip the contents of a variable or column such as:

    UPDATE tab

    SET tab.PDFCompressed = SQL#.Util_GZip(tab.PDFData)

    FROM MyTable tab

    WHERE tab.PDFCompressed IS NULL

    Or that could have been a Stored Proc that got the PDF (or JPG / GIF) data passed in as a VARBINARY and that input variable was passed into Util_GZip so that the compressed version was actually stored:

    INSERT INTO MyTable (PDFCompressed)

    VALUES (SQL#.Util_GZip(@IncomingPDFData))

    In this case, you can save room in the DB if you are storing binary data and still mask that it is compressed since any query (via proc, Reporting Services, ad-hoc query, etc.) can simply call:

    SELECT SQL#.Util_GUnzip(tab.PDFCompressed) AS [PDFData]

    FROM MyTable tab

    WHERE tab.SomeField = @SomeValue

    So, I think this way of looking at FTP and GZip offers some interesting options to people that xp_cmdshell implementations cannot. And yes, the new FileStream option in SQL Server 2008 might be even better, but again still good to have options, especially because you can interact with those as VARBINARY columns so these functions still might work the same way against those.

    Another thing to consider is that by opening up xp_cmdshell you are giving access to anything that can be called from a command line, whereas in these File System CLR functions the user only has access to the few functions that have been implemented, not everything on the server. To safe guard in xp_cmdshell you would have to restrict the login associated with the SQL Server service to not even be able to enter / read certain directories, but that might then exclude calling "ftp" unless even more fine-grained permissions are applied. And while that is possible, it seems easier to just restrict xp_cmdshell altogether and then selectively GRANT / DENY access to some CLR procs/functions to whatever Logins and/or Roles should be using them.

    Anyway, nice article and thanks for the feedback.

    Thanks and yer welcome 🙂

    Take care,

    Solomon...

    SQL#https://SQLsharp.com/ ( SQLCLR library ofover 340 Functions and Procedures)
    Sql Quantum Lifthttps://SqlQuantumLift.com/ ( company )
    Sql Quantum Leaphttps://SqlQuantumLeap.com/ ( blog )
    Info sitesCollations     •     Module Signing     •     SQLCLR

  • Solomon Rutzky (12/23/2009)


    Hey Matt. I would have suggested that for this problem you could have done a CLR Stored Procedure since, as you mentioned, they effectively do stream by returning each row as it is generated. The one thing that I have found that CLR Stored Procs can do that CLR TVFs cannot is return a dynamic result set where the columns aren't known ahead of time and can change between executions. In fact, I have a Stored Proc in my SQL# project that reads a file from disk and splits it into columns using a RegEx delimiter. It is called File_SplitIntoFields. It is not in the Free version but you can see a description of it in the User Manual which is on the Downloads page.

    Also, another option if you are using SQL Server 2008 is to create a stand-alone .Net app (or include this in part of an existing application) to stream the data from disk into a T-SQL Stored Proc that accepts a Table-Valued Parameter (TVP). My previous article (http://www.sqlservercentral.com/articles/SQL+Server+2008/66554/) shows how to do this :-).

    Take care,

    Solomon...

    The standalone app was in fact the other method. I was more curious "if I could do it" using SQLCLR (was hoping at the time that it might be faster - this was a few years back). The console app ended up being the fastest way to get what I needed done.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

Viewing 15 posts - 1 through 15 (of 60 total)

You must be logged in to reply to this topic. Login to reply