Is there any justification for really using SQL CLR

  • Paul White

    SSC Guru

    Points: 150341

    Adam Machanic (2/5/2010)


    Your example shows that an aggregate can aggregate faster than a stored procedure, but is that really surprising? The stored procedure needs to do data access over the context connection, whereas the aggregate simply takes values directly from the query processor. What I expected when you mentioned this was some kind of streaming example where the procedure and aggregate would each spit out a set of rows that had been manipulated in some way--some kind of scenario where someone might actually consider whether to use an aggregate or a procedure.

    Well no it isn't surprising at all, but that was kinda my original point 🙂

    You asked why I said aggregates read rows faster than the context connection, and what test I had run.

    My post was to address that really.

    Paul

  • Paul White

    SSC Guru

    Points: 150341

    Adam Machanic (2/5/2010)


    This is certainly an interesting and promising idea and I'll have to play with it a bit. I'm not sure, however, whether it helps to solve windowing problems, which is what I had in mind for ordered aggregates. If you need a 1:1 "aggregation," like a running sum, you have to group by something distinct like the row number. But if you do that the QP will spin up a new instance of the aggregate per row, which means you won't have access to the previous row's value. And if you group by anything else you'll have fewer rows in the output than you will in the input. Have you gotten around that somehow? Or do you have some other class of problem where you're able to leverage this?

    This is what my code did (in very broad outline only for space reasons):

    I had a UDA called dbo.Pack which took three parameters - the Primary Key of the source record, the Sequence number generated by the Row_Number, and the data value to be running-summed.

    This UDA was called once per group. So, N records of transactions went in to producing one aggregated output value.

    Taking advantage of 2008's ability to return VARBINARY(MAX) from a UDA, I stored the PK and data value in an highly compacted form inside that SqlBytes output. (I got it down to under three bytes per row for some data sets.) The sequence number is just there to check that Accumulate receives rows in the right order, so it doesn't need to be stored.

    This SqlBytes object was passed to a CLR TVF (dbo.Unpack) as input. The byte-stuffed format was expanded as needed inside the MoveNext method, and a running sum computed at that stage.

    The output streams via the FillRow method as normal. The output contains the Primary Key, in case a join back to the original data is required for any reason.

    I am missing a lot of detail out, so please try to read between the lines a little there. I actually wrote several different versions with different approaches, but the description above represents the core of it.

    The obvious concern is with memory usage for very large input data sets. Hence my interest in spilling to disk or whatever. Again, the details don't really matter.

    The things that stopped me publishing the article are:

    1. The wrong thread exception when passing my SqlBytes from the UDA's Terminate method to the TVF's input. I could get around this by reading the stream's buffer into a byte array, but that's not at all nice from a memory allocation point of view.

    2. The perf spool that you get when APPLYing the UDA output to the TVF input!!! Ok so I could try the trace flag to disable it, but that's not fun either. You only get a perf spool if the UDA produces more than one group, so for single group runs it flies! The perf spool hurts a lot for larger numbers of groups though.

    I should mention though that the whole thing was still very fast. It is very nearly as fast as Jeff and Hugo's methods, which was quite something considering all that work. (There was actually even more work than I mentioned since I used streaming compression in the UDA and decompression in the TVF for large sets).

    So I will wait for the wrong thread thing to be fixed, maybe enable the trace flag, and go with that for the time being. I hope you can see why I am so keen on a streaming-in-and-out CLR combination UDA and TVF object?!

    Paul

  • Paul White

    SSC Guru

    Points: 150341

    Adam Machanic (2/5/2010)


    Agreed, but don't you think the SQLCLR hosting environment should handle this itself and spill to tempdb just like everything else in SQL Server does? Pedro, are you listening? 🙂

    I do indeed - but it doesn't do that today, and I wanted my article to be as robust as possible for very large sets. Spilling to Isolated Storage seemed an idea worth exploring. To be fair, it was fun, it just would have been nice to be able to present a finished product, that did LEAD and LAG and well as running sums.

    Paul

  • Paul White

    SSC Guru

    Points: 150341

    I opened a Connect suggestion of my own, for CLR user-defined analytic functions (streaming multiple rows input and output thingies :-D).

    https://connect.microsoft.com/SQLServer/feedback/details/530387/sqlclr-user-defined-analytic-functions

    I would hope to encourage some casual readers of this thread to vote for it - thanks!

    Paul

  • Pedro DeRose [MSFT]

    Old Hand

    Points: 307

    Paul White (2/5/2010)


    Pedro DeRose [MSFT] (1/22/2010)


    Which, I think, may be a good slogan for SQLCLR: better than cutting boards with a hammer. 🙂

    Do you mind if I adopt that gem into my signature? Love it.

    You are welcome to it. 🙂


    Pedro DeRose
    Program Manager, Microsoft SQL Server

    (Legalese: This posting is provided "AS IS" with no warranties, and confers no rights. Use of included script samples are subject to the terms specified at http://www.microsoft.com/info/cpyright.htm.)

  • Pedro DeRose [MSFT]

    Old Hand

    Points: 307

    Adam Machanic (2/5/2010)Agreed, but don't you think the SQLCLR hosting environment should handle this itself and spill to tempdb just like everything else in SQL Server does? Pedro, are you listening? 🙂

    I am indeed. I presume there's already a Connect item with your vote. 🙂


    Pedro DeRose
    Program Manager, Microsoft SQL Server

    (Legalese: This posting is provided "AS IS" with no warranties, and confers no rights. Use of included script samples are subject to the terms specified at http://www.microsoft.com/info/cpyright.htm.)

  • RBarryYoung

    SSC Guru

    Points: 143327

    Paul White (2/4/2010)


    ...Now that a UDA can accept multiple parameters, I can write: SELECT dbo.UDA(@value, @row_number) and handle ordering issues inside the Accumulate method. The ranking function will tend to order the rows, and while I can't rely on that order being preserved, I can optimize for it...

    You know I actually wrote a lot of these same things last summer (right before I got sick), including the "packing everything into one big varbinary" trick, but it looks like you've gotten a lot farther than I did, including fixing some of the mysterious problems i was puzzled over. Oh well, ... 🙂

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • Paul White

    SSC Guru

    Points: 150341

    RBarryYoung (2/6/2010)


    You know I actually wrote a lot of these same things last summer (right before I got sick), including the "packing everything into one big varbinary" trick, but it looks like you've gotten a lot farther than I did, including fixing some of the mysterious problems i was puzzled over. Oh well, ... 🙂

    Hi Barry!

    Yes, the new 2008 features were just begging for something like that approach weren't they? I was lucky to get a bit of time to finally work on it recently. Of course it would have been better if things had worked out perfectly, but I might still get some use out of it, and it was a lot of fun in any case. Please feel free to share any thoughts or questions you might have - you've still got my email address right?

    Paul

  • Paul White

    SSC Guru

    Points: 150341

    Adam Machanic (2/5/2010)


    What I expected when you mentioned this was some kind of streaming example where the procedure and aggregate would each spit out a set of rows that had been manipulated in some way--some kind of scenario where someone might actually consider whether to use an aggregate or a procedure.

    Just to add to my previous comments. As part of the scheme I outlined for doing running sums and other analytic functions, I did do some performance testing of the UDA + TVF method versus a 'traditional' 2005 procedure, quirky update, optimized cursor, and Hugo's innovative set-based iteration method.

    The tests were closely based on your old blog entry http://sqlblog.com/blogs/adam_machanic/archive/2006/07/12/running-sums-yet-again-sqlclr-saves-the-day.aspx. The results (on different set sizes from 5,000 to 1 million rows) showed that the 2008 method was always at least twice as fast as the 2005 procedure. That's the only real testing I've done. I know that a pure CLR TVF solution (IEnumerable data reader) would likely be faster than the procedure, but I didn't get that far.

  • Alan Burstein

    SSC Guru

    Points: 61026

    Old thread, I know but there is something that has not been discussed here that I think is super important. Correctly designed CLR scalar UDFs don't have the problems that we have with T-SQL scalar UDFs. As we all know - T-SQL scalar UDFs kill parallelism; you can, however, write a CLR scalar UDF that gets a parallel plan.

    You can't use inline T-SQL functions as constraints or computed columns, only T-SQL scalar UDFs. Once you add a T-SQL scalar UDFs as a computed column or for a constraint, queries that reference that table (or view) often get really slow especially when the optimizer prefers a parallel plan. This is not always true with CLR UDFs (provided that they are inline, immutable, etc.)

    SQL Server inline scalar UDFs in 2019 are a huge improvement but, on pre-2019 systems, if you want to use a scalar UDF for constraints, computed columns or in a view then a well written CLR is the only game in town.

    Another Hidden Parallelism Killer: Scalar UDFs In Check Constraints

    Another reason why scalar functions in computed columns is a bad idea

    -- Alan Burstein


    Helpful links:Best practices for getting help on SQLServerCentral -- Jeff ModenHow to Post Performance Problems -- Gail ShawNasty fast set-based string manipulation functions:For splitting strings try DelimitedSplit8K or DelimitedSplit8K_LEAD (SQL Server 2012+)To split strings based on patterns try PatternSplitCMNeed to clean or transform a string? try NGrams, PatExclude8K, PatReplace8K, DigitsOnlyEE, or Translate8KI cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code.  -- Itzik Ben-Gan 2001

Viewing 10 posts - 46 through 55 (of 55 total)

You must be logged in to reply to this topic. Login to reply