Sending a Variable Number of Elements in a Parameter

  • ChrisM@home

    SSC-Insane

    Points: 24260

    Peter E. Kierstead (2/9/2014)


One of my first attempts at splitting delimited data in a column used the XML shred method. Even on relatively small tables (by row count) the queries took forever due to the poor performance of Microsoft's XML implementation and the "shred" operation. A CTE-based UDF performed way better; however, the CROSS APPLY subquery stack method I ended up with (see my previous comment) beat them all, hands down. You may not care for the apparent code complexity, but my employers sure appreciated the significant performance gains.

The "subquery stack" method is better known as the (CROSS) APPLY cascade. It works very well for string splits when the number of elements is small and the maximum number of elements is fixed.

    Here's how to get "nondeterministic things done" in an APPLY block.
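A minimal sketch of the pattern being described (the actual example from this post is not preserved here, so the names and sample values below are illustrative): a VALUES()-sourced FROM clause feeds a cascade of CROSS APPLY steps, each of which can reference columns produced by the steps before it.

```sql
-- Sketch only: each APPLY step can reference columns introduced by earlier
-- steps, so per-row calculations are layered left to right without a UDF.
Select v.String, x1.p1, x2.Element1
from (Values ('a*bb*ccc'), ('x*y'), ('solo')) v(String)  -- VALUES()-sourced row set
Cross Apply (Select p1 = NullIf(CharIndex('*', v.String), 0)) x1  -- position of first delimiter
Cross Apply (Select Element1 = Left(v.String, IsNull(x1.p1, Len(v.String) + 1) - 1)) x2;  -- first element
```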


Low-hanging fruit picker and defender of the moggies

For better assistance in answering your questions, please read this.


Understanding and using APPLY, (I) and (II) - Paul White

Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop - Jeff Moden

  • Peter E. Kierstead

    SSCarpal Tunnel

    Points: 4114

    Thanks Chris,

I'm familiar with this technique; however, it only applies to a certain category of non-deterministic problems, namely those where the unknown quantity of elements is in the provided row set (in your example, the VALUES()-sourced FROM clause).

I was hoping for a more generalized solution that, in keeping with the previously provided examples (i.e., parsing delimited strings), would allow one to parse an unknown number of delimited elements from the parent column passed in to the CROSS APPLY, looking for certain values in certain positions and returning a column indicating compliance.

One solution converts, for the sake of discussion, a pipe-delimited string into an XML string, then uses XQuery to check for the required compliance. While this does work, it is abysmally slow even over relatively small parent row sets.
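For reference, the XML variant being described looks roughly like this (table and column names are illustrative, not from the thread; note the Replace/Cast step also breaks if the data contains &amp;, &lt;, or &gt;):

```sql
-- Shred a pipe-delimited column to XML, then pick the nth element by position.
Select p.Options, p.OptionToCheck,
       Validated = Case When x.Shredded.value(
                            '(/e[position() = sql:column("p.OptionToCheck")])[1]',
                            'varchar(128)') = 'RequiredValue'
                        Then 'Yes' Else 'No' End
from dbo.ParentTable p
Cross Apply
(
    Select Shredded = Cast('<e>' + Replace(p.Options, '|', '</e><e>') + '</e>' as xml)
) x;
```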

And again, even more generally, I was trying to find a way to use a CTE in the CROSS APPLY with the parent column somehow "passed" to it. So far I haven't been successful. It might be that you know of a way to accomplish this (see the end of my first post)?



    PeteK
    I have CDO. It's like OCD but all the letters are in alphabetical order... as they should be.

  • ChrisM@home

    SSC-Insane

    Points: 24260

Not sure what you mean, Peter - a CTE is just a macro, a view on the fly. Can you sketch your idea in pseudocode? It might make it easier for folks to visualise.


Low-hanging fruit picker and defender of the moggies

For better assistance in answering your questions, please read this.


Understanding and using APPLY, (I) and (II) - Paul White

Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop - Jeff Moden

  • Peter E. Kierstead

    SSCarpal Tunnel

    Points: 4114

Sorry for the late reply; I've been working around the house and missed your response.

    Please allow me a little setup:

Most SQL programmers will tell you that there are two types of CTEs: recursive and non-recursive. While I don't disagree with this assessment, I would add a third type (or at the very least a variation on the two formally accepted types): the parametrized CTE. I won't burden you with the specifics, as I'm sure you've used the two common CTE forms, and maybe even this third form/variation, which follows:

[code]
Set NoCount On;
Declare @Span Int;
Set @Span=5;

;With ZeroCrossingRange1(n) as
(
    Select -@Span
    Union All
    Select n+1 from ZeroCrossingRange1 where n<@Span
)
Select * from ZeroCrossingRange1;

;With RowSource(n) as
(
    Select
        -(Select Count(*) from sys.columns)/2+
        Row_Number() over (order by (Select Null))
    from sys.columns
),
ZeroCrossingRange2(n) as
(
    Select * from RowSource where n between -@Span and @Span
)
Select * from ZeroCrossingRange2
[/code]

There is no way for the first CTE to generate its row set without the @ variable mediating its initialization and recursion. The second CTE uses a standard WHERE clause predicate in the CTE instead of on the SELECT statement referencing the CTE. I'm not pushing this methodology, just showing its implementation for my next idea.

    So, basically you can get information into your CTEs in one of three ways (or a combination of the three): from table sources, from @ variables, or from constants (Select 1 Union all Select 2 Union All ...).

    So, with the above in mind, I would like to do something like this:

[code]
Set NoCount On;

;With CTE(Validated) as
(
    --Select 'Yes'
    Select t.Options
)
Select top 10
    IsNull
    (
        (
            Select top (RandomNoOfColumns)
                Case Row_Number() over (order by (Select Null)) -- Ordinal column number
                    when 1 then ''
                    else '*'
                End+
                name
            from sys.columns
            where Abs(CheckSum(NewId()))%25=0
            for xml path('')
        ),
        ''
    )[Options],
    Abs(CheckSum(NewId()))%25+1[OptionToCheck],
    ca.Validated
from
(
    Select
        (Row_Number() over (order by (Select Null))+Abs(CheckSum(NewId())))%25[RandomNoOfColumns]
    from sys.columns -- Just used as a generic row source for column RandomNoOfColumns
) t
Cross Apply
(
    Select * from CTE
) ca
[/code]

The code up to the CROSS APPLY simply generates some representative test data. Think of the data values as an unknown number of '*'-delimited (pipes are too hard to see) option parameters. The idea is to check the nth option parameter, where n is another column in the parent table, for a specific value or set of values.

The above CTE is about as simple as I can make it and still expose the problem I'm trying to solve (reverse the comments in the CTE and you'll get a mocked output of what I'm looking for). I just want it to return a value, say 'Yes' or 'No', based on some recursive code (for the non-deterministic part), using parent column values as controls (I know, this just screams TVF, but that's what I'm trying to get away from).

I use a CROSS APPLY as it permits the use of parent columns in its declaration. The problem I am trying to overcome is being able to use a CTE inside that CROSS APPLY and have it use the parent columns in its definition. This is where it fails: the T-SQL compiler won't late-bind the parent column; it just throws an error because that parent column is not yet defined in a way the compiler can see...

    I hope this is a little clearer.



    PeteK
    I have CDO. It's like OCD but all the letters are in alphabetical order... as they should be.

  • ChrisM@home

    SSC-Insane

    Points: 24260

Recursive CTEs aside (I've done a little work with them), I think you are still missing the point - they're a coding shortcut. If you cannot consider your CTE to be a macro, swappable into your "calling" query without raising an error (in the same way as a view), then it's not a CTE and it's not going to work the way you might hope. This isn't a limitation; it's by design. CTEs don't magically create a result set which can then be requeried, and the same CTE referenced twice or more in the same "calling" query can realise different results from each reference.

Your first CTE is an rCTE; your second is... well, just a CTE. Why not try an inline function? What have you got against them?
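For what it's worth, the inline-function route looks something like this sketch (the function, parameter, and column names are illustrative, not from the thread): the parent column becomes a parameter, which is exactly the "late binding" a bare CTE cannot do.

```sql
-- Hypothetical iTVF: returns 'Yes' when the @Ordinal-th '*'-delimited
-- element of @Options equals @Expected. All names are illustrative.
Create Function dbo.CheckOption
(
    @Options  varchar(8000),
    @Ordinal  int,
    @Expected varchar(128)
)
Returns Table As
Return
With E1(n)    as (Select 1 from (Values (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) v(n)),
     Tally(n) as (Select Top (IsNull(DataLength(@Options), 0))
                         Row_Number() over (order by (Select Null))
                  from E1 a, E1 b, E1 c, E1 d)  -- up to 10,000 character positions
Select Validated =
    Case When Exists
    (
        Select 1
        from
        (
            Select Ordinal = Row_Number() over (order by s.n),
                   Element = Substring(@Options, s.n + 1,
                             IsNull(NullIf(CharIndex('*', @Options, s.n + 1), 0), 8001) - s.n - 1)
            from (Select 0 union all Select n from Tally where Substring(@Options, n, 1) = '*') s(n)
        ) e
        where e.Ordinal = @Ordinal and e.Element = @Expected
    ) Then 'Yes' Else 'No' End;
```

Called as `Cross Apply dbo.CheckOption(t.Options, t.OptionToCheck, '...') ca`, the whole body is inlined into the calling plan, which is what makes it behave like the "parametrized CTE" being asked for.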


Low-hanging fruit picker and defender of the moggies

For better assistance in answering your questions, please read this.


Understanding and using APPLY, (I) and (II) - Paul White

Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop - Jeff Moden

  • Peter E. Kierstead

    SSCarpal Tunnel

    Points: 4114

    Why not try an inline function? What have you got against them?

lol, we've come full circle. The original intent of my question was to stretch the boundaries of what was/is available with creative T-SQL/SQL code, and not just for the academic or pedantic fun of it. It's always about producing better performance.

I have no "specific" function in mind. My original comment on this post was about replacing UDFs used in row set operations with intrinsic functionality to produce faster query results. Some of these UDFs are complex enough that they present difficulties converting them to cascaded sub-queries, and some of these "conversions" may benefit from the recursion that CTEs offer.



    PeteK
    I have CDO. It's like OCD but all the letters are in alphabetical order... as they should be.

  • Jeff Moden

    SSC Guru

    Points: 996655

    Peter E. Kierstead (2/9/2014)


    Why not try an inline function? What have you got against them?

lol, we've come full circle. The original intent of my question was to stretch the boundaries of what was/is available with creative T-SQL/SQL code, and not just for the academic or pedantic fun of it. It's always about producing better performance.

I have no "specific" function in mind. My original comment on this post was about replacing UDFs used in row set operations with intrinsic functionality to produce faster query results. Some of these UDFs are complex enough that they present difficulties converting them to cascaded sub-queries, and some of these "conversions" may benefit from the recursion that CTEs offer.

If you're interested in performance, consider avoiding recursive CTEs that work one row at a time. See the following for why.

    http://www.sqlservercentral.com/articles/T-SQL/74118/

You also talk about UDFs as if there were only one type. I agree that typical scalar and multi-statement TVFs (mTVFs) are absolutely horrible for performance. Not true with iTVFs (inline Table Valued Functions), which can also be used as a kind of high-performance iSF (inline Scalar Function). Please see the following for more on that.

    http://www.sqlservercentral.com/articles/T-SQL/91724/
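A hedged illustration of the iSF idea (a toy example of my own, not taken from the article above): an inline TVF that returns exactly one row and one column, consumed via CROSS APPLY wherever a scalar UDF would otherwise sit.

```sql
-- Toy iSF (illustrative names): initial-caps a string inline
-- instead of calling a scalar UDF per row.
Create Function dbo.InitCap (@s varchar(8000))
Returns Table As
Return Select Result = Upper(Left(@s, 1)) + Lower(Substring(@s, 2, 8000));

-- Used like a scalar function, but inlined into the calling plan:
-- Select f.Result from dbo.SomeTable t Cross Apply dbo.InitCap(t.SomeName) f;
```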

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".
    "If "pre-optimization" is the root of all evil, then what does the resulting no optimization lead to?"

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Peter E. Kierstead

    SSCarpal Tunnel

    Points: 4114

    Hello Jeff.

Thanks for the links. I am very familiar with this subject and have personally learned these lessons over the years, embarrassingly at times :w00t:

    You correctly point out that I seem to be talking about UDFs as if there were only one type. This is because when things are done correctly they don't show up on my bad performance meter and don't get talked about (well, not by me anyway)! I reference scalar UDFs because this is what I'm targeting 60-70% of the time when dealing with performance issues.

As you've pointed out, inline TVFs are the exception, but before you can use them you have to convert that poorly performing scalar UDF code into something that can be used in an inline TVF, hence my focus on this conversion process (CROSS APPLYs and stacked/cascading sub-queries). Let me take a sentence here to qualify things: I'm talking about scalar UDFs that are being used in row set operations; one-off calls in scripts are not my focus (although if any code must be written it should be written correctly and efficiently). Most scalar UDFs I've run across convert fairly easily; however, some provide a challenge. It is this class of problems I was attempting to discuss here.

lol, I did notice in one of those links you sent me that you stumbled across the same measuring quandary I did a few years ago: Set Statistics Time On/Off vs. GetDate() for elapsed-time measurements. It's almost like observing the system using Set Statistics Time On/Off causes a decoherence of the SQL quantum wave function, collapsing it in such a way as to give elapsed times from a different dimension than the one we're living in :alien:

    At the end of the day the 1M+ row elapsed-time run-off has the last word, many times with surprising results. So, keep preaching the I.B. SQL performance doctrine friend. I have nothing but respect for you, Chris, and any others taking the time to help us all by posting on this forum!

So, in closing I leave you with this question: which is faster over 10,000,000 WHILE loop iterations, Select top 0 Null or Select Null?

[code]
Set NoCount On;
Declare @sdt DateTime,@i Int,@v Int;

-- Our control
Set @i=10000000;
Set @sdt=GetDate();
While @i>0 Begin
    Set @i=@i-1;
End
Print DateDiff(ms,@sdt,GetDate());

-- Test 1
Set @i=10000000;
Set @sdt=GetDate();
While @i>0 Begin
    Select top 0 @v=Null;
    Set @i=@i-1;
End
Print DateDiff(ms,@sdt,GetDate());

-- Test 2
Set @i=10000000;
Set @sdt=GetDate();
While @i>0 Begin
    Select @v=Null;
    Set @i=@i-1;
End
Print DateDiff(ms,@sdt,GetDate());
[/code]

    :w00t: D'Oh!



    PeteK
    I have CDO. It's like OCD but all the letters are in alphabetical order... as they should be.

  • Phil Factor

    SSC-Insane

    Points: 20064

    @Peter E. Kierstead

I would advise you to resist the temptation to blame Microsoft for being slow at shredding XML. I did the same once, in the blog I referenced earlier in this discussion, and a couple of readers patiently corrected my code. It's all in the syntax; believe me, Microsoft's XML functions work at lightning speed if you ask the right questions in your syntax. That blog I referred to shows all.

On the topic of delimited lists: if you know how many columns you have, and what datatype they are, there are some very fast tricks to shred them. It might be best to do it as a competition on ASK. Why not pose the question on ASK.SQLServerCentral.com as to the fastest way of doing it, give a good wadge of data as a sample, and then tell Peso, Jeff, Dwain and the others that you've posed the question? We've been searching for a good way of reviving the Speed Phreak competitions.

    Best wishes,
    Phil Factor
    Simple Talk

  • Solomon Rutzky

    SSCoach

    Points: 16258

    Sean Lange (2/6/2014)


    Alex Grinberg-230981 (2/6/2014)


To send a table from .NET, you need to use a DataTable. I wrote a number of applications using this technique; no issues were found. If I had any issues, I would have included the problem in this article.

    +1

    I think that is one of the best usage examples for UDDTs around. I have implemented that myself many times and it saves lots of headaches.

I am also a huge advocate of TVPs, but want to clarify the "need to use a DataTable" comment. That is not entirely true, as there are two other methods for sending data from .NET to a TVP. One is using a DataReader, and it is probably the least frequently used of the three options. The other option is to pass in a method that returns IEnumerable&lt;SqlDataRecord&gt; so that you can stream the data to SQL Server. I wrote an article here on SSC about this technique: Streaming Data Into SQL Server 2008 From an Application.

    Depending on the size of your dataset and/or the context of the operation, it might be more efficient to use the IEnumerable method. For smaller data sets the DataTable method is just fine and is a little less work to implement. But larger data sets typically benefit from the IEnumerable / streaming method as they have a much smaller memory footprint. Loading all of your data into a DataTable means storing all of it in memory before calling the Stored Procedure. In some cases, this even doubles the memory footprint of the operation if the data is already in another collection and is only being stored in a DataTable for transport to the Stored Procedure. The IEnumerable method, on the other hand, only has one row in memory (on the application side) at a time.
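For reference, the T-SQL side is the same whichever .NET method feeds it; a minimal sketch (type, procedure, and table names are illustrative):

```sql
-- A user-defined table type and a procedure that accepts it. DataTable,
-- DataReader, and IEnumerable<SqlDataRecord> all bind to the same
-- READONLY table-valued parameter.
Create Type dbo.IdList As Table (Id int not null primary key);
GO
Create Procedure dbo.GetItems @Ids dbo.IdList READONLY
As
    Select i.*
    from dbo.Items i
    join @Ids x on x.Id = i.Id;
```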

    Take care,

    Solomon...

P.S. I will say that, as much as I love TVPs for sending in multiple rows of multiple columns, for simple one-dimensional arrays it is generally better to just use a CSV list and a splitter (SQLCLR or one of the proven-efficient T-SQL methods) rather than bother with a TVP.

SQL# (https://SQLsharp.com/) - SQLCLR library of over 340 Functions and Procedures
Sql Quantum Lift (https://SqlQuantumLift.com/) - company
Sql Quantum Leap (https://SqlQuantumLeap.com/) - blog
Info sites: Collations • Module Signing • SQLCLR

