To TOP or not to TOP?

  • That's what I had in mind to try. You spared me some typing, Mike, thank you very much 😉

  • HowardW (5/17/2012)


    gemisigo (5/17/2012)


    HowardW (5/17/2012)


    The problem here is you're relying on an undocumented and unreliable use of variable assignment in a select statement.

    There's absolutely no guarantee that the variable assignment will occur in the order of the order by clause or that it's all executed in a single thread as parallelism can be used. Basically, you can't rely on @variable=@variable + something else in a multi-row select statement, you're at the whim of the plan the optimiser happens to choose, which can change over time.

    I am, but that does not explain this issue. The issue is not the order of the returned data, it's the lack of returned data. Omitting the TOP clause results in data loss (even when using MAXDOP 1 query hint, so it's not the parallelism either).

    Fair enough, but it's not a bug, it's an unreasonable expectation that the internal query processor works in a way that you can use it for running procedural code within a select statement. It can segment out the delivery of rows in a select statement in various ways and there's no guarantee that it will persist the variable's value between these segments.

    I am not sure how you're jumping from the "it will not use predictably any ORDER BY" to "it's not documented at all". While it might not be specifically mentioned in BOL, it IS supported and actually has been given a name:

    http://support.microsoft.com/kb/287515/en-us

    To me it looks like a completely legitimate notation.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Matt Miller (#4) (5/17/2012)


    I am not sure how you're jumping from the "it will not use predictably any ORDER BY" to "it's not documented at all". While it might not be specifically mentioned in BOL, it IS supported and actually has been given a name:

    http://support.microsoft.com/kb/287515/en-us

    To me it looks like a completely legitimate notation.

    Hmm, hang on, that article was based on SQL 2000. It's the same argument with the "Quirky Update" running totals solution in that it relies on the internals of the execution plan to provide its results correctly - it's not documented and supported because there's a KB article for a decade old version of SQL.

    In any case, it's a moot point as it doesn't work in a variety of situations (as has been shown), it may break further in future releases and MS aren't likely to make changes to support it.

  • HowardW (5/17/2012)


    Hmm, hang on, that article was based on SQL 2000. It's the same argument with the "Quirky Update" running totals solution in that it relies on the internals of the execution plan to provide its results correctly - it's not documented and supported because there's a KB article for a decade old version of SQL.

    All right - I'll bite. Here's the matching one in 2012:

    http://msdn.microsoft.com/en-us/library/ms187330.aspx

    Not only does it use the same lingo, it introduces accumulating functions. Assuming the operation is transitive, the "last result" is the one that includes ALL rows, i.e has processed all rows, (whichever order the compiler decided to use). Again - how is it not documented?

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Matt Miller (#4) (5/17/2012)


    HowardW (5/17/2012)


    Hmm, hang on, that article was based on SQL 2000. It's the same argument with the "Quirky Update" running totals solution in that it relies on the internals of the execution plan to provide its results correctly - it's not documented and supported because there's a KB article for a decade old version of SQL.

    All right - I'll bite. Here's the matching one in 2012:

    http://msdn.microsoft.com/en-us/library/ms187330.aspx

    Not only does it use the same lingo, it introduces accumulating functions. Assuming the operation is transitive, the "last result" is the one that includes ALL rows, i.e has processed all rows, (whichever order the compiler decided to use). Again - how is it not documented?

    I'll bite too...Although expression can be a column, it's defined as:

    Is a combination of symbols and operators that the SQL Server Database Engine evaluates to obtain a single data value.

  • I'll bite too...Although expression can be a column, it's defined as:

    Is a combination of symbols and operators that the SQL Server Database Engine evaluates to obtain a single data value.

    And a concatenated value is a single data value: that's the point of a concatenation. You're not getting a list back - it's a single value?

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • And the rest of us will just sit back and watch you fencing 🙂 And drink to knowledge you spill 😉

  • Happy to agree to disagree. I don't think the article refers to accumulating functions being used in that way. The syntax exists as an alternative to SET, so you can apply a single assignment to a variable (be that accumulating or not).

    Yes, it doesn't physically break if you have multiple rows, but I don't see that being its intended use.

    Anyway, hometime here, have a good one 🙂

  • Though the intended use might not be that way, the article doesn't deny it either. Nevertheless, I hope you agree that this undocumented function is useful and that it should be both documented and improved if possible.

    Thank you both for the links and important info you shared here. I'm sure it will come in handy.

    HowardW, I wish you safe journey homewards 😉

  • And thank you gemisigo for allowing us to trample over your initial topic. Didn't mean to hijack it.

    Thanks Howard for the input.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • To clarify a few things here:

    The "string accumulation technique" is absolutely supported in SQL Server and likely always will be (the "Quirky Update" is a bit more of a grey area). The Microsoft SQL Server development team seriously dislikes it, but Microsoft has never withdrawn support for this feature, deprecated it nor even given any indication that they ever would. So it is supported.

    However there is NO *supported* way to control the ordering of that string's accumulation. This does not mean that including an ORDER BY or one of the other more devious tricks to control it's order is unsupported syntax. Generally they are NOT "unsupported syntax", which you can tell by the fact that you do not get compile errors if you try to use them.

    That said however, what Microsoft does not support is those efforts having any certain effect on the ordering of the string accumulation.

    So you can include the ORDER BY for instance, but SQL Server does not have to honor it for "variable assignment SELECTs" (and usually it does not in later versions).

    This is what leads to the employment of more elaborate tricks like the "TOP 100000000". This is supported and implemented for Variable-assignment SELECTs ("TOP x" that is), just like any other kind, and does both require and implement the ORDER BY in order to make the TOP cut.

    But note here, that just like with TOP x in Views, SQL only needs to do the ORDER BY to determine the TOP x. It does not need to keep the rows in that order for anything else, and if it could determine the TOP x without doing the ORDER BY (such as figuring out ahead of time that there are less than 10000000 rows anyway), then it would not have to do it at all. Of course what we know is that, the compiler is efficient and is unlikely to unsort the rows after sorting them unless it has to, and it does not figure out the (rowcount > TOP x) ahead of time. So far, anyway.

    So to summarize: the string accumulation trick is supported syntax and functionality. Attempts to order the accumulation are (generally) supported syntax, but not supported functionality.

    Clear? Quiz on Friday... 😀

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • Yes, a few things got clear! I've learnt many things today 😎 I'm glad I asked 🙂

  • Hi Barry,

    That's interesting. Here though, its not just the ordering that's incorrect, the entire accumulated resultset gets discarded, so it just doesn't work.

    This does crop up every now and again when someone hits this problem and the general response from MVP's (not that that necessarily represents MS's stance) is that its not support and doesn't work reliably (my search skills seem to be failing me at present, but I'll try to find and post back).

    I guess whether it's supported syntax or not would come down to whether MS would fix it if this was raised as a bug - I'm guessing not as it means that the query processor must always act in serial.

  • Yes, it's not the ordering that fails here. Barry clarified quite a few things that was unknown for me but he wasn't saying he is addressing that part of the issue.

    The original question is still open: why/what does TOP change so that there's no data loss? I guess that (according to HowardW and without TOP) the query processor decides those are irrelevant (though they're not) and dumps them but (when using TOP) keeps them because TopN sort will need them (again, not the case 🙂 ). So please, discuss!

  • HowardW (5/18/2012)


    Hi Barry,

    That's interesting. Here though, its not just the ordering that's incorrect, the entire accumulated resultset gets discarded, so it just doesn't work.

    Right, but the "Why" of that is as yet undetermined. It may be a bug in T-Sql, but it seems to me that it still is more likely to be a NULL-handling problem in the logic that is more evident in some orderings than in others.

    This does crop up every now and again when someone hits this problem and the general response from MVP's (not that that necessarily represents MS's stance) is that its not support and doesn't work reliably (my search skills seem to be failing me at present, but I'll try to find and post back).

    Yes, the development team dislikes this feature, but cannot say that it is not supported, because it is actually supported and statements of support from a vendor carries legal implications with it. Instead they make many disparaging and equivocal statements about it that has led many MVPs to mistakenly believe and then state that it is not supported. The dev team is not going to correct these misstatements because the MVPs do not speak for Microsoft and so they are not legally required to correct their statements. Plus, now the MVPs are saying what they wish that they themselves could say.

    I guess whether it's supported syntax or not would come down to whether MS would fix it if this was raised as a bug - I'm guessing not as it means that the query processor must always act in serial.

    Being "supported" has a legal meaning and is not based on the post-release whims of the vendor or its developers. Rather it is based on the pre-purchase statements of the vendor (mostly the documentation) and those who represent it. And based on that it is supported.

    And the fact that the query processor "must always act in serial" would be an irrelevant and largely hypocritical excuse by Microsoft since both Cursors and FOR XML in T-SQL impose the same restrictions to a much larger degree and Microsoft seems to have no problem with the former and is actively promoting the latter. In fact everything in gemisigo's query could be done with either Cursors or FOR XML, just much less efficiently and with much more difficulty.

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

Viewing 15 posts - 16 through 30 (of 39 total)

You must be logged in to reply to this topic. Login to reply