To TOP or not to TOP?

  • gemisigo (5/18/2012)


    Yes, it's not the ordering that fails here. Barry clarified quite a few things that was unknown for me but he wasn't saying he is addressing that part of the issue.

    Yes, that's correct, I was not addressing what's causing your issue.

    The original question is still open: why/what does TOP change so that there's no data loss? I guess that (according to HowardW and without TOP) the query processor decides those are irrelevant (though they're not) and dumps them but (when using TOP) keeps them because TopN sort will need them (again, not the case 🙂 ). So please, discuss!

    It seems to me that the most significant difference between with TOP and without is the ordering of the string accumulation.

    And offhand, the only way that I can think of that might cause the string to disappear in one ordering and not in another would be somekind of NULL mis-handling and/or accidental generation that in one ordering happens at the end and destroys the entire string, but in the other happens earlier and only destroys part of the string (and that somehow is not getting propogated forward to the rest of the accumulation).

    However, your expression is pretty complicated and I haven't yet gotten through checking it for this...

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • Surprisingly, I am able to (partially) reproduce this problem.

    When I run it against one of my tables with the TOP clause I get:

    smallidINT NOT NULL -- no default

    ,bigidINT NOT NULL -- no default

    ,dataCHAR(4) NULL -- no default

    but without the TOP clause I get:

    ,dataCHAR(4) NULL -- no default

    Hmm ...

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • OK, I believe that I have it, though the why of it is still a mystery...

    This line in your SELECT statement:

    + CASE WHEN dc.[name] IS NOT NULL THEN ' CONSTRAINT ' + dc.[name] + ' DEFAULT ' + dc.[definition] + ' ' ELSE ' -- no default ' END

    is causing the problem.

    In particular, any reference to 'dc.[definition]' seems to cause the problem, though strangely, ONLY when the last column is ordered first (causing the following CASE expression to not append the CHAR(10)). Because of how the RANK function in that following CASE expression uses DESC ordering on column_id, this will naturally happen anytime the "TOP x / ORDER BY column_id" is not in effect to override it.

    As to why "dc.[definition]" is having this effect? No idea at all. But any attempt at all to include it in the cumulative string there seems to lead to this effect. I tried using COALESCE(), ISNULL(), CAST(), LEFT(), even an obtuse LEN() on it to no avail. The only thing I could find that would fix the problem was to remove the reference to it altogether.

    ... ???

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • The query plan without the TOP happens to feature a Sort operator (node 0) after the Compute Scalar (node 1) that defines the expression of interest. In the execution plan I have, this expression is labelled [Expr1127] (the number may differ for you). This definition contains a very long argument that starts with "Scalar Operator(((((((([@sql]+...<stuff>".

    Scalar expression evaluation is generally delayed as long as possible in the execution engine. This is why I was careful to say that the Compute Scalar defines the expression rather than saying it evaluated it. Evaluation is delayed until another operator in the execution plan needs the value, or the value needs to be returned to the client.

    The Sort operator in the plan references the expression [Expr1127] causing it to be evaluated. Unfortunately (for you) evaluating the expression at that point produces a value, but does not assign it back to the @sql variable between calls (variable assignment happens at the root of the plan).

    Since Sort is a blocking operation, all input rows are processed before it can start producing output. The value of [Expr1127] is evaluated once per row, and stored in the Sort's input buffers (but @sql is not updated, remember). The Sort emits rows (including the persisted value of [Expr1127] per row) in column_id order. The root of the plan (represented by the green SELECT icon) assigns the persisted value of [Expr1127] to @sql, once per row. Because it is using the persisted result of [Expr1127] rather than evaluating the expression (which contains the self-reference) at assignment time, @sql ends up holding just the last value of [Expr1127].

    Where TOP is specified, the query plan happens to feature the [Expr1127] definition at node 0, and a Sort running in TopN mode at node 2 (there is another scalar definition at Node 1, but that's a just a technicality, and not important). Since [Expr1127] is defined after the blocking Sort operator, there is no iterator that requires the value of [Expr1127], and the rolling-assignment trick works as you would hope: assignment at the root references the expression (not the result of the expression) containing the variable self-reference.

    And 'hope' is the operative word there. Barry's opinions notwithstanding, my own view is that this construct is to be avoided, as it is unreliable (as you have seen). The behaviour was only maintained for backward compatibility reasons (and that post is dated seven years ago). Backward compatibility means reasonable effort is made to ensure existing code won't break, not that new code (perhaps using new features) will work correctly. Backward compatibility tends to wear off with time (and seven years is a long time). Development and testing time is limited, and unlikely to be spent ensuring wacky things like 'aggregate concatenation' work in all possible new scenarios.

    You might think you can craft queries to ensure the crucial expression is not evaluated before assignment takes place at the root of the plan, perhaps using plan guides. This isn't reliable either: USE PLAN does not force the position of Compute Scalars (the Expression Service is a quite separate component).

    Reliable alternatives to 'aggregate concatenation' include FOR XML PATH (though it is a dreadful hack really) or a SQLCLR UDA (user-defined aggregate). The XML PATH solution can provide guaranteed ordering; the SQLCLR UDA cannot. The proper solution will be to use Ordered Set Functions (Connect suggestion by Itzik Ben-Gan), but these are not yet implemented in SQL Server (outside the internal-only STATMAN aggregate used in statistics gathering).

    (As an aside, I can see nothing in the link Matt Miller provided that talks about aggregate concatenation.)

  • RBarryYoung (5/18/2012)


    As to why "dc.[definition]" is having this effect? No idea at all.

    It is a MAX column. These are often handled differently internally (via pointers rather than copying) because it could be 2GB in size. Different considerations, different code path, and the order of the Compute Scalar and Sort operators is affected. With definition in, the Sort appears after the Compute Scalar, and the concatenation trick breaks. Without definition, the Compute Scalar appears after the Sort, nothing refers to [Expr1127] early, and the trick happens to work. None of this is predictable in advance, or enforceable.

    By the way, the following does not work in SQL Server 2008 for the same reason. The same code works 'correctly' in 2005, 2008 R2, and 2012 (all are latest public builds at the time of writing).

    DECLARE @t AS TABLE

    (

    name varchar(max) NOT NULL

    );

    INSERT @t (name)

    SELECT 'Aaron'

    UNION ALL

    SELECT 'Zack';

    DECLARE @text varchar(max);

    SET @text = SPACE(0);

    SELECT

    @text = @text + t.name + '; '

    FROM @t AS t

    ORDER BY

    t.name;

    SELECT @text;

  • RBarryYoung (5/18/2012)


    In particular, any reference to 'dc.[definition]' seems to cause the problem, though strangely, ONLY when the last column is ordered first (causing the following CASE expression to not append the CHAR(10)). Because of how the RANK function in that following CASE expression uses DESC ordering on column_id, this will naturally happen anytime the "TOP x / ORDER BY column_id" is not in effect to override it.

    Just in case it isn't obvious, without the DESC ordering on column_id in the RANK, the optimizer can find a plan that does not require any sorts at all, so the specific difficulty I discussed does not arise (though there is no guarantee of that; it's not only sorts than can break this).

  • The 2012 functionality within the += operator IS in fact a concatenator (if you use it with string data types). As long as you don't particularly case in which order it appends, it seems to do it just fine.

    If you look at operators they built as accumulators in general they are associative and commutative (so processing order across sets doesn't affect the final result).

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Thanks for the thorough explanation, Paul. Finally, we've got an answer for both 'where' and 'why'. As this is only used in a development tool and it's quite salient when it breaks I'll keep using it until I figure out how to do it in a reliable way (preferably one that does not include cursors). But I'll have to do some studies concerning XML PATH and SQLCLR UDA both being unknown territory for me. At least I have a homework for now 😉

    Thanks again for sharing your knowledge, it'll be put to good use.

  • Matt Miller (#4) (5/18/2012)


    The 2012 functionality within the += operator IS in fact a concatenator (if you use it with string data types). As long as you don't particularly case in which order it appends, it seems to do it just fine. If you look at operators they built as accumulators in general they are associative and commutative (so processing order across sets doesn't affect the final result).

    The += operator is simple shorthand as stated in http://msdn.microsoft.com/en-us/library/dd206992.aspx. Both forms are parsed and processed exactly the same:

    DECLARE

    @sql nvarchar(max) = N'';

    SELECT @sql += p.Name

    FROM Production.Product AS p

    OPTION (RECOMPILE, QUERYTRACEON 3604, QUERYTRACEON 8605, QUERYTRACEON 8607);

    SELECT @sql = @sql + p.Name

    FROM Production.Product AS p

    OPTION (RECOMPILE, QUERYTRACEON 3604, QUERYTRACEON 8605, QUERYTRACEON 8607);

    *** Converted Tree: ***

    LogOp_Project COL: Expr1002

    LogOp_Get TBL: Production.Product(alias TBL: p) Production.Product TableID=1461580245 TableReferenceID=0 IsRow: COL: IsBaseRow1001

    AncOp_PrjList

    AncOp_PrjEl COL: Expr1002

    ScaOp_Arithmetic x_aopAdd

    ScaOp_Identifier COL: @sql

    ScaOp_Identifier QCOL: [p].Name

    *** Output Tree: (trivial plan) ***

    PhyOp_ComputeScalar

    PhyOp_NOP

    PhyOp_Range TBL: Production.Product(alias TBL: p)(3) ASC Bmk ( QCOL: [p].ProductID) IsRow: COL: IsBaseRow1001

    AncOp_PrjList

    AncOp_PrjEl COL: Expr1002

    ScaOp_Arithmetic x_aopAdd

    ScaOp_Identifier COL: @sql

    ScaOp_Identifier QCOL: [p].Name

    *** Converted Tree: ***

    LogOp_Project COL: Expr1002

    LogOp_Get TBL: Production.Product(alias TBL: p) Production.Product TableID=1461580245 TableReferenceID=0 IsRow: COL: IsBaseRow1001

    AncOp_PrjList

    AncOp_PrjEl COL: Expr1002

    ScaOp_Arithmetic x_aopAdd

    ScaOp_Identifier COL: @sql

    ScaOp_Identifier QCOL: [p].Name

    *** Output Tree: (trivial plan) ***

    PhyOp_ComputeScalar

    PhyOp_NOP

    PhyOp_Range TBL: Production.Product(alias TBL: p)(3) ASC Bmk ( QCOL: [p].ProductID) IsRow: COL: IsBaseRow1001

    AncOp_PrjList

    AncOp_PrjEl COL: Expr1002

    ScaOp_Arithmetic x_aopAdd

    ScaOp_Identifier COL: @sql

    ScaOp_Identifier QCOL: [p].Name

  • gemisigo (5/18/2012)


    Thanks for the thorough explanation, Paul. Finally, we've got an answer for both 'where' and 'why'. As this is only used in a development tool and it's quite salient when it breaks I'll keep using it until I figure out how to do it in a reliable way (preferably one that does not include cursors). But I'll have to do some studies concerning XML PATH and SQLCLR UDA both being unknown territory for me. At least I have a homework for now 😉

    Thanks again for sharing your knowledge, it'll be put to good use.

    You're welcome. This is a *quick* edit of your code to use FOR XML PATH:

    USE AdventureWorks;

    DECLARE

    @sql nvarchar(max) = '',

    @sema sysname = 'Production',

    @tabla sysname = 'Product';

    SET @sql =

    (

    SELECT

    NCHAR(9) +

    CASE

    WHEN c.column_id = 1 THEN SPACE(0)

    ELSE N','

    END +

    c.[name] + NCHAR(9) +

    CASE

    WHEN st.[name] LIKE N'%char%' OR st.[name] LIKE N'%binary%'

    THEN UPPER(st.[name]) + N'(' + CAST(c.max_length AS nvarchar(64)) + N')'

    WHEN st.[name] = N'DECIMAL' OR st.[name] = 'NUMERIC'

    THEN UPPER(st.[name]) + '(' + CAST(c.[precision] AS nvarchar(64)) + N', ' +

    CAST(c.scale AS nvarchar(64)) + N')'

    ELSE UPPER(st.[name])

    END +

    CASE

    WHEN c.is_identity = 1 THEN N' IDENTITY(1,1)'

    ELSE SPACE(0)

    END +

    CASE

    WHEN c.is_nullable = 0 THEN N' NOT NULL'

    WHEN c.is_nullable = 1 THEN N' NULL'

    ELSE N'***'

    END +

    CASE

    WHEN dc.[name] IS NOT NULL

    THEN N' CONSTRAINT ' + dc.[name] + N' DEFAULT '+ dc.[definition] + SPACE(1)

    ELSE N' -- no default '

    END

    + CASE

    WHEN 1 =

    RANK() OVER (

    PARTITION BY t.[name]

    ORDER BY c.column_id DESC)

    THEN SPACE(0)

    ELSE NCHAR(10)

    END

    FROM sys.schemas AS s

    JOIN sys.tables AS t

    ON t.[schema_id] = s.[schema_id]

    JOIN sys.[columns] AS c

    ON c.[object_id] = t.[object_id]

    JOIN sys.systypes AS st

    ON st.xtype = c.system_type_id

    LEFT JOIN sys.default_constraints AS dc

    ON t.[object_id] = dc.parent_object_id

    AND c.column_id = dc.parent_column_id

    WHERE

    s.[name] = @sema

    AND t.[name] = @tabla

    AND st.[name] != N'sysname'

    AND st.uid = 4

    ORDER BY

    c.column_id ASC

    FOR XML PATH (''), TYPE

    ).value('(./text()[1])', 'nvarchar(max)');

    PRINT @sql;

Viewing 10 posts - 31 through 39 (of 39 total)

You must be logged in to reply to this topic. Login to reply