To TOP or not to TOP?

  • I have a query that should return the fields of a table along with the field properties like type, nullability, default, etc. (and puts them into a string so that it can be used in a 'CREATE TABLE' statement but that's subsidiary). Here it is:

    DECLARE @sql NVARCHAR(MAX) = ''

    , @sema SYSNAME = 'dbo', @tabla SYSNAME = 'arbitrary_table'

    SELECT

    TOP 1000000000

    @sql = @sql

    + CASE

    WHEN c.column_id = 1 THEN ''

    ELSE ','

    END

    + c.[name] + ''

    + CASE

    WHEN st.[name] LIKE '%char%' OR st.[name] LIKE '%binary%' THEN UPPER( st.[name] ) + '(' + CAST( c.max_length AS VARCHAR( 64 ) ) + ')'

    WHEN st.[name] = 'DECIMAL' OR st.[name] = 'NUMBER' THEN UPPER( st.[name] ) + '(' + CAST( c.[precision] AS VARCHAR(64)) + ', ' + CAST(c.scale AS VARCHAR(64)) + ')'

    ELSE UPPER( st.[name] )

    END

    + CASE

    WHEN c.is_identity = 1 THEN ' IDENTITY(1,1)'

    ELSE ''

    END

    + CASE c.is_nullable

    WHEN 0 THEN ' NOT NULL'

    WHEN 1 THEN ' NULL'

    ELSE 'WTH???'

    END

    + CASE WHEN dc.[name] IS NOT NULL THEN ' CONSTRAINT ' + dc.[name] + ' DEFAULT ' + dc.[definition] + ' ' ELSE ' -- no default ' END

    + CASE

    WHEN RANK() OVER( PARTITION BY t.[name] ORDER BY c.column_id DESC ) = 1 THEN ''

    ELSE CHAR( 10 )

    END

    FROM

    sys.schemas s

    INNER JOIN

    sys.tables t

    ON t.[schema_id] = s.[schema_id]

    INNER JOIN

    sys.[columns] c

    ONc.[object_id] = t.[object_id]

    INNER JOIN

    sys.systypes st

    ONst.xtype = c.system_type_id

    LEFT OUTER JOIN sys.default_constraints AS dc

    ON t.[object_id] = dc.parent_object_id

    AND c.column_id = dc.parent_column_id

    WHERE s.[name] = @sema AND t.[name] = @tabla AND st.[name] != 'sysname' AND st.uid = 4

    ORDER BY c.column_id ASC

    PRINT @sql

    Though it might not cover all the special cases, it works fine. Until I comment the 'TOP 1000000000' line after which it returns only the last field. I'm obviously missing something important here but I can't figure what it is. Any ideas?

    EDIT:

    @michael-2 vessey: Actually, I removed an unnecessary line and the tail of it stayed there because of the word wrap. I've fixed that, thanks for the warning.

  • there seems to be some syntax issues in your code

    Msg 156, Level 15, State 1, Line 28

    Incorrect syntax near the keyword 'DESC'.

    i think it's a cut and paste issue

    MVDBA

  • Fixed, thanks for the warning.

  • if you take out the assertion and just return a recordset it works fine - i can't see why the top clause if affecting you when concatenating into a string

    initially i was looking to see if there was some sort of NULL issue, but i tested it woth concat_null_yields_null OFF

    🙁

    MVDBA

  • i took the data from the query into a temp table and then did the

    select @sql=@sql+mytext from #res

    print @sql

    works fine

    MVDBA

  • michael vessey (5/17/2012)


    i took the data from the query into a temp table and then did the

    select @sql=@sql+mytext from #res

    print @sql

    works fine

    I'll implement that as a workaround until someone comes up with the cause of this issue, thanks!

  • Can you tell me what is use of statment

    RANK() OVER( PARTITION BY t.[name] ORDER BY c.column_id DESC ) = 1

  • i also tested it though a cursor and the cursor worked fine...

    interestinigly - if you remove the order by clause then the query also works - very frustrating

    i've been comparing the query plans , and the only difference (other than sort/Sort top N) is the version with the TOP clause in has an extra Compute scalar that seems to be typecasting data.. I suspected this is some sort of typecasting, but i explicitly cast everything as nvarchar(max) and it still get truncated

    i then tried this using varchar(8000) and you still get the same problem - although i can't replicate it with other queries

    i just noticed that if you remove the char(10) it also begins to work - i replaced it with '' and it works, but if i put 'a' instead of char(10) it fails again

    finally came to the conclusion - the RANK OVER function is causing an issue in some way

    if i change the PARTITION BY to c.columnid it works (albeit the wrong data), but c.object_id does not

    is definatly in that section of code , but it all looks fine to me - maybe raise a case with MS

    how did you come to the step of using TOP 1000000?

    MVDBA

  • srikant maurya (5/17/2012)


    Can you tell me what is use of statment

    RANK() OVER( PARTITION BY t.[name] ORDER BY c.column_id DESC ) = 1

    Sure. This query is a part of a snippet that creates a script that reproduces an arbitrary table with all of its side-objects (constraints, indices, etc.) The reason for the statement you ask is merely cosmetics (it checks whether there should be a linefeed or not). Without it the result would be something like this:

    CREATE TABLE arbitrary_table_name

    .

    .

    .

    ,lastcolumnSMALLINT NOT NULL CONSTRAINT DF_arbitrary_table_name__lastcolumn DEFAULT ((25))

    -- removing that check would result in this unwanted blank line here

    )

    GO

    The statement is meaningless in the example here.

  • The problem here is you're relying on an undocumented and unreliable use of variable assignment in a select statement.

    There's absolutely no guarantee that the variable assignment will occur in the order of the order by clause or that it's all executed in a single thread as parallelism can be used. Basically, you can't rely on @variable=@variable + something else in a multi-row select statement, you're at the whim of the plan the optimiser happens to choose, which can change over time.

  • michael vessey (5/17/2012)


    how did you come to the step of using TOP 1000000?

    Yes, I compared the exec plan as well, but I did not find anything meaningful. Tried a few things you did (except cursors) to no avail.

    'TOP 1000000' was part of my anger management technique 🙂 I began with 'TOP 1' and then increased x in 'TOP x' to see which column is the suspect for this villainy. When x went over the max number of columns I removed it and stared dumbly at the one-line result once again. So I chose a random large number that was well over the column count to see if it makes any difference. It didn't 🙁

  • HowardW (5/17/2012)


    The problem here is you're relying on an undocumented and unreliable use of variable assignment in a select statement.

    There's absolutely no guarantee that the variable assignment will occur in the order of the order by clause or that it's all executed in a single thread as parallelism can be used. Basically, you can't rely on @variable=@variable + something else in a multi-row select statement, you're at the whim of the plan the optimiser happens to choose, which can change over time.

    I am, but that does not explain this issue. The issue is not the order of the returned data, it's the lack of returned data. Omitting the TOP clause results in data loss (even when using MAXDOP 1 query hint, so it's not the parallelism either).

  • gemisigo (5/17/2012)


    HowardW (5/17/2012)


    The problem here is you're relying on an undocumented and unreliable use of variable assignment in a select statement.

    There's absolutely no guarantee that the variable assignment will occur in the order of the order by clause or that it's all executed in a single thread as parallelism can be used. Basically, you can't rely on @variable=@variable + something else in a multi-row select statement, you're at the whim of the plan the optimiser happens to choose, which can change over time.

    I am, but that does not explain this issue. The issue is not the order of the returned data, it's the lack of returned data. Omitting the TOP clause results in data loss (even when using MAXDOP 1 query hint, so it's not the parallelism either).

    Fair enough, but it's not a bug, it's an unreasonable expectation that the internal query processor works in a way that you can use it for running procedural code within a select statement. It can segment out the delivery of rows in a select statement in various ways and there's no guarantee that it will persist the variable's value between these segments.

  • HowardW (5/17/2012)


    Fair enough, but it's not a bug, it's an unreasonable expectation that the internal query processor works in a way that you can use it for running procedural code within a select statement. It can segment out the delivery of rows in a select statement in various ways and there's no guarantee that it will persist the variable's value between these segments.

    I haven't said it was a bug. And yes, it's undocumented and though it was rather "reliable" in the last don't even know how many cases, I know it was stated unreliable by some (by the way, has anyone proved that yet?), therefore I heed your warning and handle it with care. If you compare execution plans they are pretty much the same (yes, Segments as well) all the way up to Sequence Project node. Beyond that there are differences, of course (TOP has TopN Sort and there is a "surplus" Compute Scalar, etc.). Also, both plans yield the same number of records.

    The only other difference is that the TOP version retains the contents of the variable while the TOPless keeps only the bottom 🙂 I can live with the fact it is unreliable here and it does not work the way it's supposed. My question is, why does TOP alter that behavior?

    p.s. I don't think it's unreasonable to expect that the internal query processor (or anything else) works in a way that makes my task easier, the query faster and the returned dataset consistent. I believe that's the reason why these things were created in the first place. I accept, though, that personal opinions about how to do this or that might differ. But then again, it's just my personal opinion. Cheers 😉

  • interestingly enough i tried another workarount using a CTE.

    it works fine every time - saves you having to use a temp table

    🙂

    as for whether it's a bug or not, well why not let microsoft know ??? it's an interesting and replicable behaviour that may have other far reaching implications we don't know of!!!

    CTE CODE BELOW

    GO

    PRINT 'next'

    DECLARE @sql VARCHAR(8000) = ''

    , @sema SYSNAME = 'dbo', @tabla SYSNAME = 'defer';

    with x(data,id) as (

    SELECT

    CASE

    WHEN c.column_id = 1 THEN convert(varchar(max),'')

    ELSE convert(varchar(max),',')

    END

    + convert(varchar(max),c.[name]) + convert(varchar(max),'' )

    + CASE

    WHEN st.[name] LIKE '%char%' OR st.[name] LIKE '%binary%' THEN convert(varchar(max),UPPER( st.[name] )) + convert(varchar(max),'(') + CAST( c.max_length AS VARCHAR( max ) ) + convert(varchar(max),')')

    WHEN st.[name] = 'DECIMAL' OR st.[name] = 'NUMBER' THEN convert(varchar(max),UPPER( st.[name] ) )+ convert(varchar(max),'(') + CAST( c.[precision] AS VARCHAR(max)) +convert(varchar(max), ', ') + CAST(c.scale AS VARCHAR(max)) + convert(varchar(max),')')

    ELSE convert(varchar(max),UPPER( st.[name] ))

    END

    + CASE

    WHEN c.is_identity = 1 THEN convert(varchar(max),' IDENTITY(1,1)')

    ELSE convert(varchar(max),'')

    END

    + CASE c.is_nullable

    WHEN 0 THEN convert(varchar(max),' NOT NULL')

    WHEN 1 THEN convert(varchar(max),' NULL')

    ELSE convert(varchar(max),'WTH???')

    END

    + CASE WHEN dc.[name] IS NOT NULL THEN convert(varchar(max),' CONSTRAINT ') +convert(varchar(max), dc.[name]) + convert(varchar(max),' DEFAULT ') + convert(varchar(max),dc.[definition]) + convert(varchar(max),' ') ELSE convert(varchar(max),' -- no default ') END

    + CASE

    RANK() OVER( PARTITION BY t.name ORDER BY c.column_id DESC ) when 1 THEN ''

    else CHAR(10)

    END as data, RANK() OVER( PARTITION BY t.name ORDER BY c.column_id DESC )as id

    FROM

    sys.schemas s

    INNER JOIN

    sys.tables t

    ON t.[schema_id] = s.[schema_id]

    INNER JOIN

    sys.[columns] c

    ONc.[object_id] = t.[object_id]

    INNER JOIN

    sys.systypes st

    ONst.xtype = c.system_type_id

    LEFT OUTER JOIN sys.default_constraints AS dc

    ON t.[object_id] = dc.parent_object_id

    AND c.column_id = dc.parent_column_id

    WHERE s.[name] = @sema AND t.[name] = @tabla AND st.[name] != 'sysname' AND st.uid = 4

    )

    select @sql=@sql+DATA from x ORDER BY x.id ASC;

    PRINT @sql

    GO

    MVDBA

Viewing 15 posts - 1 through 15 (of 39 total)

You must be logged in to reply to this topic. Login to reply