To TOP or not to TOP?

Question

Post reply

To TOP or not to TOP?

gemisigo

Ten Centuries

Points: 1244
More actions
May 17, 2012 at 4:02 am

#261157

I have a query that should return the fields of a table along with the field properties like type, nullability, default, etc. (and puts them into a string so that it can be used in a 'CREATE TABLE' statement but that's subsidiary). Here it is:
DECLARE @sql NVARCHAR(MAX) = ''
, @sema SYSNAME = 'dbo', @tabla SYSNAME = 'arbitrary_table'
SELECT
TOP 1000000000
@sql = @sql
+ CASE
WHEN c.column_id = 1 THEN ''
ELSE ','
END
+ c.[name] + ''
+ CASE
WHEN st.[name] LIKE '%char%' OR st.[name] LIKE '%binary%' THEN UPPER( st.[name] ) + '(' + CAST( c.max_length AS VARCHAR( 64 ) ) + ')'
WHEN st.[name] = 'DECIMAL' OR st.[name] = 'NUMBER' THEN UPPER( st.[name] ) + '(' + CAST( c.[precision] AS VARCHAR(64)) + ', ' + CAST(c.scale AS VARCHAR(64)) + ')'
ELSE UPPER( st.[name] )
END
+ CASE
WHEN c.is_identity = 1 THEN ' IDENTITY(1,1)'
ELSE ''
END
+ CASE c.is_nullable
WHEN 0 THEN ' NOT NULL'
WHEN 1 THEN ' NULL'
ELSE 'WTH???'
END
+ CASE WHEN dc.[name] IS NOT NULL THEN ' CONSTRAINT ' + dc.[name] + ' DEFAULT ' + dc.[definition] + ' ' ELSE ' -- no default ' END
+ CASE
WHEN RANK() OVER( PARTITION BY t.[name] ORDER BY c.column_id DESC ) = 1 THEN ''
ELSE CHAR( 10 )
END
FROM
sys.schemas s
INNER JOIN
sys.tables t
ON t.[schema_id] = s.[schema_id]
INNER JOIN
sys.[columns] c
ONc.[object_id] = t.[object_id]
INNER JOIN
sys.systypes st
ONst.xtype = c.system_type_id
LEFT OUTER JOIN sys.default_constraints AS dc
ON t.[object_id] = dc.parent_object_id
AND c.column_id = dc.parent_column_id
WHERE s.[name] = @sema AND t.[name] = @tabla AND st.[name] != 'sysname' AND st.uid = 4
ORDER BY c.column_id ASC
PRINT @sql
Though it might not cover all the special cases, it works fine. Until I comment the 'TOP 1000000000' line after which it returns only the last field. I'm obviously missing something important here but I can't figure what it is. Any ideas?
EDIT:
@michael-2 vessey: Actually, I removed an unnecessary line and the tail of it stayed there because of the word wrap. I've fixed that, thanks for the warning.

Viewing 15 posts - 1 through 15 (of 39 total)

You must be logged in to reply to this topic. Login to reply

MVDBA (Mike Vessey) SSC-Insane Points: 21797 More actions · Answer 1

there seems to be some syntax issues in your code

Msg 156, Level 15, State 1, Line 28

Incorrect syntax near the keyword 'DESC'.

i think it's a cut and paste issue

MVDBA

gemisigo Ten Centuries Points: 1244 More actions · Answer 2

gemisigo

Ten Centuries

Points: 1244

May 17, 2012 at 4:36 am

#1488896

Fixed, thanks for the warning.

MVDBA (Mike Vessey) SSC-Insane Points: 21797 More actions · Answer 3

if you take out the assertion and just return a recordset it works fine - i can't see why the top clause if affecting you when concatenating into a string

initially i was looking to see if there was some sort of NULL issue, but i tested it woth concat_null_yields_null OFF

🙁

MVDBA

MVDBA (Mike Vessey) SSC-Insane Points: 21797 More actions · Answer 4

i took the data from the query into a temp table and then did the

select @sql=@sql+mytext from #res

print @sql

works fine

MVDBA

gemisigo Ten Centuries Points: 1244 More actions · Answer 5

michael vessey (5/17/2012)
i took the data from the query into a temp table and then did the
select @sql=@sql+mytext from #res
print @sql
works fine

I'll implement that as a workaround until someone comes up with the cause of this issue, thanks!

srikant maurya Hall of Fame Points: 3293 More actions · Answer 6

Can you tell me what is use of statment

RANK() OVER( PARTITION BY t.[name] ORDER BY c.column_id DESC ) = 1

MVDBA (Mike Vessey) SSC-Insane Points: 21797 More actions · Answer 7

i also tested it though a cursor and the cursor worked fine...

interestinigly - if you remove the order by clause then the query also works - very frustrating

i've been comparing the query plans , and the only difference (other than sort/Sort top N) is the version with the TOP clause in has an extra Compute scalar that seems to be typecasting data.. I suspected this is some sort of typecasting, but i explicitly cast everything as nvarchar(max) and it still get truncated

i then tried this using varchar(8000) and you still get the same problem - although i can't replicate it with other queries

i just noticed that if you remove the char(10) it also begins to work - i replaced it with '' and it works, but if i put 'a' instead of char(10) it fails again

finally came to the conclusion - the RANK OVER function is causing an issue in some way

if i change the PARTITION BY to c.columnid it works (albeit the wrong data), but c.object_id does not

is definatly in that section of code , but it all looks fine to me - maybe raise a case with MS

how did you come to the step of using TOP 1000000?

MVDBA

gemisigo Ten Centuries Points: 1244 More actions · Answer 8

srikant maurya (5/17/2012)
Can you tell me what is use of statment
RANK() OVER( PARTITION BY t.[name] ORDER BY c.column_id DESC ) = 1

Sure. This query is a part of a snippet that creates a script that reproduces an arbitrary table with all of its side-objects (constraints, indices, etc.) The reason for the statement you ask is merely cosmetics (it checks whether there should be a linefeed or not). Without it the result would be something like this:

CREATE TABLE arbitrary_table_name

.

,lastcolumnSMALLINT NOT NULL CONSTRAINT DF_arbitrary_table_name__lastcolumn DEFAULT ((25))

-- removing that check would result in this unwanted blank line here

)

GO

The statement is meaningless in the example here.

HowardW One Orange Chip Points: 29109 More actions · Answer 9

The problem here is you're relying on an undocumented and unreliable use of variable assignment in a select statement.

There's absolutely no guarantee that the variable assignment will occur in the order of the order by clause or that it's all executed in a single thread as parallelism can be used. Basically, you can't rely on @variable=@variable + something else in a multi-row select statement, you're at the whim of the plan the optimiser happens to choose, which can change over time.

gemisigo Ten Centuries Points: 1244 More actions · Answer 10

michael vessey (5/17/2012)
how did you come to the step of using TOP 1000000?

Yes, I compared the exec plan as well, but I did not find anything meaningful. Tried a few things you did (except cursors) to no avail.

'TOP 1000000' was part of my anger management technique 🙂 I began with 'TOP 1' and then increased x in 'TOP x' to see which column is the suspect for this villainy. When x went over the max number of columns I removed it and stared dumbly at the one-line result once again. So I chose a random large number that was well over the column count to see if it makes any difference. It didn't 🙁

gemisigo Ten Centuries Points: 1244 More actions · Answer 11

HowardW (5/17/2012)
The problem here is you're relying on an undocumented and unreliable use of variable assignment in a select statement.
There's absolutely no guarantee that the variable assignment will occur in the order of the order by clause or that it's all executed in a single thread as parallelism can be used. Basically, you can't rely on @variable=@variable + something else in a multi-row select statement, you're at the whim of the plan the optimiser happens to choose, which can change over time.

I am, but that does not explain this issue. The issue is not the order of the returned data, it's the lack of returned data. Omitting the TOP clause results in data loss (even when using MAXDOP 1 query hint, so it's not the parallelism either).

HowardW One Orange Chip Points: 29109 More actions · Answer 12

gemisigo (5/17/2012)
HowardW (5/17/2012)
The problem here is you're relying on an undocumented and unreliable use of variable assignment in a select statement.
There's absolutely no guarantee that the variable assignment will occur in the order of the order by clause or that it's all executed in a single thread as parallelism can be used. Basically, you can't rely on @variable=@variable + something else in a multi-row select statement, you're at the whim of the plan the optimiser happens to choose, which can change over time.
I am, but that does not explain this issue. The issue is not the order of the returned data, it's the lack of returned data. Omitting the TOP clause results in data loss (even when using MAXDOP 1 query hint, so it's not the parallelism either).

Fair enough, but it's not a bug, it's an unreasonable expectation that the internal query processor works in a way that you can use it for running procedural code within a select statement. It can segment out the delivery of rows in a select statement in various ways and there's no guarantee that it will persist the variable's value between these segments.

gemisigo Ten Centuries Points: 1244 More actions · Answer 13

HowardW (5/17/2012)
Fair enough, but it's not a bug, it's an unreasonable expectation that the internal query processor works in a way that you can use it for running procedural code within a select statement. It can segment out the delivery of rows in a select statement in various ways and there's no guarantee that it will persist the variable's value between these segments.

I haven't said it was a bug. And yes, it's undocumented and though it was rather "reliable" in the last don't even know how many cases, I know it was stated unreliable by some (by the way, has anyone proved that yet?), therefore I heed your warning and handle it with care. If you compare execution plans they are pretty much the same (yes, Segments as well) all the way up to Sequence Project node. Beyond that there are differences, of course (TOP has TopN Sort and there is a "surplus" Compute Scalar, etc.). Also, both plans yield the same number of records.

The only other difference is that the TOP version retains the contents of the variable while the TOPless keeps only the bottom 🙂 I can live with the fact it is unreliable here and it does not work the way it's supposed. My question is, why does TOP alter that behavior?

p.s. I don't think it's unreasonable to expect that the internal query processor (or anything else) works in a way that makes my task easier, the query faster and the returned dataset consistent. I believe that's the reason why these things were created in the first place. I accept, though, that personal opinions about how to do this or that might differ. But then again, it's just my personal opinion. Cheers 😉

MVDBA (Mike Vessey) SSC-Insane Points: 21797 More actions · Answer 14

interestingly enough i tried another workarount using a CTE.

it works fine every time - saves you having to use a temp table

🙂

as for whether it's a bug or not, well why not let microsoft know ??? it's an interesting and replicable behaviour that may have other far reaching implications we don't know of!!!

CTE CODE BELOW

GO

PRINT 'next'

DECLARE @sql VARCHAR(8000) = ''

, @sema SYSNAME = 'dbo', @tabla SYSNAME = 'defer';

with x(data,id) as (

SELECT

CASE

WHEN c.column_id = 1 THEN convert(varchar(max),'')

ELSE convert(varchar(max),',')

END

+ convert(varchar(max),c.[name]) + convert(varchar(max),'' )

+ CASE

WHEN st.[name] LIKE '%char%' OR st.[name] LIKE '%binary%' THEN convert(varchar(max),UPPER( st.[name] )) + convert(varchar(max),'(') + CAST( c.max_length AS VARCHAR( max ) ) + convert(varchar(max),')')

WHEN st.[name] = 'DECIMAL' OR st.[name] = 'NUMBER' THEN convert(varchar(max),UPPER( st.[name] ) )+ convert(varchar(max),'(') + CAST( c.[precision] AS VARCHAR(max)) +convert(varchar(max), ', ') + CAST(c.scale AS VARCHAR(max)) + convert(varchar(max),')')

ELSE convert(varchar(max),UPPER( st.[name] ))

END

+ CASE

WHEN c.is_identity = 1 THEN convert(varchar(max),' IDENTITY(1,1)')

ELSE convert(varchar(max),'')

END

+ CASE c.is_nullable

WHEN 0 THEN convert(varchar(max),' NOT NULL')

WHEN 1 THEN convert(varchar(max),' NULL')

ELSE convert(varchar(max),'WTH???')

END

+ CASE WHEN dc.[name] IS NOT NULL THEN convert(varchar(max),' CONSTRAINT ') +convert(varchar(max), dc.[name]) + convert(varchar(max),' DEFAULT ') + convert(varchar(max),dc.[definition]) + convert(varchar(max),' ') ELSE convert(varchar(max),' -- no default ') END

+ CASE

RANK() OVER( PARTITION BY t.name ORDER BY c.column_id DESC ) when 1 THEN ''

else CHAR(10)

END as data, RANK() OVER( PARTITION BY t.name ORDER BY c.column_id DESC )as id

FROM

sys.schemas s

INNER JOIN

sys.tables t

ON t.[schema_id] = s.[schema_id]

INNER JOIN

sys.[columns] c

ONc.[object_id] = t.[object_id]

INNER JOIN

sys.systypes st

ONst.xtype = c.system_type_id

LEFT OUTER JOIN sys.default_constraints AS dc

ON t.[object_id] = dc.parent_object_id

AND c.column_id = dc.parent_column_id

WHERE s.[name] = @sema AND t.[name] = @tabla AND st.[name] != 'sysname' AND st.uid = 4

)

select @sql=@sql+DATA from x ORDER BY x.id ASC;

PRINT @sql

GO

MVDBA