Very strange question ... with an unexplainable solution

  • Hi all,

    Got a bit of an odd one here. I have a problem, which I have several solutions for. I just have no idea why the problem is there, and why any of the solutions work 😛

    Here's the sample data:

    IF OBJECT_ID('Test') IS NOT NULL

    DROP TABLE [Test]

    CREATE TABLE [dbo].[Test](

    [ID] [int] IDENTITY(1,1) NOT NULL,

    [TableName] [nvarchar](128) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,

    [ColumnName] [nvarchar](128) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,

    [ColumnValue] [nvarchar](MAX) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,

    [ColumnOrder] [tinyint] NOT NULL,

    CONSTRAINT [PK_Test] PRIMARY KEY CLUSTERED

    (

    [ID] ASC

    )WITH (PAD_INDEX = OFF, IGNORE_DUP_KEY = OFF) ON [PRIMARY]

    ) ON [PRIMARY]

    INSERT INTO [Test] (TableName, ColumnName, ColumnValue, ColumnOrder)

    VALUES ('Test', 'Test1', NULL, 1)

    INSERT INTO [Test] (TableName, ColumnName, ColumnValue, ColumnOrder)

    VALUES ('Test', 'Test2', NULL, 3)

    INSERT INTO [Test] (TableName, ColumnName, ColumnValue, ColumnOrder)

    VALUES ('Test', NULL, 'M', 4)

    INSERT INTO [Test] (TableName, ColumnName, ColumnValue, ColumnOrder)

    VALUES ('Test', 'Test3', NULL, 5)

    INSERT INTO [Test] (TableName, ColumnName, ColumnValue, ColumnOrder)

    VALUES ('Test', NULL, NULL, 6)

    INSERT INTO [Test] (TableName, ColumnName, ColumnValue, ColumnOrder)

    VALUES ('Test', NULL, 'ABC', 2)

    And here's the query in question:

    DECLARE @sqlquery NVARCHAR(MAX)

    SET @sqlquery = 'SELECT ' + CHAR(10)

    SELECT

    @sqlquery = @sqlquery + (

    CASE

    WHEN ColumnName IS NOT NULL THEN ColumnName

    WHEN ColumnValue IS NOT NULL THEN ColumnValue

    ELSE ''

    END

    )

    FROM [Test]

    WHERE TableName = 'Test'

    ORDER BY ColumnOrder

    PRINT @sqlquery

    Running the query, you'll notice that the output produced is simply "SELECT". It should, however, be "SELECT Test1Test2MTest3ABC".

    I've found at least three solutions to the problem. If I change [ColumnValue] from NVARCHAR(MAX) to, lets say, NVARCHAR(80), it produces the correct output. If I remove the "WHERE TableName = 'Test'" clause, it produces the correct output. And, finally, if I remove the "ORDER BY ColumnOrder", it also produces the correct output.

    So, my question to all of you is ... 1) Why is it not working to start off with, and, perhaps more importantly, 2) Why do those solutions work???

  • Two more solutions, again, neither of which I understand. If, in the CASE statement, I change the line: "WHEN ColumnValue IS NOT NULL THEN ColumnValue" to, lets say, "WHEN ColumnValue IS NOT NULL THEN CAST(ColumnValue AS NVARCHAR(80))", then it works. Also, if I add a TOP value, like, say, "SELECT TOP 6", then it also works. But, interestingly, if I make it "SELECT TOP (100) PERCENT", then it doesn't work!

  • I think when you say "SELECT Test1Test2MTest3ABC" you mean "SELECT Test1ABCTest2MTest3"; or maybe the query is wrong and should have ORDER BY in instead of ORDER BY ColumnOrder?

    Anyway, according to BoL the result of such a query is not defined. The attitude appears to be that ORDER BY doesn't do anything unless it is affecting the order of rows in a top-level result set or a ranking or row numbering operation or required by a TOP filter, in all other cases the optimiser can ignore it. Some time ago there was a lot of discussion of this technique at SQLServerCentral, and conditions under which it would probably work as desired were worked out. For example any query using it should probably specify MAXDOP=1 since parallelism could destroy it. A lot of people remained concerned that it was very dangerous to use a feature whose effect was technically "undefined" so eventually a means of checking automatically that it was actually working as expected was devised. If you are interested in this technique, google for "quirky update" site:sqlservercentral.com and you will find some interesting articles and threads (but they will be about the form os this using UPDATE rather than the simpler SELECT form) - in particular, Paul White explained some of the circumstances in which it won't work.

    Anyway, in this query the optimiser can see that whole result is a single value so the sort can't affect the order of rows in that resul, so it doesn't do the sort; so the result is undefined.

    I'm not sure quite what happens with SQL 2005, as I don't have one to play with, but if it's anything like SQL 2008, SQL 2008 R2, or SQL 2000 I suspect that the result of not doing the sort and getting all rows by doing a clustered index scan on the primary key (the id column) gets the string you quoted (the result of sorting on id instead of Column order) when you change the variable from varchar(MAX) to varchar(80).

    As a general rule (for the select case) specifying a meaningful TOP clause (TOP 100 percent is meaningless, it does nothing) will cause optimiser to include the sort: "TOP 99.9999999 PERCENT" will work fine to force the order clause to be obeyed and won't lose you any rows unless you have an awful lot. Equally good is "TOP 1000000000" - it if you don't have more than that many rows. Also (for select, anyway) this variable update appears to work without any top constraint if the order by columns are the columns of a UNIQUE index and are not nullable, even if the target variable is declared as varchar(MAX). (I don't actually know whether nullability matters - but I haven't tried it with nullable columns, so I'm playing safe.) But whether this is solid enough to rely on I'm not sure.

    Tom

  • Ahh right, thanks Joe. Had read about that in Jeff's article on the "quirky" update, and didn't realise it applied here too.

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply