Warning... Order by FILTERS results

  • Check this out :

    I was tweaking a working reorganize script for my new client and I wanted to randomize the reorganize sequence because the data distribution is really uneven in the DB. The real intent was to be able to run the code in high production hours if need be. And since I can't use the online option with 2005 Standard, I had to avoid reorganizing 8 big indexes in a row on the same large table... hence locking it for 5 minutes.

    I test the concatenation script and everything is fine, I then add the ORDER BY NEWID() and call it a night after I schedule the job to run at 10 PM.

    The next day I see that the job ran in less than 3 minutes... which is completely impossible with 95% fragmentation on the DB across all indexes (autoshrink was ON when I got here). I had to run this overnight because I had 3 complaints from users when I tried to run the code during high prod hours.

    After a couple minutes of digging I find this out :

    GO

    DECLARE @X NVARCHAR(MAX)

    SET @X = ''

    SELECT @X = @X + 'PRINT ''[' + name + ']'';' + CHAR(13) + CHAR(10) FROM msdb.sys.tables ORDER BY name

    --PRINT @X

    EXEC (@X)

    GO --95 rows affected

    --This is what I used when I ran in the job :

    DECLARE @X NVARCHAR(MAX)

    SET @X = ''

    SELECT @X = @X + 'PRINT ''[' + name + ']'';' + CHAR(13) + CHAR(10) FROM msdb.sys.tables ORDER BY NEWID()

    --PRINT @X

    EXEC (@X)

    GO --1 rows affected

    --This is what had happened in my job... only 1 index had been reorganized... hence the ridiculously fast reorganize.

    DECLARE @X NVARCHAR(MAX)

    SET @X = ''

    SELECT @X = @X + 'PRINT ''[' + name + ']'';' + CHAR(13) + CHAR(10) FROM msdb.sys.tables ORDER BY (SELECT NEWID())

    --PRINT @X

    EXEC (@X)

    GO --95 rows affected

    --After moving the NEWID() is its own subquery, the query works again as it should... ugly hack if you ask me, but at least it works!

    Did I just stumble on a bug, or did I misunderstand the use of orderby and newid() all those years?

    Where can I report this to MS if this is a bug?

    Was this fixed in a hotfix or post SP 2 that is not installed on the server (can't seem to find that bug anywhere in the build list)?

    Server info : Microsoft SQL Server 2005 - 9.00.3042.00 (Intel X86) Feb 9 2007 22:47:07 Copyright (c) 1988-2005 Microsoft Corporation Standard Edition on Windows NT 5.2 (Build 3790: Service Pack 2)

  • I went ahead and posted this as a bug on the connect website.

    Here's the url if you have anything else to add to this for Microsoft to use to fix the issue :

    https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=383641

  • I get the same issue here seems very odd - 9.0.4028.

    Anyway it looks like you can get all the tables if you add a top X:

    SELECT TOP 99.9 PERCENT @X = @X + 'PRINT ''[' + name + ']'';' + CHAR(13) + CHAR(10) FROM msdb.sys.tables ORDER BY NEWID()

    If you say 99.9 percent then it give you all the records, if you say 100 percent it seems to ignore it and just give you the 1 record!

    ed

  • Thanks edward.

    That opens up another valid work around using TOP 999999999999 (make sure this number is way over the actual # of records possibly returned).

    SELECT TOP (10000000) @X = @X + 'PRINT ''[' + name + ']'';' + CHAR(13) + CHAR(10) FROM msdb.sys.tables ORDER BY NEWID()

    I'll post those as possible workarounds... that'll surely guide the MS team on the right tracks.

    Keep 'em comming.

  • The SQL Server is just tried your query on is as follows:

    Microsoft SQL Server 2005 - 9.00.1399.06 (Intel X86)

    Oct 14 2005 00:33:37

    Copyright (c) 1988-2005 Microsoft Corporation

    Enterprise Edition on Windows NT 5.2 (Build 3790: Service Pack 2)

    Here's another way around, as using a CTE instead of a temp table failed to produce the desired results in the same way the original query does, which is really interesting when you think about it.

    DECLARE @X NVARCHAR(MAX)

    SET @X = ''

    SELECT [name], NEWID() AS NEW_ID

    INTO #TEST

    FROM msdb.sys.tables

    SELECT @X = @X + 'PRINT ''[' + [name] + ']'';' + CHAR(13) + CHAR(10)

    FROM #TEST

    ORDER BY NEW_ID

    --PRINT @X

    EXEC (@X)

    GO

    DROP TABLE #TEST

    Steve

    (aka smunson)

    :):):)

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • That's so cool, so many ways to screw up with the engine ;).

  • I just tested a variation, where I moved the ORDER BY clause up into the SELECT INTO query, and it continues to produce a correct result, but apparently, only because SQL Server is forced to instantiate the data via the temp table. Otherwise, it appears the optimizer shortcuts the process. Take a look at the execution plans, as I don't have the ability to do that in my current environment.

    Steve

    (aka smunson)

    :):):)

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • The working plans have an extra compute scalar operation... which seems to be where the data is shot in the variable.

    In the failing plan, there's a difference in the compute scalar. Instead of fetching the table names, it fetches 2 columns (probably where it sets the data in the variable)... then it forgets to concatenate anything!

  • One more interesting detail... I tried this on SQL 2008, and it works as you described, but with a twist. Each time the procedure runs, it produces a single row result, but the row produced is different each time. While the result appears to be random, after running this many times, it seems that the row selected is actually one of a small set of possible results (though that may be that I didn't run it enough to get the full effect of randomness).

    Too weird.

    😉

  • I'm not sure the OP noticed, but I figured it didn't matter much, as that's the exact same kind of result I was getting. Some one of those records got chosen, and sometimes it was the same one a couple times in a row, but it was often different. I was using SQL 2005.

    Steve

    (aka smunson)

    :):):)

    aureolin (11/25/2008)


    One more interesting detail... I tried this on SQL 2008, and it works as you described, but with a twist. Each time the procedure runs, it produces a single row result, but the row produced is different each time. While the result appears to be random, after running this many times, it seems that the row selected is actually one of a small set of possible results (though that may be that I didn't run it enough to get the full effect of randomness).

    Too weird.

    😉

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • That's exactly the behavior I'm noticing: The order by newid() is working alright, the problem is further down the line where the concatenation happens (or does NOT happen).

  • It looks to me as though the original question of why only a single row is returned may be answered by an entry in the BOL under "Behavior Changes to Database Engine Features in SQL Server 2005" ms-help://MS.SQLCC.v9/MS.SQLSVR.v9.en/instsql9/html/65eaafa1-9e06-4264-b547-cbee8013c995.htm. Scroll down to the section headed "Transact-SQL" and see entry on "Built-in system functions".

    Both the NEWID() and RAND() functions are deliberately designed now to be invoked a single time in a call, even if referenced multiple times. In SQL 2000, each reference produced a different result as the function would be evaluated again for each reference.

    An interesting discussion on this topic was posted by Itzik Ben-Gan at http://www.sqlmag.com/Article/ArticleID/97032/sql_server_97032.html

  • Thanks for the info.

    This is where I still have an issue :

    Select order by newid() returns 95 rows

    Select @concat = .... order by newid() = 1 row????

    The query itself returns 95 rows, in random order as expected... then it just stops to work when you try to concatenate?!?!?!

  • I suspect that the optimizer is seeing you ask for a value for a scalar, and providing exactly that, a scalar, and I'm coming to that conlusion based on that post about a change in behavior for RAND() and NEWID(), as that change appears to make those functions entirely scalar in nature. Thus if you want to sort by such a creation, you'll have to force the generated value into a temp table in order to keep the optimizer from short-circuiting to a single scalar value.

    Steve

    (aka smunson)

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • That doesn't make a whole lot of sens to me... if rand or newid() is called only once in the query, then how come the result is really random... and changes on every execute?

    That can't happen if it's called only once.

    Anyways, I'll let MS fix this one :P.

Viewing 15 posts - 1 through 15 (of 16 total)

You must be logged in to reply to this topic. Login to reply