Most of the time, you want to have a SQL result in a particular order. Occasionally you just don't care. Just now and then,you actually want to shake the dice and get your data back in a random order. I quite often like to get a random order so as to introduce a sort of human randomness into strings. OK, I'm slightly unusual in my liking for databases and websites that produce emails and messages that seem to be hand-crafted, but there are several real-life commercial applications where a certain randomness in messages and strings is an advantage. When I was a child, I was sat down at a table by my father and made to write thank-you letters on Boxing Day, and whereas my siblings fretted, and chewed their pencils, I constructed several banks of phrases that would pop into thank-you letters. I'm afraid to confess that, in those far-off days, computers were referred to as 'Electronic Brains' and one had to travel to the metropolis to see one. I therefore used spinning tops and circular pieces of paper, with a nice Aunt-friendly phrase on each pie-slice. Eventually, of course, it occurred to me that all I needed was a conventional list and a dice.
Here is a SQL Server technique very much in the spirit of the dice. Obviously, the advantage of using a sequenced list from a table, over just picking from a list over and over again, via random numbers, is that you don't pick the same phrase twice.
Dear Aunt MatildaThank you for the socks. I will always cherish them and I will always cherish them.
Dear Aunt Matilda
Thank you for the socks. I will always cherish them and I will always cherish them.
isn't as endearing as
Dear Aunt MatildaThank you for the socks. I will always cherish them and they will always remind me of your sweet generosity to an undeserving nephew.
Thank you for the socks. I will always cherish them and they will always remind me of your sweet generosity to an undeserving nephew.
(Since writing this, Kristofer Andersson has come up with a much neater solution in a comment below )
The reason that I like using the CROSS APPLY technique is that you don't need to have any unique index in the result. Before you curl your lip in disdain, be clear that a derived table may indeed not have one, and so it is rather nice not to worry about stuffing in an identity field or whatever.
Here is a little routine that gives you a few excuses as to why your project is failing. (Data is taken from an old article I did way back)
Because of the assumption that functions are deterministic, you have to fool both the parser and the query optimizer in order to do this. The former is hoodwinked by slipping in a RAND() function embedded in a view, and the latter by passing a dummy parameter that is computed. It is easier to show via an example. Play about with it and I'd love to hear of an improvement to it.
/* we create our utility view which hoodwinks the function into accepting a random number. (note, this can be counter-productive */IF OBJECT_ID (N'vRandomNumber ') IS NOT NULL DROP VIEW vRandomNumberGOCREATE VIEW vRandomNumber ASSELECT RAND() AS RandomNumberGO /*Now a very simple table-valued function that simply returns a row with a single column of a random number */IF OBJECT_ID (N'Randomly') IS NOT NULL DROP FUNCTION dbo.RandomlyGOCREATE FUNCTION dbo.Randomly(@Multiplier INT,@number INT)RETURNS @random TABLE ( number INT NOT NULL ) AS -- body of the function BEGIN INSERT INTO @random (number) SELECT randomnumber*@multiplier AS [random] FROM vRandomnumber RETURN ENDGO /* now let's create some data. We'll make a table of excuses and pick one row out of the table at random */DROP TABLE #ExcusesForProjectFailureGOCREATE TABLE #ExcusesForProjectFailure (Excuse VARCHAR(255) )INSERT INTO #ExcusesForProjectFailure (Excuse) SELECT 'unclear objectives' UNION ALL SELECT 'changing objectives' UNION ALL SELECT 'lack of developer training' UNION ALL SELECT 'insufficient resources' UNION ALL SELECT 'impossible schedules' UNION ALL SELECT 'unrealistic expectations' UNION ALL SELECT 'unclear roles and responsibilities' UNION ALL SELECT 'Insufficient feedback to stakeholders' UNION ALL SELECT 'corporate politics' UNION ALL SELECT 'poor communication' UNION ALL SELECT 'personnel turnover' UNION ALL SELECT 'changing technology' UNION ALL SELECT 'constraining rules and regulations' UNION ALL SELECT 'lack of sponsorship' UNION ALL SELECT 'poor planning' UNION ALL SELECT 'unclear goals and objectives'
DECLARE @FinalExcuse VARCHAR(2000)SELECT TOP 3 @FinalExcuse=COALESCE(@finalExcuse+', ', 'It wasn''t my fault! It was ') +excuse FROM #ExcusesForProjectFailure CROSS apply dbo.randomly (1000,LEN(excuse)) ORDER BY numberSELECT @finalexcuse /* if you add the second parameter that is never used, (len(excuse)) this fools the query optimiser into calling the function for every row. Try rewriting the code without this extra parameter just to see what happens, and compare the execution plans!*/
It wasn't my fault! It was changing objectives, Insufficient feedback to stakeholders, unclear roles and responsibilities
(1 row(s) affected)
It wasn't my fault! It was lack of developer training, insufficient resources, unrealistic expectations
It wasn't my fault! It was poor planning, impossible schedules, changing technology
---and so on '''
order by substring(convert(varchar(40), newid()), 1, 8)
Great one, Kristofer! It works a treat.
DECLARE @FinalExcuse VARCHAR(2000)SELECT TOP 3 @FinalExcuse=COALESCE(@finalExcuse+', ', 'It wasn''t my fault! It was ') +excuse FROM #excusesforProjectFailure ORDER BY SUBSTRING(CONVERT(VARCHAR(40), NEWID()), 1, 8)
The use of NEWID() for randomising the order of results has, to my shame, been documented in BOL and been discussed since at least 2002. ( Using NEWID to Randomly Sort Records - http://www.sqlteam.com/article/using-newid-to-randomly-sort-records )
It works because NEWID(), unlike RAND(), is executed on every row. RAND() is executed once in a query, unless it is seeded on every row, which unfortunately then gives a distribution that isn't random!
The technique I illustrated for using a view with a RAND() in it, and sneaking it into a function is still useful in certain circumstances (e.g. returning a random string) as you can't use NEWID() in a function. It gets detected and spat back with the message Invalid use of a side-effecting operator 'newid' within a function.The technique I've illustrated with the CROSS APPLY is also a quick 'non-cursor' way of seeding any existing table that has a primary key with a true 'pseudorandom' number so you can use sampling stats on it.There is a fascinating article on MSDN by Brian Connolly, Random Sampling in T-SQL http://msdn.microsoft.com/en-us/library/aa175776.aspx , originally published in the March 2004 issue of SQL Server Professional. which shows a more serious use for returning results in a random order, and illustrates some of the pitfalls.
Very nice!
blogs.mssqltips.com/.../2215.aspx
As a point of interest, this doesn't work the same on SQL Server 2000 and SQL Server 2005. On 2000 you only get a single excuse reason returned, and on 2005 you get all 3 reasons. Looks like Microsoft changed how setting a variable in a SELECT statement works between the two versions.
Too cool - as always!
Thanks Phil
Note: The UNION ALLs aren't required in SQL Server 2008.
Hi,
Just a couple of things:
I can confirm dellis403's comment about SQL 2000 returning only a single excuse;
Why do the substring and convert on newid()? Why not just order by newid()?
Cheers
Sadly my SQL 2000 machine is broken, so I can't test it. I feel sure there is a simple way round as I used the view-in-function technique a lot in SQL 2000.
As regards David's question, I assumed that it would be a performance issue, as ORDER BY NEWID() works fine. Can anyone test this?
SQL 2000 seems quite bizarre on this. Without the order by clause it picks 3 excuses (always the same three, so not useful for the purpose of this code); with an order byclause that has just a column name (more generally, a comma separated list of column names decorated with asc and desc) it still picks three (again deterministic so not useful in this example) but with an order by clause which contains any function call at all *whether or not deterministic) it picks only 1 excuse.
So to get the right effect you have to separate the concqatenation operation from the selection - for example by selecting 3 excuses into a table variable using the random ordering and then concatenating from the table variable without the order by clause.
I haven't seen this documented, and it seems to me to be a big inconsistency - dare I say a bug?
"order by NewID()" indeed works fine. The reason why I used substring in my first comment can be divided into three sub-reasons:
1) Making the internal sort list shorter. It makes no difference in this case but if you're randomizing a large number of records it should save a few bytes...
2) Something in the back of my head tells me that not all versions of SQL Server support ordering by uniqueidentifiers. Not sure what version but likely SQL 2000 (SP n?) or 7.x.
3) Not sure what time I wrote the comment but if it was before my morning coffee then I was tired. If at night maybe it was preceded by a couple of pints. :)
Of course you can't put NewID() in a function, whereas you can use the view trick that I started off the blog-post with!
Msg 443, Level 16, State 1, Procedure MyFunction , Line 9
Invalid use of a side-effecting operator 'newid' within a function.