Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 

Accessing results in a random order: Towards the automated Thank-you letter

 Most of the time, you want to have a SQL result in a particular order. Occasionally you just don't care. Just now and then,you actually want to shake the dice and get your data back in a random order. I quite often like to get a random order so as to introduce a sort of  human randomness into strings. OK, I'm slightly unusual in my liking for databases and websites that produce emails and messages that seem to be hand-crafted, but there are several real-life commercial applications where a certain randomness in messages and strings is an advantage. When I was a child, I was sat down at a table by my father and made to write thank-you letters on Boxing Day, and whereas my siblings fretted, and chewed their pencils, I constructed several banks of phrases that would pop into thank-you letters. I'm afraid to confess that, in those far-off days, computers were referred to as 'Electronic Brains' and one had to travel to the metropolis to see one. I therefore used spinning tops and circular pieces of paper, with a nice Aunt-friendly phrase on each pie-slice. Eventually, of course, it occurred to me that all I needed was a conventional list and a dice.

Here is a SQL Server technique very much in the spirit of the dice. Obviously, the advantage of using a sequenced list from a table, over just picking from a list over and over again, via random numbers, is that you don't pick the same phrase twice.

Dear Aunt Matilda

Thank you for the socks. I will always cherish them and I will always cherish them.

isn't as endearing as

Dear Aunt Matilda

Thank you for the socks. I will always cherish them and they will always remind me of your sweet generosity to an undeserving nephew.

(Since writing this, Kristofer Andersson has come up with a much neater solution in a comment below )

The reason that I like using the CROSS APPLY technique is that you don't need to have any unique index in the result. Before you curl your lip in disdain, be clear that a derived table may indeed not have one, and so it is rather nice not to worry about stuffing in an identity field or whatever.

Here is a little routine that gives you a few excuses as to why your project is failing. (Data is taken from an old article I did way back)

Because of the assumption that functions are deterministic, you have to fool both the parser and the query optimizer in order to do this. The former is hoodwinked by slipping in a RAND() function embedded in a view, and the latter by passing a dummy parameter that is computed. It is easier to show via an example. Play about with it and I'd love to hear of an improvement to it.


/* we create our utility view which hoodwinks the function into accepting a random number. (note, this can be counter-productive */
IF OBJECT_ID (N'vRandomNumber ') IS NOT NULL
  
DROP VIEW vRandomNumber
GO
CREATE VIEW vRandomNumber
      
AS
SELECT
RAND() AS RandomNumber
GO
      
/*Now a very simple table-valued function that simply returns a row with a single column of a random number */
IF OBJECT_ID (N'Randomly') IS NOT NULL
  
DROP FUNCTION dbo.Randomly
GO
CREATE FUNCTION dbo.Randomly(@Multiplier INT,@number INT)
RETURNS @random TABLE
  
(
        
number INT NOT NULL
  )
      
AS
  
-- body of the function
  
BEGIN
  INSERT INTO
@random (number)
    
SELECT randomnumber*@multiplier
        
AS [random]
    
FROM vRandomnumber
  
RETURN
  END
GO
      
/* now let's create some data. We'll make a table of excuses and pick one row out of the table at random */
DROP TABLE #ExcusesForProjectFailure
GO
CREATE TABLE #ExcusesForProjectFailure
  
(Excuse VARCHAR(255) )
INSERT INTO #ExcusesForProjectFailure (Excuse)
           
SELECT   'unclear objectives'
  
UNION ALL SELECT   'changing objectives'
  
UNION ALL SELECT   'lack of developer training'
  
UNION ALL SELECT   'insufficient resources'
  
UNION ALL SELECT   'impossible schedules'
  
UNION ALL SELECT   'unrealistic expectations'
  
UNION ALL SELECT   'unclear roles and responsibilities'
  
UNION ALL SELECT   'Insufficient feedback to stakeholders'
  
UNION ALL SELECT   'corporate  politics'
  
UNION ALL SELECT   'poor communication'
  
UNION ALL SELECT   'personnel turnover'
  
UNION ALL SELECT   'changing technology'
  
UNION ALL SELECT   'constraining rules and regulations'
  
UNION ALL SELECT   'lack of sponsorship'
  
UNION ALL SELECT   'poor planning'
  
UNION ALL SELECT   'unclear goals and objectives'


DECLARE @FinalExcuse VARCHAR(2000)
SELECT TOP 3 @FinalExcuse=COALESCE(@finalExcuse+', ',
                                  
'It wasn''t my fault! It was ')
                          +
excuse
  
FROM #ExcusesForProjectFailure
      
CROSS apply  dbo.randomly (1000,LEN(excuse))
  
ORDER BY number
SELECT @finalexcuse
      
/* if you add the second parameter that is never used, (len(excuse)) this fools the query optimiser into calling the function for every row. Try rewriting the code without this extra parameter just to see what happens, and compare the execution plans!*/
      

 


It wasn't my fault! It was changing objectives, Insufficient feedback to stakeholders, unclear roles and responsibilities

(1 row(s) affected)


It wasn't my fault! It was lack of developer training, insufficient resources, unrealistic expectations

It wasn't my fault! It was poor planning, impossible schedules, changing technology


---and so on '''

 

Comments

Posted by kristofera on 16 March 2009

order by substring(convert(varchar(40), newid()), 1, 8)

Posted by Phil Factor on 16 March 2009

Great one, Kristofer! It works a treat.


DECLARE @FinalExcuse VARCHAR(2000)
SELECT TOP 3 @FinalExcuse=COALESCE(@finalExcuse+', ',
                                  
'It wasn''t my fault! It was ')
                          +
excuse
FROM #excusesforProjectFailure
  ORDER BY  SUBSTRING(CONVERT(VARCHAR(40), NEWID()), 1, 8)

The use of NEWID() for randomising the order of results has, to my shame, been documented in BOL and been discussed since at least 2002. ( Using NEWID to Randomly Sort Records  - http://www.sqlteam.com/article/using-newid-to-randomly-sort-records  )

 It works because NEWID(), unlike RAND(), is executed on every row. RAND() is executed once in a query, unless it is seeded on every row, which unfortunately then gives a distribution that isn't random!  

The technique I illustrated for using a view with a RAND() in it,  and sneaking it into a function is still useful in certain circumstances (e.g. returning a random string) as you can't use NEWID() in a function. It gets detected and spat back with the message
    Invalid use of a side-effecting operator 'newid' within a function.
The technique I've illustrated with the CROSS APPLY is also a quick 'non-cursor' way of seeding any existing table that has a primary key with a true 'pseudorandom' number so you can use sampling stats on it.
There is a fascinating article on MSDN by Brian Connolly, Random Sampling in T-SQL http://msdn.microsoft.com/en-us/library/aa175776.aspx , originally published in the March 2004 issue of SQL Server Professional. which shows a more serious use for returning results in a random order, and illustrates some of the pitfalls.

 

Posted by Steve Jones on 16 March 2009

Very nice!

Posted by Rohit Madhok on 18 March 2009
Posted by dellis403 on 18 March 2009

As a point of interest, this doesn't work the same on SQL Server 2000 and SQL Server 2005.  On 2000 you only get a single excuse reason returned, and on 2005 you get all 3 reasons.  Looks like Microsoft changed how setting a variable in a SELECT statement works between the two versions.  

Posted by Tom Garth on 18 March 2009

Too cool - as always!

Thanks Phil

Note: The UNION ALLs aren't required in SQL Server 2008.

Posted by David Todd on 19 March 2009

Hi,

Just a couple of things:

I can confirm dellis403's comment about SQL 2000 returning only a single excuse;

Why do the substring and convert on newid()? Why not just order by newid()?

Cheers

Posted by Phil Factor on 20 March 2009

Sadly my SQL 2000 machine is broken, so I can't test it. I feel sure there is a simple way round as I used the view-in-function technique a lot in SQL 2000.

As regards David's question, I assumed that it would be a performance issue, as ORDER BY NEWID() works fine. Can anyone test this?

Posted by Tom.Thomson on 21 March 2009

SQL 2000 seems quite bizarre on this.  Without the order by clause it picks 3 excuses (always the same three, so not useful for the purpose of this code); with an order byclause that has just a column name (more generally, a comma separated list of column names decorated with asc and desc) it still picks three (again deterministic so not useful in this example) but with an order by clause which contains any function call at all *whether or not deterministic) it picks only 1 excuse.  

So to get the right effect you have to separate the concqatenation operation from the selection - for example by selecting 3 excuses into a table variable using the random ordering and then concatenating from the table variable without the order by clause.

I haven't seen this documented, and it seems to me to be a big inconsistency - dare I say a bug?

Posted by KristoferA on 23 March 2009

"order by NewID()" indeed works fine. The reason why I used substring in my first comment can be divided into three sub-reasons:

1) Making the internal sort list shorter. It makes no difference in this case but if you're randomizing a large number of records it should save a few bytes...

2) Something in the back of my head tells me that not all versions of SQL Server support ordering by uniqueidentifiers. Not sure what version but likely SQL 2000 (SP n?) or 7.x.

3) Not sure what time I wrote the comment but if it was before my morning coffee then I was tired. If at night maybe it was preceded by a couple of pints. :)

Posted by Phil Factor on 3 September 2009

Of course you can't put NewID() in a function, whereas you can use the view trick that I started off the blog-post with!

Msg 443, Level 16, State 1, Procedure MyFunction , Line 9

Invalid use of a side-effecting operator 'newid' within a function.

Leave a Comment

Please register or log in to leave a comment.