get set of random rows with distinct values

Question

get set of random rows with distinct values

Jon Stranger

SSC Veteran

Points: 273
More actions
June 17, 2013 at 4:58 am

#295361

I have an interesting little T-SQL problem that has me beaten.
I want to retrieve a small set of randomly selected rows from a table but also ensure that a particular column (AUTHORID) is unique (not duplicated) within that set.
SELECT TOP 6 WORKID, AUTHORID FROM dbo.LITERARYWORKS
ORDER BY CHECKSUM(NEWID())
works fine at getting a random selection, but of course does not prevent duplicates on AUTHORID. And I can't use DISTINCT without including WORKID and AUTHORID in the ORDER BY - and in any case the rows would always be unique if WORKID is included. I believe one way of achieving the effect of a 'DISTINCT' on a single column while returning other columns as well is by using GROUP BY, but I can't see how to use that in combination with ORDER BY.
Any ideas?
Jon

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply

Lowell SSC Guru Points: 323504 More actions · Answer 1

by introducing a row_number function that partitions by the two columns, you cna find unique rows; fromt here, it's just the same query you had before, but hitting a subquery isntead of the raw table:

SELECT

TOP 6

WORKID,

AUTHORID

FROM (SELECT

ROW_NUMBER() OVER(PARTITION BY WORKID, AUTHORID ORDER BY WORKID, AUTHORID) AS RW,

WORKID,

AUTHORID

FROM dbo.LITERARYWORKS) MyAlias

WHERE RW = 1--finds Uniqueness across WORKID, AUTHORID

ORDER BY

CHECKSUM(NEWID())

Lowell

--help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

Jon Stranger SSC Veteran Points: 273 More actions · Answer 2

Jon Stranger

SSC Veteran

Points: 273

June 17, 2013 at 5:54 am

#1624551

Works a treat - many thanks!

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 3

A randomly-selected work from a number of randomly-selected authors:

SELECT

WORKID,

AUTHORID

FROM (

SELECT TOP 10

AUTHORID

FROM dbo.LITERARYWORKS

GROUP BY MASTNUM

ORDER BY NEWID()

) at

CROSS APPLY (

SELECT TOP 1

WORKID

FROM dbo.LITERARYWORKS

WHERE AUTHORID = at.AUTHORID

ORDER BY NEWID()

) MyAlias

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

Jon Stranger SSC Veteran Points: 273 More actions · Answer 4

Many thanks. I actually prefer this for my purposes as it will always return the required number of items (provided that there is at least that number of authors with at least 1 work each) whereas the other approach above is arguably more random but can return less than the required number of items if more than 1 work for the same author is initially selected.