get set of random rows with distinct values

  • I have an interesting little T-SQL problem that has me beaten.

    I want to retrieve a small set of randomly selected rows from a table but also ensure that a particular column (AUTHORID) is unique (not duplicated) within that set.

    SELECT TOP 6 WORKID, AUTHORID FROM dbo.LITERARYWORKS

    ORDER BY CHECKSUM(NEWID())

    works fine at getting a random selection, but of course does not prevent duplicates on AUTHORID. And I can't use DISTINCT without including WORKID and AUTHORID in the ORDER BY - and in any case the rows would always be unique if WORKID is included. I believe one way of achieving the effect of a 'DISTINCT' on a single column while returning other columns as well is by using GROUP BY, but I can't see how to use that in combination with ORDER BY.

    Any ideas?

    Jon

  • by introducing a row_number function that partitions by the two columns, you cna find unique rows; fromt here, it's just the same query you had before, but hitting a subquery isntead of the raw table:

    SELECT

    TOP 6

    WORKID,

    AUTHORID

    FROM (SELECT

    ROW_NUMBER() OVER(PARTITION BY WORKID, AUTHORID ORDER BY WORKID, AUTHORID) AS RW,

    WORKID,

    AUTHORID

    FROM dbo.LITERARYWORKS) MyAlias

    WHERE RW = 1--finds Uniqueness across WORKID, AUTHORID

    ORDER BY

    CHECKSUM(NEWID())

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • Works a treat - many thanks!

  • A randomly-selected work from a number of randomly-selected authors:

    SELECT

    WORKID,

    AUTHORID

    FROM (

    SELECT TOP 10

    AUTHORID

    FROM dbo.LITERARYWORKS

    GROUP BY MASTNUM

    ORDER BY NEWID()

    ) at

    CROSS APPLY (

    SELECT TOP 1

    WORKID

    FROM dbo.LITERARYWORKS

    WHERE AUTHORID = at.AUTHORID

    ORDER BY NEWID()

    ) MyAlias

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Many thanks. I actually prefer this for my purposes as it will always return the required number of items (provided that there is at least that number of authors with at least 1 work each) whereas the other approach above is arguably more random but can return less than the required number of items if more than 1 work for the same author is initially selected.

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply