10 Top 1 rows chosen at random

Question

Post reply

10 Top 1 rows chosen at random

Andy in Pembs

Old Hand

Points: 395
More actions
August 1, 2007 at 4:45 am

#179955

Hi,
I have a table with 1000's of rows, and each row is one of 10 types.
What I want is a statement to return one row per type, but that the row is chosen at random, so each time I execute the row I'll get 10 rows, 1 per type, but each row will be different from the last time I run it
There must be a better way than inserting top 1 rows ordering by newid() into a temp table then returning the 10 rows as e result set..
Thanks!

Viewing 15 posts - 1 through 15 (of 23 total)

You must be logged in to reply to this topic. Login to reply

SwePeso SSC-Dedicated Points: 39752 More actions · Answer 1

Are you talking about PAGING?

You want 10 records at a time (one fo each type) on a "page"?

When you click "next" you want next 10 records (all again of different type)?

N 56°04'39.16"
E 12°55'05.25"

Andy in Pembs Old Hand Points: 395 More actions · Answer 2

Hi, No its not for paging.

It's for showing different films but one per genre on a web page..

Thanks,

Andy

SwePeso SSC-Dedicated Points: 39752 More actions · Answer 3

How are you else going to check that some of the films you are going to show, not already has been shown?

N 56°04'39.16"
E 12°55'05.25"

Andy in Pembs Old Hand Points: 395 More actions · Answer 4

Well, each of the 10 items in each result set will have a different genre, so can't be shown twice in the same set of ten, and a film ere can only have one genre.

I'm not actually worried about whether it gets shown in successive result sets. Some genres may only have 10 films and others may have 1000. Plus anything random can potentially happen twice in a row anyway.

Hope that makes sense.

(Edited for sloppy typing )

SwePeso SSC-Dedicated Points: 39752 More actions · Answer 5

-- Prepare sample data

CREATE TABLE #Movies

(

MovieID INT IDENTITY PRIMARY KEY,

GenreID INT

)

INSERT #Movies

(

GenreID

)

SELECT TOP 1000 1 + ABS(CHECKSUM(NEWID())) % 10

FROM master..syscolumns

-- Show the expected result

SELECT g.GenreID,

(SELECT TOP 1 m.MovieID FROM #Movies AS m WHERE m.GenreID = g.GenreID ORDER BY NEWID()) AS MovieID

FROM (

SELECT GenreID

FROM #Movies

GROUP BY GenreID

) AS g

ORDER BY g.GenreID

N 56°04'39.16"
E 12°55'05.25"

SQL ORACLE One Orange Chip Points: 27807 More actions · Answer 6

Our data in SQL Server may not be ordered based on the time to add them. If there is the ORDER clause in your queries, your retrieved data should be the same.

Jeff Moden SSC Guru Points: 1003867 More actions · Answer 7

Very nicely done, Peter. Nasty fast, too!

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

SwePeso SSC-Dedicated Points: 39752 More actions · Answer 8

SwePeso

SSC-Dedicated

Points: 39752

August 2, 2007 at 1:02 am

#724078

Thanks!

N 56°04'39.16"
E 12°55'05.25"

Tomm Carr SSCertifiable Points: 5426 More actions · Answer 9

A relational database doesn't really do random well. Here are a couple of more straightforward solutions. If you want truly random results from one query to another -- good luck.

CREATE TABLE #Movies (
    MovieID INT IDENTITY PRIMARY KEY,
    GenreID INT,
    Title   varchar(10)
);

-- Insert the data in a haphazard order. The titles are in the form "Genre x, Title y"
INSERT into #Movies
    (GenreID, Title)
select 10,  'G10T1' union all
select 4,   'G4T2'  union all
select 2,   'G2T2'  union all
select 1,   'G1T3'  union all
select 6,   'G6T1'  union all
select 3,   'G3T5'  union all
select 8,   'G8T1'  union all
select 3,   'G3T4'  union all
select 3,   'G3T2'  union all
select 1,   'G1T2'  union all
select 4,   'G4T1'  union all
select 3,   'G3T1'  union all
select 5,   'G5T1'  union all
select 3,   'G3T3'  union all
select 1,   'G1T1'  union all
select 2,   'G2T1'  union all
select 7,   'G7T1'  union all
select 9,   'G9T1';

-- The first method. Not random, MIN() would return the first
-- and MAX() would return the last title in alphabetical order.
select m1.GenreID, max(m1.Title) Title
from   #Movies m1
group by m1.GenreID

-- The second method. More random as to the result, but the result
-- would be repeatable: you would get the same titles every time.
select top 1 GenreID, Title
from   #Movies
where  GenreID = 1
union
select top 1 GenreID, Title
from   #Movies
where  GenreID = 2
union
select top 1 GenreID, Title
from   #Movies
where  GenreID = 3
union
select top 1 GenreID, Title
from   #Movies
where  GenreID = 4
union
select top 1 GenreID, Title
from   #Movies
where  GenreID = 5
union
select top 1 GenreID, Title
from   #Movies
where  GenreID = 6
union
select top 1 GenreID, Title
from   #Movies
where  GenreID = 7
union
select top 1 GenreID, Title
from   #Movies
where  GenreID = 8
union
select top 1 GenreID, Title
from   #Movies
where  GenreID = 9
union
select top 1 GenreID, Title
from   #Movies
where  GenreID = 10

Tomm Carr
--
Version Normal Form -- http://groups.google.com/group/vrdbms

SwePeso SSC-Dedicated Points: 39752 More actions · Answer 10

Running Carr's suggestion reveals that the movies are always the same for every genre.

Why? Because there is a clustered index on MovieID. That means that TOP 1 always return the same MovieID, which defeats the purpose.

Here are the two execution plans side by side for comparison

Peso

|--Compute Scalar(DEFINE[m].[MovieID]=[m].[MovieID]))

|--Nested Loops(Left Outer Join, OUTER REFERENCES[#Movies].[GenreID]))

|--Sort(DISTINCT ORDER BY[#Movies].[GenreID] ASC))

| |--Clustered Index Scan(OBJECT[tempdb].[dbo].[#Movies_____________________________________________________________________________________________________________00000000303D].[PK__#Movies__18AB59A8]))

|--Sort(TOP 1, ORDER BY[Expr1004] ASC))

|--Compute Scalar(DEFINE[m].[MovieID]=[m].[MovieID], [Expr1004]=newid()))

|--Clustered Index Scan(OBJECT[tempdb].[dbo].[#Movies_____________________________________________________________________________________________________________00000000303D].[PK__#Movies__18AB59A8] AS [m]), WHERE[m].[GenreID]=[#Movies].[GenreID]))

Carr

|--Merge Join(Union)

| |--Merge Join(Union)

| | |--Merge Join(Union)

| | | |--Merge Join(Union)

| | | | | | | | |--Top(1)

| | | | | | | | | |--Clustered Index Scan(OBJECT[tempdb].[dbo].[#Movies_____________________________________________________________________________________________________________00000000303D].[PK__#Movies__18AB59A8]), WHERE[#Movies].[GenreID]=1))

| | | | | | | | |--Top(1)

| | | | | | | | |--Clustered Index Scan(OBJECT[tempdb].[dbo].[#Movies_____________________________________________________________________________________________________________00000000303D].[PK__#Movies__18AB59A8]), WHERE[#Movies].[GenreID]=2))

| | | | | | | |--Top(1)

| | | | | | | |--Clustered Index Scan(OBJECT[tempdb].[dbo].[#Movies_____________________________________________________________________________________________________________00000000303D].[PK__#Movies__18AB59A8]), WHERE[#Movies].[GenreID]=3))

| | | | | | |--Top(1)

| | | | | | |--Clustered Index Scan(OBJECT[tempdb].[dbo].[#Movies_____________________________________________________________________________________________________________00000000303D].[PK__#Movies__18AB59A8]), WHERE[#Movies].[GenreID]=4))

| | | | | |--Top(1)

| | | | | |--Clustered Index Scan(OBJECT[tempdb].[dbo].[#Movies_____________________________________________________________________________________________________________00000000303D].[PK__#Movies__18AB59A8]), WHERE[#Movies].[GenreID]=5))

| | | | |--Top(1)

| | | | |--Clustered Index Scan(OBJECT[tempdb].[dbo].[#Movies_____________________________________________________________________________________________________________00000000303D].[PK__#Movies__18AB59A8]), WHERE[#Movies].[GenreID]=6))

| | | |--Top(1)

| | | |--Clustered Index Scan(OBJECT[tempdb].[dbo].[#Movies_____________________________________________________________________________________________________________00000000303D].[PK__#Movies__18AB59A8]), WHERE[#Movies].[GenreID]=7))

| | |--Top(1)

| | |--Clustered Index Scan(OBJECT[tempdb].[dbo].[#Movies_____________________________________________________________________________________________________________00000000303D].[PK__#Movies__18AB59A8]), WHERE[#Movies].[GenreID]=8))

| |--Top(1)

| |--Clustered Index Scan(OBJECT[tempdb].[dbo].[#Movies_____________________________________________________________________________________________________________00000000303D].[PK__#Movies__18AB59A8]), WHERE[#Movies].[GenreID]=9))

|--Top(1)

|--Clustered Index Scan(OBJECT[tempdb].[dbo].[#Movies_____________________________________________________________________________________________________________00000000303D].[PK__#Movies__18AB59A8]), WHERE[#Movies].[GenreID]=10))

N 56°04'39.16"
E 12°55'05.25"

Tomm Carr SSCertifiable Points: 5426 More actions · Answer 11

Well, good grief, this wasn't even past the proof of concept stage. I don't think OP was concerned at this point about performance. He seemed more interested in seeing if it could be done at all. If either quasi-random solution (and I pointed out the fact that each listing would be the same) was good enough, then we could have worked on the performance.

Find out the direction you want to go before you start paving the road.

BTW, here's another possible solution:

Create another table (call it GenreList for now) with just the info that you want to show. There will be only one entry for each genre.
When you add, delete, or update an entry into your movie table, update the GenreList entry for that genre to reflect the new entry. This is easily done with a trigger on the movie table (example below).
When you want to show your "random" samples, just SELECT * FROM GenreList!
For the purists out there, yes, this is still not quite a random list -- it always shows the latest movie entered or updated. But at least the titles change from time to time depending on how often they add/modify movies.

Here's what I threw together. And for the purists among you, I know the performance can be improved by indexes and I didn't define the relationship between the two tables. There has to be something left to the reader to do. The trigger could be improved also, I'm sure. This is just an example, guys, not production code. (Anyone who wants polished code, contact me. My rates are reasonble. )

create table Movies (
    MovieID int identity primary key,
    GenreID int,
    Title   varchar(10)
);

create table GenreList (
    GenreID int primary key,
    Title   varchar(10)
);

go
create trigger Movies_BI 
   on  Movies 
   after insert, delete, update
as 
  begin
    if @@RowCount = 0
        return; -- Well, just never mind then.

    set NoCount on;

    if exists (select * from deleted)
      begin
        -- We're either deleting or updating. Either way, remove the effected
        -- rows (if they exist) from the listing table.
        delete  from GenreList
        where   GenreID in (select GenreID from deleted); 
      end
    if exists (select * from inserted)  -- Inserting or modifying
      begin
        -- If this is an insert, there is the possibility one or more are the first insert
        -- for a genre. For the second or subsequent entries, the delete statement
        -- above will not have effected that genre's entry in the list table. So we
        -- have to update them.
        update  GenreList
        set     GenreList.Title = I.Title
        from    GenreList L 
        join    (
            select  I1.GenreID, Max(I1.Title) Title
            from    Inserted I1
            group by I1.GenreID
        ) I
            on I.GenreID = L.GenreID
        where L.GenreID in (select GenreID from GenreList);
        
        -- All that are left are first inserts for a genre, updates to existing genres
        -- or updates that change the genre. In any of these instances, there will be no
        -- entries in the list table so we now have to insert them.
        insert into GenreList
            (GenreID, Title)
            select  GenreID, Max(Title)
            from    inserted i
            where   i.GenreID not in (select GenreID from GenreList)
            group by GenreID;
      end--if
      
    -- If a movie that appears in the list table is deleted or the genre of a movie is changed,
    -- there remains a "gap" for the deleted/old genre because it is deleted from the listing 
    -- table but not replenished. Here we fix that. If there are gaps, insert a replacement.
    -- Note: some people are uncomfortable querying a mutating table. Fine, find another way or
    -- make a business rule for changing the genre of a movie: first delete existing entry then insert
    -- entry with new genre. You might even implement this recursively within the trigger.

    insert into GenreList
        (GenreID, Title)
        select  GenreID, Max(Title)
        from    movies m
        where   m.GenreID not in (select GenreID from GenreList)
        group by GenreID;

  end--Trigger
go

-- Insert the data in a haphazard order. The titles are in the form "Genre x, Title y"
insert into Movies
    (GenreID, Title)
select 10,  'G10T1' union all
select 4,   'G4T2'  union all
select 2,   'G2T2'  union all
select 1,   'G1T3'  union all
select 6,   'G6T1'  union all
select 3,   'G3T5'  union all
select 8,   'G8T1'  union all
select 3,   'G3T4'  union all
select 3,   'G3T2'  union all
select 1,   'G1T2'  union all
select 4,   'G4T1'  union all
select 3,   'G3T1'  union all
select 5,   'G5T1'  union all
select 3,   'G3T3'  union all
select 1,   'G1T1'  union all
select 2,   'G2T1'  union all
select 7,   'G7T1'  union all
select 9,   'G9T1';


select * from GenreList;

-- Create a new entry for an existing genre
insert into Movies
    (GenreID, Title)
values( 10, 'G10T2');

-- Create a new entry for a new genre
insert into Movies
    (GenreID, Title)
values( 11, 'G11T1');

-- Update the title of an entry
update Movies
set Title   = 'G3T-One'
where Title = 'G3T1';

select * from GenreList;

-- Update the title and genre of an entry (new genre exists)
update Movies
set Title   = 'G4T3',
    GenreID = 4
where Title = 'G3T-One';

-- Update the title and genre of an entry (new genre doesn't exist)
update Movies
set Title   = 'G12T1',
    GenreID = 12
where Title = 'G4T1';

select * from GenreList;

-- Update the title and genre of an entry that is NOT
-- one in the list
update Movies
set Title   = 'G3T4',
    GenreID = 3
where Title = 'G1T1';

select * from GenreList;

Tomm Carr
--
Version Normal Form -- http://groups.google.com/group/vrdbms

Jeff Moden SSC Guru Points: 1003867 More actions · Answer 12

A relational database doesn't really do random well... If you want truly random results from one query to another... .

Tomm,

I just gotta ask why you say either of those things or why you think Peter's sweet bit of code doesn't do it?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Tomm Carr SSCertifiable Points: 5426 More actions · Answer 13

I didn't say random access couldn't be done, I just said that an RDMS doesn't really lend itself to random access. Am I wrong?

And there is nothing intrinsically wrong with Peter's code. It fulfills all the requirements. But it puts the work in the query.

In almost every database, there will be relatively few Inserts and Updates
Overwhelmingly, most database hits will be through queries.
If a requirement can be satisfied in the insert/update side or the query side, look more favorably on the insert/update side. Even if it is several times less efficient, it is executed far less often.

This "rule" has decreased in importance over the years as increases in memory and cpu speeds has far outpaced disk access speeds. But it is still something I like to keep in mind because, depending on -- well, you know, many things -- it could still be relevant.

So I provided an alternate "solution." Solution is quoted because it did not quite match the requirements. It was quasi-random in that it wasn't random at all but the pattern was hidden from the user and the result set could remain static between queries. So if OP's requirements demanded a different result set every time the query is executed, there is no help for it -- it is Peter's solution (or one like it) or nothing. Generally, however, I find few people complain that they have been given too many solutions to a problem.

In thinking further about it, I have one question. I haven't had much experience with quids so I know almost nothing about the particulars of using them. But it seems to me there would be a certain amount of overhead, particularly locking, that goes into guaranteeing uniqueness. So, in a very busy application, like Amazon or Google, where this query might possibly be called millions of times in a day or so, or peak hit rates of thousands of times per second, does/could calling NewID() ten times for every query create a problem?

Tomm Carr
--
Version Normal Form -- http://groups.google.com/group/vrdbms

Jeff Moden SSC Guru Points: 1003867 More actions · Answer 14

In light of Peter's solution, I'd have to say "Yeah, you're wrong" Also, what's wrong with putting "the work in the query"? Isn't that where most of the SQL work is done?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)