10 Top 1 rows chosen at random

  • Hi,

    I have a table with 1000's of rows, and each row is one of 10 types.

    What I want is a statement to return one row per type, but that the row is chosen at random, so each time I execute the row I'll get 10 rows, 1 per type, but each row will be different from the last time I run it

    There must be a better way than inserting top 1 rows ordering by newid() into a temp table then returning the 10 rows as e result set..

    Thanks!

  • Are you talking about PAGING?

    You want 10 records at a time (one fo each type) on a "page"?

    When you click "next" you want next 10 records (all again of different type)?

     


    N 56°04'39.16"
    E 12°55'05.25"

  • Hi, No its not for paging.

    It's for showing different films but one per genre on a web page..

    Thanks,

    Andy

  • How are you else going to check that some of the films you are going to show, not already has been shown?

     


    N 56°04'39.16"
    E 12°55'05.25"

  • Well, each of the 10 items in each result set will have a different genre, so can't be shown twice in the same set of ten, and a film ere can only have one genre.

    I'm not actually worried about whether it gets shown in successive result sets. Some genres may only have 10 films and others may have 1000. Plus anything random can potentially happen twice in a row anyway.

    Hope that makes sense.

    (Edited for sloppy typing )

  • -- Prepare sample data

    CREATE TABLE #Movies

    (

    MovieID INT IDENTITY PRIMARY KEY,

    GenreID INT

    )

    INSERT #Movies

    (

    GenreID

    )

    SELECT TOP 1000 1 + ABS(CHECKSUM(NEWID())) % 10

    FROM master..syscolumns

    -- Show the expected result

    SELECT g.GenreID,

    (SELECT TOP 1 m.MovieID FROM #Movies AS m WHERE m.GenreID = g.GenreID ORDER BY NEWID()) AS MovieID

    FROM (

    SELECT GenreID

    FROM #Movies

    GROUP BY GenreID

    ) AS g

    ORDER BY g.GenreID

     


    N 56°04'39.16"
    E 12°55'05.25"

  • Our data in SQL Server may not be ordered based on the time to add them. If there is the ORDER clause in your queries, your retrieved data should be the same.

  • Very nicely done, Peter.  Nasty fast, too!

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thanks!

     


    N 56°04'39.16"
    E 12°55'05.25"

  • A relational database doesn't really do random well. Here are a couple of more straightforward solutions. If you want truly random results from one query to another -- good luck.

    CREATE TABLE #Movies (
        MovieID INT IDENTITY PRIMARY KEY,
        GenreID INT,
        Title   varchar(10)
    );
    
    -- Insert the data in a haphazard order. The titles are in the form "Genre x, Title y"
    INSERT into #Movies
        (GenreID, Title)
    select 10,  'G10T1' union all
    select 4,   'G4T2'  union all
    select 2,   'G2T2'  union all
    select 1,   'G1T3'  union all
    select 6,   'G6T1'  union all
    select 3,   'G3T5'  union all
    select 8,   'G8T1'  union all
    select 3,   'G3T4'  union all
    select 3,   'G3T2'  union all
    select 1,   'G1T2'  union all
    select 4,   'G4T1'  union all
    select 3,   'G3T1'  union all
    select 5,   'G5T1'  union all
    select 3,   'G3T3'  union all
    select 1,   'G1T1'  union all
    select 2,   'G2T1'  union all
    select 7,   'G7T1'  union all
    select 9,   'G9T1';
    
    -- The first method. Not random, MIN() would return the first
    -- and MAX() would return the last title in alphabetical order.
    select m1.GenreID, max(m1.Title) Title
    from   #Movies m1
    group by m1.GenreID
    
    -- The second method. More random as to the result, but the result
    -- would be repeatable: you would get the same titles every time.
    select top 1 GenreID, Title
    from   #Movies
    where  GenreID = 1
    union
    select top 1 GenreID, Title
    from   #Movies
    where  GenreID = 2
    union
    select top 1 GenreID, Title
    from   #Movies
    where  GenreID = 3
    union
    select top 1 GenreID, Title
    from   #Movies
    where  GenreID = 4
    union
    select top 1 GenreID, Title
    from   #Movies
    where  GenreID = 5
    union
    select top 1 GenreID, Title
    from   #Movies
    where  GenreID = 6
    union
    select top 1 GenreID, Title
    from   #Movies
    where  GenreID = 7
    union
    select top 1 GenreID, Title
    from   #Movies
    where  GenreID = 8
    union
    select top 1 GenreID, Title
    from   #Movies
    where  GenreID = 9
    union
    select top 1 GenreID, Title
    from   #Movies
    where  GenreID = 10

    Tomm Carr
    --
    Version Normal Form -- http://groups.google.com/group/vrdbms

  • Running Carr's suggestion reveals that the movies are always the same for every genre.

    Why? Because there is a clustered index on MovieID. That means that TOP 1 always return the same MovieID, which defeats the purpose.

    Here are the two execution plans side by side for comparison

    Peso

      |--Compute Scalar(DEFINE[m].[MovieID]=[m].[MovieID]))

           |--Nested Loops(Left Outer Join, OUTER REFERENCES[#Movies].[GenreID]))

                |--Sort(DISTINCT ORDER BY[#Movies].[GenreID] ASC))

                |    |--Clustered Index Scan(OBJECT[tempdb].[dbo].[#Movies_____________________________________________________________________________________________________________00000000303D].[PK__#Movies__18AB59A8]))

                |--Sort(TOP 1, ORDER BY[Expr1004] ASC))

                     |--Compute Scalar(DEFINE[m].[MovieID]=[m].[MovieID], [Expr1004]=newid()))

                          |--Clustered Index Scan(OBJECT[tempdb].[dbo].[#Movies_____________________________________________________________________________________________________________00000000303D].[PK__#Movies__18AB59A8] AS [m]), WHERE[m].[GenreID]=[#Movies].[GenreID]))

     

    Carr

      |--Merge Join(Union)

           |--Merge Join(Union)

           |    |--Merge Join(Union)

           |    |    |--Merge Join(Union)

           |    |    |    |--Merge Join(Union)

           |    |    |    |    |--Merge Join(Union)

           |    |    |    |    |    |--Merge Join(Union)

           |    |    |    |    |    |    |--Merge Join(Union)

           |    |    |    |    |    |    |    |--Merge Join(Union)

           |    |    |    |    |    |    |    |    |--Top(1)

           |    |    |    |    |    |    |    |    |    |--Clustered Index Scan(OBJECT[tempdb].[dbo].[#Movies_____________________________________________________________________________________________________________00000000303D].[PK__#Movies__18AB59A8]), WHERE[#Movies].[GenreID]=1))

           |    |    |    |    |    |    |    |    |--Top(1)

           |    |    |    |    |    |    |    |         |--Clustered Index Scan(OBJECT[tempdb].[dbo].[#Movies_____________________________________________________________________________________________________________00000000303D].[PK__#Movies__18AB59A8]), WHERE[#Movies].[GenreID]=2))

           |    |    |    |    |    |    |    |--Top(1)

           |    |    |    |    |    |    |         |--Clustered Index Scan(OBJECT[tempdb].[dbo].[#Movies_____________________________________________________________________________________________________________00000000303D].[PK__#Movies__18AB59A8]), WHERE[#Movies].[GenreID]=3))

           |    |    |    |    |    |    |--Top(1)

           |    |    |    |    |    |         |--Clustered Index Scan(OBJECT[tempdb].[dbo].[#Movies_____________________________________________________________________________________________________________00000000303D].[PK__#Movies__18AB59A8]), WHERE[#Movies].[GenreID]=4))

           |    |    |    |    |    |--Top(1)

           |    |    |    |    |         |--Clustered Index Scan(OBJECT[tempdb].[dbo].[#Movies_____________________________________________________________________________________________________________00000000303D].[PK__#Movies__18AB59A8]), WHERE[#Movies].[GenreID]=5))

           |    |    |    |    |--Top(1)

           |    |    |    |         |--Clustered Index Scan(OBJECT[tempdb].[dbo].[#Movies_____________________________________________________________________________________________________________00000000303D].[PK__#Movies__18AB59A8]), WHERE[#Movies].[GenreID]=6))

           |    |    |    |--Top(1)

           |    |    |         |--Clustered Index Scan(OBJECT[tempdb].[dbo].[#Movies_____________________________________________________________________________________________________________00000000303D].[PK__#Movies__18AB59A8]), WHERE[#Movies].[GenreID]=7))

           |    |    |--Top(1)

           |    |         |--Clustered Index Scan(OBJECT[tempdb].[dbo].[#Movies_____________________________________________________________________________________________________________00000000303D].[PK__#Movies__18AB59A8]), WHERE[#Movies].[GenreID]=8))

           |    |--Top(1)

           |         |--Clustered Index Scan(OBJECT[tempdb].[dbo].[#Movies_____________________________________________________________________________________________________________00000000303D].[PK__#Movies__18AB59A8]), WHERE[#Movies].[GenreID]=9))

           |--Top(1)

                |--Clustered Index Scan(OBJECT[tempdb].[dbo].[#Movies_____________________________________________________________________________________________________________00000000303D].[PK__#Movies__18AB59A8]), WHERE[#Movies].[GenreID]=10))

     


    N 56°04'39.16"
    E 12°55'05.25"

  • Well, good grief, this wasn't even past the proof of concept stage. I don't think OP was concerned at this point about performance. He seemed more interested in seeing if it could be done at all. If either quasi-random solution (and I pointed out the fact that each listing would be the same) was good enough, then we could have worked on the performance.

    Find out the direction you want to go before you start paving the road.

    BTW, here's another possible solution:

    • Create another table (call it GenreList for now) with just the info that you want to show. There will be only one entry for each genre.
    • When you add, delete, or update an entry into your movie table, update the GenreList entry for that genre to reflect the new entry. This is easily done with a trigger on the movie table (example below).
    • When you want to show your "random" samples, just SELECT * FROM GenreList!
    • For the purists out there, yes, this is still not quite a random list -- it always shows the latest movie entered or updated. But at least the titles change from time to time depending on how often they add/modify movies.

    Here's what I threw together. And for the purists among you, I know the performance can be improved by indexes and I didn't define the relationship between the two tables. There has to be something left to the reader to do. The trigger could be improved also, I'm sure. This is just an example, guys, not production code. (Anyone who wants polished code, contact me. My rates are reasonble. )

    create table Movies (
        MovieID int identity primary key,
        GenreID int,
        Title   varchar(10)
    );
    
    create table GenreList (
        GenreID int primary key,
        Title   varchar(10)
    );
    
    go
    create trigger Movies_BI 
       on  Movies 
       after insert, delete, update
    as 
      begin
        if @@RowCount = 0
            return; -- Well, just never mind then.
    
        set NoCount on;
    
        if exists (select * from deleted)
          begin
            -- We're either deleting or updating. Either way, remove the effected
            -- rows (if they exist) from the listing table.
            delete  from GenreList
            where   GenreID in (select GenreID from deleted); 
          end
        if exists (select * from inserted)  -- Inserting or modifying
          begin
            -- If this is an insert, there is the possibility one or more are the first insert
            -- for a genre. For the second or subsequent entries, the delete statement
            -- above will not have effected that genre's entry in the list table. So we
            -- have to update them.
            update  GenreList
            set     GenreList.Title = I.Title
            from    GenreList L 
            join    (
                select  I1.GenreID, Max(I1.Title) Title
                from    Inserted I1
                group by I1.GenreID
            ) I
                on I.GenreID = L.GenreID
            where L.GenreID in (select GenreID from GenreList);
            
            -- All that are left are first inserts for a genre, updates to existing genres
            -- or updates that change the genre. In any of these instances, there will be no
            -- entries in the list table so we now have to insert them.
            insert into GenreList
                (GenreID, Title)
                select  GenreID, Max(Title)
                from    inserted i
                where   i.GenreID not in (select GenreID from GenreList)
                group by GenreID;
          end--if
          
        -- If a movie that appears in the list table is deleted or the genre of a movie is changed,
        -- there remains a "gap" for the deleted/old genre because it is deleted from the listing 
        -- table but not replenished. Here we fix that. If there are gaps, insert a replacement.
        -- Note: some people are uncomfortable querying a mutating table. Fine, find another way or
        -- make a business rule for changing the genre of a movie: first delete existing entry then insert
        -- entry with new genre. You might even implement this recursively within the trigger.
    
        insert into GenreList
            (GenreID, Title)
            select  GenreID, Max(Title)
            from    movies m
            where   m.GenreID not in (select GenreID from GenreList)
            group by GenreID;
    
      end--Trigger
    go
    
    -- Insert the data in a haphazard order. The titles are in the form "Genre x, Title y"
    insert into Movies
        (GenreID, Title)
    select 10,  'G10T1' union all
    select 4,   'G4T2'  union all
    select 2,   'G2T2'  union all
    select 1,   'G1T3'  union all
    select 6,   'G6T1'  union all
    select 3,   'G3T5'  union all
    select 8,   'G8T1'  union all
    select 3,   'G3T4'  union all
    select 3,   'G3T2'  union all
    select 1,   'G1T2'  union all
    select 4,   'G4T1'  union all
    select 3,   'G3T1'  union all
    select 5,   'G5T1'  union all
    select 3,   'G3T3'  union all
    select 1,   'G1T1'  union all
    select 2,   'G2T1'  union all
    select 7,   'G7T1'  union all
    select 9,   'G9T1';
    
    
    select * from GenreList;
    
    -- Create a new entry for an existing genre
    insert into Movies
        (GenreID, Title)
    values( 10, 'G10T2');
    
    -- Create a new entry for a new genre
    insert into Movies
        (GenreID, Title)
    values( 11, 'G11T1');
    
    -- Update the title of an entry
    update Movies
    set Title   = 'G3T-One'
    where Title = 'G3T1';
    
    select * from GenreList;
    
    -- Update the title and genre of an entry (new genre exists)
    update Movies
    set Title   = 'G4T3',
        GenreID = 4
    where Title = 'G3T-One';
    
    -- Update the title and genre of an entry (new genre doesn't exist)
    update Movies
    set Title   = 'G12T1',
        GenreID = 12
    where Title = 'G4T1';
    
    select * from GenreList;
    
    -- Update the title and genre of an entry that is NOT
    -- one in the list
    update Movies
    set Title   = 'G3T4',
        GenreID = 3
    where Title = 'G1T1';
    
    select * from GenreList;
    

    Tomm Carr
    --
    Version Normal Form -- http://groups.google.com/group/vrdbms

  • quoteA relational database doesn't really do random well...  If you want truly random results from one query to another... .

    Tomm,

    I just gotta ask why you say either of those things or why you think Peter's sweet bit of code doesn't do it?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I didn't say random access couldn't be done, I just said that an RDMS doesn't really lend itself to random access. Am I wrong?

    And there is nothing intrinsically wrong with Peter's code. It fulfills all the requirements. But it puts the work in the query.

    • In almost every database, there will be relatively few Inserts and Updates
    • Overwhelmingly, most database hits will be through queries.
    • If a requirement can be satisfied in the insert/update side or the query side, look more favorably on the insert/update side. Even if it is several times less efficient, it is executed far less often.

    This "rule" has decreased in importance over the years as increases in memory and cpu speeds has far outpaced disk access speeds. But it is still something I like to keep in mind because, depending on -- well, you know, many things -- it could still be relevant.

    So I provided an alternate "solution." Solution is quoted because it did not quite match the requirements. It was quasi-random in that it wasn't random at all but the pattern was hidden from the user and the result set could remain static between queries. So if OP's requirements demanded a different result set every time the query is executed, there is no help for it -- it is Peter's solution (or one like it) or nothing. Generally, however, I find few people complain that they have been given too many solutions to a problem.

    In thinking further about it, I have one question. I haven't had much experience with quids so I know almost nothing about the particulars of using them. But it seems to me there would be a certain amount of overhead, particularly locking, that goes into guaranteeing uniqueness. So, in a very busy application, like Amazon or Google, where this query might possibly be called millions of times in a day or so, or peak hit rates of thousands of times per second, does/could calling NewID() ten times for every query create a problem?

    Tomm Carr
    --
    Version Normal Form -- http://groups.google.com/group/vrdbms

  • In light of Peter's solution, I'd have to say "Yeah, you're wrong"   Also, what's wrong with putting "the work in the query"?  Isn't that where most of the SQL work is done?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 1 through 15 (of 23 total)

You must be logged in to reply to this topic. Login to reply