TABLESAMPLE output

  • Interesting question

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw[/url]
    Learn Extended Events

  • david.gugg (4/3/2014)


    Great question, I'd never heard of TABLESAMPLE, so I looked it up - and then still ended up getting the answer wrong :O

    +1 Thanks for the question.



    Everything is awesome!

  • twin.devil (4/3/2014)


    Nice question Anoo. learn something new today. thanks

    +1

  • pmadhavapeddi22 (4/3/2014)


    when we are specifying high range of rows nearing to the exact no of rows then it will return all the rows of the table

    like , in the QOD

    we have total no of rows 5 and tablesample is given for 4 rows, then that particular statement will return 5 rows.

    When I ran it for the 5th or 6th time it returned zero rows.

    [font="Verdana"]Please don't go. The drones need you. They look up to you.[/font]
    Connect to me on LinkedIn

  • Wow, I didn't know that.

    Thanks for the question.

    webrunner

    -------------------
    A SQL query walks into a bar and sees two tables. He walks up to them and asks, "Can I join you?"
    Ref.: http://tkyte.blogspot.com/2009/02/sql-joke.html

  • Good question, and good explanation. Thanks, Anoo! 😉


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • pmadhavapeddi22 (4/3/2014)


    when we are specifying high range of rows nearing to the exact no of rows then it will return all the rows of the table

    like , in the QOD

    we have total no of rows 5 and tablesample is given for 4 rows, then that particular statement will return 5 rows.

    No, it will decide whether the first page is selected based on some statistical calculation, which is NOT deterministic. If teh first page is selected in our case the answer is 5, because the first page contaons the whole table. If that page is not selected, the answer is 0 because there are no ther pages to contain rows.

    Tom

  • thanks tom for the detailed explanation

  • But i really dont understand one thing, tablesample is expecting 4 rows to return, but the result set is totally different, that too on different execution, the purpose of the tablesample is confusing, is there anything specific about this ?

    Thanks in advance

  • pmadhavapeddi22 (4/3/2014)


    But i really dont understand one thing, tablesample is expecting 4 rows to return, but the result set is totally different, that too on different execution, the purpose of the tablesample is confusing, is there anything specific about this ?

    Thanks in advance

    The TABLESAMPLE clause is supposed to be used to get a "somewhat random" sample of the table contents. There is no guarantee that the amount or percentage of rows returned will be exactly as speceified.

    The internal implementation is that each page allocated to the table is either included or not included. For large tables, this works out okay (allthough the results will always consist of series of consecutive rows, with gapes in between those series - as I said: "somewhat random" at best). But for smaller tables, and especially as small as the one in this question, it just acts weird.

    The table in the question fits completely on a single page. Because TABLESAMPLE always works at the page level, the effect is that the page either will or will not be included in the sample.

    I hope this helps!


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • Just tosupplement what Hugo wote: the reason for working at page level insteadof at row level is performance. Think of a moderate sized table, say a million rows spread over twenty thousand pages. Calculating for each row whether it is to be included or not would mean getting a million random numbers (one for each row) and testing whether each is over some threshold to determine whether teh correcponding row is to retrieved. It's cheaper to do that operation only twenty thousand times, rather than amillion times. If the required sample size is say 10% of the table, then the row-wise method will collect on average 5 rows from each page and just about all 20000 pages will be read because the chance of a page not being read is, on average, 0.9^50 which is a lot less that 1/20000; but the page-oriented version will on average read only 400 pages - a factor of 50 reduction in IO and buffer requirements.

    As Hugo says, the paging means that the samples delivered by this feature are somewhat less randomly chosen than the would be with a row-oriented method, and there can be cases where that as bad effects on teh validity of what you get from using this for testing - it's worth looking at whether this will be significant or not (but difficult to work this out if the table is a heap ).

    Tom

  • Ed Wagner (4/3/2014)


    I'd never used this option either. Very interesting, so thanks.

    Me neither, but very interesting QotD!

    Thanks & Best Regards,
    Hany Helmy
    SQL Server Database Consultant

  • Thank you Hugo Kornelis and TomThomson for the explanation

  • Thanks for the interesting question. I ran query several times on SQL 2012; output was 0 or 5 rows, but once 4 rows were returned.

  • nice question thanks Anoo.

Viewing 15 posts - 16 through 30 (of 31 total)

You must be logged in to reply to this topic. Login to reply