TABLESAMPLE

  • Comments posted to this topic are about the item TABLESAMPLE

  • TABLESAMPLE returns a different number of rows on each run because the seed used for the random sampling changes with every execution. Even when we specify TABLESAMPLE (10 PERCENT), the result can be off by around 1,000 rows in either direction.

    Examples:

    1. select * from results TABLESAMPLE SYSTEM(100 ROWS)

    2. select * from results TABLESAMPLE SYSTEM(10 PERCENT)

    The number of rows will vary every time. To get the same sample on every run, use the REPEATABLE keyword with a fixed seed.

    select * from results TABLESAMPLE SYSTEM(10 PERCENT) REPEATABLE(1000)

    Given the same seed, you will get the same rows back. One thing to note here: this is not like the repeatable read isolation level. If another user changes the data in the table, you will not get back exactly the same rows. The guarantee holds only for a given "version" of the table.
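
    One way to check the REPEATABLE behaviour is to compare row counts across runs. This is only a sketch: it assumes the `results` table from the examples above exists and is not modified between the statements, and the seed values are arbitrary.

    ```sql
    -- Same seed twice: both counts should match while the table is unchanged.
    SELECT COUNT(*) AS sampled_rows
    FROM results TABLESAMPLE SYSTEM (10 PERCENT) REPEATABLE (1000);

    SELECT COUNT(*) AS sampled_rows
    FROM results TABLESAMPLE SYSTEM (10 PERCENT) REPEATABLE (1000);

    -- Different seed: a different set of pages, so the count will usually differ.
    SELECT COUNT(*) AS sampled_rows
    FROM results TABLESAMPLE SYSTEM (10 PERCENT) REPEATABLE (2000);
    ```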

  • Just to clarify: it's not that a random number of rows is returned. A random value is generated for each page of the table, and that value is used together with the requested percentage to decide whether the page is included in the sample.

    From http://technet.microsoft.com/en-us/library/ms189108.aspx

    TABLESAMPLE SYSTEM returns an approximate percentage of rows and generates a random value for each physical 8-KB page in the table. Based on the random value for a page and the percentage specified in the query, a page is either included in the sample or excluded. Each page that is included returns all rows in the sample result set. For example, if you specify TABLESAMPLE SYSTEM 10 PERCENT, SQL Server returns all the rows on approximately 10 percent of the specified data pages of the table. If the rows are evenly distributed on the pages of the table, and if there is a sufficient number of pages in the table, the number of rows returned should approximate the sample size that is requested. However, because the random value that is generated for each page is independent of the values that are generated for any other page, a larger, or smaller, percentage of pages than have been requested might be returned.
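
    To see how far a given run lands from the requested percentage, the sampled count can be compared with the total. A minimal sketch, assuming the `results` table used earlier in the thread; the variable names are made up for illustration.

    ```sql
    DECLARE @total bigint, @sampled bigint;

    -- Total rows in the table.
    SELECT @total = COUNT_BIG(*) FROM results;

    -- Rows on the pages picked by this run of the sample.
    SELECT @sampled = COUNT_BIG(*)
    FROM results TABLESAMPLE SYSTEM (10 PERCENT);

    -- NULLIF guards against division by zero on an empty table.
    SELECT @total   AS total_rows,
           @sampled AS sampled_rows,
           CAST(100.0 * @sampled / NULLIF(@total, 0) AS decimal(6, 2)) AS sampled_percent;
    ```

    Running this several times should show sampled_percent moving around 10, and moving more on tables with few pages.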

  • I got this right, but the explanation, to me, doesn't really sound right:

    "TABLESAMPLE returns an approximate percentage of rows, even if a number of rows is specified. This is used to get a sample of data from large rows and does not guarantee a number of rows or a random sample"

    According to http://technet.microsoft.com/en-us/library/ms189108.aspx, TABLESAMPLE returns a sample number of rows from the result set, so if 10 percent is specified, then 10 percent is returned. The page says:

    The TABLESAMPLE clause limits the number of rows returned from a table in the FROM clause to a sample number or PERCENT of rows.

    However, if SYSTEM is used and 10 percent is specified, then only around 10 percent of the result set is returned. The page says:

    TABLESAMPLE SYSTEM returns an approximate percentage of rows and generates a random value for each physical 8-KB page in the table.....For example, if you specify TABLESAMPLE SYSTEM 10 PERCENT, SQL Server returns all the rows on approximately 10 percent of the specified data pages of the table.....However, because the random value that is generated for each page is independent of the values that are generated for any other page, a larger, or smaller, percentage of pages than have been requested might be returned.

    Am I reading that right?

  • It doesn't matter whether you specify SYSTEM or not

    From the same link

    SYSTEM specifies an ANSI SQL implementation-dependent sampling method. Specifying SYSTEM is optional, but this option is the only sampling method available in SQL Server 2005 and is applied by default.

  • kevriley (7/1/2008)


    It doesn't matter whether you specify SYSTEM or not

    From the same link

    SYSTEM specifies an ANSI SQL implementation-dependent sampling method. Specifying SYSTEM is optional, but this option is the only sampling method available in SQL Server 2005 and is applied by default.

    Oh right ok, so it doesn't matter then if you specify SYSTEM or not.

  • It's likely you would get 10% back, but there's no guarantee.

  • Steve Jones - Editor (7/1/2008)


    It's likely you would get 10% back, but there's no guarantee.

    I tested it more than 10 times and it never returned approximately 10%. It mostly reported 476, 1454, or 1904 rows affected.

    But I wonder, does anybody use this in day-to-day programming? If not, what's the actual use of this function, "tablesample system"?

    SQL DBA.

  • It's used whenever you need to quickly get a representative sample of data from a large data set for testing purposes.

    It's especially useful when you're developing data warehouses as you can develop and test quickly without having to process a very large database every time you make an amendment to a report or cube.
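
    As a rough illustration of that workflow, a sample can be copied into a smaller table for development. This is a sketch only: `results` is the table used earlier in the thread, while `results_sample` and the seed 42 are hypothetical choices.

    ```sql
    -- Copy roughly 10 percent of the pages of results into a new table.
    -- REPEATABLE keeps the sample stable while the source data is unchanged.
    SELECT *
    INTO results_sample
    FROM results TABLESAMPLE SYSTEM (10 PERCENT) REPEATABLE (42);
    ```

    Because whole pages are sampled, the copy is quick to build but is not a true row-level random sample.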

  • SanjayAttray (7/1/2008)

    ...does anybody use this in day-to-day programming? If not, what's the actual use of this function, "tablesample system"?

    I would like to know that too.

    One thing I like about forums, and especially this one is the occasional HOLY S**T moment when you realize "Man... I can get some use out of this."

    It's eluding me on this one.

    Tom Garth
    Vertical Solutions

    "There are three kinds of men. The one that learns by reading. The few who learn by observation. The rest of them have to pee on the electric fence for themselves." -- Will Rogers

  • I think the ratio between the number of rows in the table and the percentage specified also helps determine how many rows are returned.

    This is actually used in statistical calculations, where data is collected at random to estimate things like growth rates.

  • Please look at the documentation provided by Microsoft for SQL Server 2005:

    TABLESAMPLE (10 PERCENT) /* Return a sample 10 percent of the rows of the result set. */

    TABLESAMPLE (15 ROWS) /* Return a sample of 15 rows from the result set. */

    SYSTEM specifies an ANSI SQL implementation-dependent sampling method. Specifying SYSTEM is optional, but it is the only sampling method available in Microsoft SQL Server 2005 and is applied by default.

    TABLESAMPLE SYSTEM returns an approximate percentage of rows. It generates a random value for each physical 8 KB page in the table. Based on the random value for a page and the percentage specified in the query, a page is included in the sample or excluded. Each page that is included returns all rows in the sample result set. For example, when specifying TABLESAMPLE SYSTEM 10 PERCENT, SQL Server returns all the rows on approximately 10 percent of the specified table's data pages. If the rows are evenly distributed on the pages of the table, and if there is a sufficient number of pages in the table, the number of rows returned should approximate the sample size requested. However, as the random value generated for each page is independent of the values generated for any other page, it is possible that a larger, or smaller, percentage of pages than requested are returned. The TOP(n) operator can be used to limit the number of rows to a given maximum
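
    The TOP(n) suggestion at the end of that passage might look like the sketch below. It assumes the `results` table from earlier in the thread; the 15-row cap and the 10 percent figure are arbitrary.

    ```sql
    -- Sample about 10 percent of the pages, then cap the output at 15 rows.
    -- TOP sets a maximum only: fewer rows come back if the sample is smaller.
    SELECT TOP (15) *
    FROM results TABLESAMPLE SYSTEM (10 PERCENT);
    ```

    Without an ORDER BY, which 15 rows survive the cap is not defined, and they are likely to come from the first pages picked rather than being spread evenly over the sample.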

  • I think very few programmers ever used this one. 🙂

  • I don't know where I could use it, but above all it's good to learn something new. 🙂
