Tablesample

  • (Bob Brown) (9/19/2012)


    Executed the select many times and never got a return. Oh well.

    Try the following script, I'm sure you'll get some (approx. 10-15) non-empty results:

    set nocount on;

    go

    create table #Test (ID int primary key);

    insert into #Test values (2);

    insert into #Test values (1);

    select top 1 ID from #Test tablesample (10 percent);

    if @@rowcount > 0 print 'Catched!';

    drop table #Test;

    go 100

  • vk-kirov (9/19/2012)


    (Bob Brown) (9/19/2012)


    Executed the select many times and never got a return. Oh well.

    Try the following script, I'm sure you'll get some (approx. 10-15) non-empty results:

    set nocount on;

    go

    create table #Test (ID int primary key);

    insert into #Test values (2);

    insert into #Test values (1);

    select top 1 ID from #Test tablesample (10 percent);

    if @@rowcount > 0 print 'Catched!';

    drop table #Test;

    go 100

    Yes, you are right! Thanks.

  • Hugo Kornelis (9/19/2012)


    Nice question. I almost got it wrong, because I had first overlooked the TOP clause.

    However, the official correct answer is still a bit questionable. The result of a TOP without ORDER BY is officially undocumented and undefined. It is true that all combinations of SQL Server version and hardware you and I and others have tested this on always produce the same execution plan, and hence the same result. But from that you cannot infer that this is guaranteed behaviour. For all we know, there may be a critical hotfix being pushed out through Windows Update right now that changes this behaviour.

    </soapbox>

    Hi Hugo,

    When TOP is used in conjunction with the ORDER BY clause, the result set is limited to the first N number of ordered rows; otherwise, it returns the first N number of random rows(http://msdn.microsoft.com/en-us/library/ms189463.aspx).

    In this case the primary key is clustered and it makes the data insertion in order.

    So the following select on a table with clustered index

    select top N ID from #Test tablesample

    returns same data result as

    select top N ID from #Test tablesample

    order by ID

    (#Test table without clustered index)

    Regards,

    IgorMi

    Igor Micev,My blog: www.igormicev.com

  • IgorMi (9/19/2012)


    Hugo Kornelis (9/19/2012)


    Nice question. I almost got it wrong, because I had first overlooked the TOP clause.

    However, the official correct answer is still a bit questionable. The result of a TOP without ORDER BY is officially undocumented and undefined. It is true that all combinations of SQL Server version and hardware you and I and others have tested this on always produce the same execution plan, and hence the same result. But from that you cannot infer that this is guaranteed behaviour. For all we know, there may be a critical hotfix being pushed out through Windows Update right now that changes this behaviour.

    </soapbox>

    Hi Hugo,

    When TOP is used in conjunction with the ORDER BY clause, the result set is limited to the first N number of ordered rows; otherwise, it returns the first N number of random rows(http://msdn.microsoft.com/en-us/library/ms189463.aspx).

    In this case the primary key is clustered and it makes the data insertion in order.

    So the following select on a table with clustered index

    select top N ID from #Test tablesample

    returns same data result as

    select top N ID from #Test tablesample

    order by ID

    Regards,

    IgorMi

    Hi Igor,

    Yes, it does return the same result. On any currently available hardware and running any supported version of SQL Server. That is the observed behaviour.

    But it's not the documented behaviour. The page you quoted even gives an explicit warning about this: "otherwise (when no ORDER BY is specified -HK), it returns the first N number of random rows" (emphasis mine). Now I think that wording is not very good, as the choice is not random - but it is unspecified, meaning that any N rows can be returned.

    And that's my point. Yes, today your two queries above will always return the same result. But tomorrow, that can change. The SQL Server dev team always warns about this: any observed behaviour that is not documented my change at the spur of the moment.


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • Interesting question, I would have thought it would never return results if all the data was on a single page.

  • Thanks for an interesting question. However, this may be the first time I'm satisfied with having got a wrong answer. I had the same issues with the question as did Hugo and Tom, and did not wish to consider the primary key as forcing the sort order. Having said the value could be either 1 or 2, due to the missing ORDER BY, I can live with that. I learned something today, but not something I will use. I'm in the camp of those not willing to use something that just happens to be 100% observable NOW, but is not guaranteed to be so in the future.

    [font="Verdana"]Please don't go. The drones need you. They look up to you.[/font]
    Connect to me on LinkedIn

  • I agree with you. Becasue with 10 Percent it does not make any sense...:;-)

    Best,
    Naseer Ahmad
    SQL Server DBA

  • Hugo Kornelis (9/19/2012)


    IgorMi (9/19/2012)


    Hugo Kornelis (9/19/2012)


    Nice question. I almost got it wrong, because I had first overlooked the TOP clause.

    However, the official correct answer is still a bit questionable. The result of a TOP without ORDER BY is officially undocumented and undefined. It is true that all combinations of SQL Server version and hardware you and I and others have tested this on always produce the same execution plan, and hence the same result. But from that you cannot infer that this is guaranteed behaviour. For all we know, there may be a critical hotfix being pushed out through Windows Update right now that changes this behaviour.

    </soapbox>

    Hi Hugo,

    When TOP is used in conjunction with the ORDER BY clause, the result set is limited to the first N number of ordered rows; otherwise, it returns the first N number of random rows(http://msdn.microsoft.com/en-us/library/ms189463.aspx).

    In this case the primary key is clustered and it makes the data insertion in order.

    So the following select on a table with clustered index

    select top N ID from #Test tablesample

    returns same data result as

    select top N ID from #Test tablesample

    order by ID

    Regards,

    IgorMi

    Hi Igor,

    Yes, it does return the same result. On any currently available hardware and running any supported version of SQL Server. That is the observed behaviour.

    But it's not the documented behaviour. The page you quoted even gives an explicit warning about this: "otherwise (when no ORDER BY is specified -HK), it returns the first N number of random rows" (emphasis mine). Now I think that wording is not very good, as the choice is not random - but it is unspecified, meaning that any N rows can be returned.

    And that's my point. Yes, today your two queries above will always return the same result. But tomorrow, that can change. The SQL Server dev team always warns about this: any observed behaviour that is not documented my change at the spur of the moment.

    Hugo I had the exact same thought when answering this that there were 3 possible answers because there was no ORDER BY. (it can return 1 or 0 rows, it can return 1, it can return 2)

    The order of rows in cases like this is only not guaranteed it is inconsistent. When there is no ORDER BY you are letting the engine get the rows it finds most convenient. When tables get larger and we start getting page splits, index fragmentation and that sort of thing due to deletes the engine will not always return the rows in the order of the primary key because the physical storage is not guaranteed to maintain the order of the data. It is difficult to put together a scenario that will demonstrate the order of rows getting out of synch with the primary key but I have seen it happen.

    Great question about a pretty well unknown feature. It seems the discussion has turned away from the question and onto a totally different topic.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Another QotD which does not test what it purports to test.

    Even if you know TABLESAMPLE inside and out, you cannot expect to get this question correct without knowing the default clustering setting for PRIMARY KEY and the undocumented behaviour of TOP without an ORDER BY.

    Since in this artificial situation, TABLESAMPLE (10 PERCENT) itself will always return either zero or both rows, the determining factor for getting the first answer (always 0 or 1 rows) correct is understanding of TOP, not TABLESAMPLE, and the detemining factor for getting the second answer (value of 1) correct is understanding of PRIMARY KEY and the current undocumented behaviour of TOP.

    So why title the question TABLESAMPLE, or bother to include it ? You're not testing for knowledge of it, that's for sure. Call it "Complex T-SQL Issue" or "Multiple non-deterministic filtering factors".

    This question would be OK, if it weren't for that dishonesty.

  • sknox (9/19/2012)


    Another QotD which does not test what it purports to test.

    Even if you know TABLESAMPLE inside and out, you cannot expect to get this question correct without knowing the default clustering setting for PRIMARY KEY and the undocumented behaviour of TOP without an ORDER BY.

    Since in this artificial situation, TABLESAMPLE (10 PERCENT) itself will always return either zero or both rows, the determining factor for getting the first answer (always 0 or 1 rows) correct is understanding of TOP, not TABLESAMPLE, and the detemining factor for getting the second answer (value of 1) correct is understanding of PRIMARY KEY and the current undocumented behaviour of TOP.

    So why title the question TABLESAMPLE, or bother to include it ? You're not testing for knowledge of it, that's for sure. Call it "Complex T-SQL Issue" or "Multiple non-deterministic filtering factors".

    This question would be OK, if it weren't for that dishonesty.

    Whoa, pretty harsh to call this question dishonest. In reading your rant it sounds like if the author had included the ORDER BY it would have been a question that would pass your standards? Yes the missing order by raises some ambiguity but by no means was the author trying to be dishonest or deceive you.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • I would say also that it was dishonest, or at a minimum deceptive. In any case it shows that you can not always trust certain functions of the SQL Server platform to perform the same way, or that it is potentially not dependable in all cases.

    Kind of erodes the confidence in certain uses of the product.

    Not all gray hairs are Dinosaurs!

  • Miles Neale (9/19/2012)


    I would say also that it was dishonest, or at a minimum deceptive. In any case it shows that you can not always trust certain functions of the SQL Server platform to perform the same way, or that it is potentially not dependable in all cases.

    Kind of erodes the confidence in certain uses of the product.

    It is well known and discussed on all the sql forums/boards. If you want your data in a certain there is one and only one way to make sure that happens, add an ORDER BY. The reality is that relying on the engine to do something like that is sloppy and lazy anyway. The product will return the rows in the order that it can get them the fastest UNLESS you tell it to return them in a certain order. The sql engine does ALWAYS do this the same way (gets them in the order it can do it the quickest). It is just that the way the engine does it and the way you want it done are not always the same. That is why we have an ORDER BY. It is NOT a fault of the engine but a fault of the user to assume the order of the data.

    As for the deception, I doubt the author was intending to be dishonest or deceptive. Maybe it is my eternal optimism but I have encountered very few cases of somebody trying to make a question deceptive. I am guessing it was an oversight.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Good question. I don't use tablesample so its behavior was new to me.

  • Miles Neale (9/19/2012)


    Kind of erodes the confidence in certain uses of the product.

    Huh? Why?

    SELECT TOP ... without ORDER BY is like saying "give me the first by any order of your choice" - and now you say that because that order might be different than you expected (but didn't specify), it is eroding your confidence in the product?


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • Nils Gustav Stråbø (9/19/2012)


    Bummer. Missed the TOP 1.

    Always. Read. The. Full. Question.

    +1

    sknox (9/19/2012)


    Another QotD which does not test what it purports to test.

    Even if you know TABLESAMPLE inside and out, you cannot expect to get this question correct without knowing the default clustering setting for PRIMARY KEY and the undocumented behavior of TOP without an ORDER BY.

    Since in this artificial situation, TABLESAMPLE (10 PERCENT) itself will always return either zero or both rows, the determining factor for getting the first answer (always 0 or 1 rows) correct is understanding of TOP, not TABLESAMPLE, and the determining factor for getting the second answer (value of 1) correct is understanding of PRIMARY KEY and the current undocumented behavior of TOP.

    So why title the question TABLESAMPLE, or bother to include it ? You're not testing for knowledge of it, that's for sure. Call it "Complex T-SQL Issue" or "Multiple non-deterministic filtering factors".

    This question would be OK, if it weren't for that dishonesty.

    It isn't dishonesty. The question does require an understanding of both TOP & PRIMARY KEY as well as TABLESAMPLE. That doesn't make the title dishonest. That makes it a TITLE. The person who wrote the question had to come up with a title for the question. It isn't actually part of the question. I'm guessing (and it's only a guess) that the writer felt that TABLESAMPLE was a less known and therefore bigger part of the question.

    If you enjoy doing the QotD and don't feel the questions are up to your standards then submit a few. But if you do make sure you have a fairly thick skin because someone will complain about something. Always. (Unless they don't :-)) Sometimes the comments made are valid, frequently they are informational, and sometimes they are just complaints.

    Next time you make a comment remember that the person who wrote the question will probably read it. If you have a compliant you feel is valid, try to make it constructive criticism. Personally I always enjoy reading those. Try not to come off as insulting. Try "I was thrown by the title. It didn't seem to match up with everything in the question. Here is my reasoning." Calling the writer dishonest is rather harsh and he may choose not to write any more questions, or at least less of them, and then all of us lose out.

    Sorry for the rant but I've done a few questions of my own recently and enjoyed several pages of discussion over the difference between 'True/False' and 'Yes/No'. Some of which was well thought out and constructive, and as a result I'm going to be more careful in the future. Some of the comments were just insulting.

    Kenneth FisherI was once offered a wizards hat but it got in the way of my dunce cap.--------------------------------------------------------------------------------For better, quicker answers on T-SQL questions, click on the following... http://www.sqlservercentral.com/articles/Best+Practices/61537/[/url]For better answers on performance questions, click on the following... http://www.sqlservercentral.com/articles/SQLServerCentral/66909/[/url]Link to my Blog Post --> www.SQLStudies.com[/url]

Viewing 15 posts - 16 through 30 (of 47 total)

You must be logged in to reply to this topic. Login to reply