Tablesample

  • Comments posted to this topic are about the item Tablesample



    Ole Kristian Velstadbråten Bangås - Virinco - Facebook - Twitter

    Concatenating Row Values in Transact-SQL

  • Fell into the trap... I thought the query might return 1 or 2 and overlooked the primary key factor.

    A good question though. Thanks for posting.

    ~ Lokesh Vij


    Guidelines for quicker answers on T-SQL questions
    Guidelines for answers on Performance questions

    Link to my Blog Post --> www.SQLPathy.com

  • The percentage is very low for that sample data; it will likely never return any row. To be fair, change the query like this:

    create table #Test (ID int primary key);

    insert into #Test values (2);

    insert into #Test values (1);

    select top 1 ID from #Test tablesample (50 percent)

    drop table #Test

  • Awesome. 🙂 thank you for the interesting question

    I got this wrong too. I have seen this feature but never used it, so I need to work on it a lot now to make my understanding consistent.

    (I had the impression that this might work like TOP ... PERCENT, but it seems this feature works quite differently.)

    Just a thought: "the query always returns no rows" could also be correct, I guess. With so few rows, the query will not show any rows no matter when you execute it (given this data and this SELECT statement) until the amount of data is increased substantially, which is not going to happen. (I am wrong here in so many ways...)

    ww; Raghu
    --
    The first and the hardest SQL statement I have wrote- "select * from customers" - and I was happy and felt smart.

  • Interesting question.

    Thanks for sharing! 🙂

  • Evgeny Garaev (9/18/2012)


    The percentage is very low for that sample data; it will likely never return any row. To be fair, change the query like this:

    create table #Test (ID int primary key);

    insert into #Test values (2);

    insert into #Test values (1);

    select top 1 ID from #Test tablesample (50 percent)

    drop table #Test

    Is it just me, or is something going wrong?

    If I execute the above statements as a whole, I can see one record in the result, but if I execute only the SELECT statement, it does not return any results.

    EDIT:

    I have answered my own question: the result is not constant, and the result set may vary, as mentioned in BOL.
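
    A minimal sketch of how that variability can be pinned down, assuming the documented REPEATABLE option of TABLESAMPLE. The seed value 42 is an arbitrary choice for illustration and does not come from the original question:

    ```sql
    -- Same table and data as in the quoted post
    CREATE TABLE #Test (ID int PRIMARY KEY);
    INSERT INTO #Test VALUES (2);
    INSERT INTO #Test VALUES (1);

    -- With the same seed and no changes to the table in between,
    -- both statements should return the same sample
    -- (either both empty or both with the same rows).
    SELECT ID FROM #Test TABLESAMPLE (50 PERCENT) REPEATABLE (42);
    SELECT ID FROM #Test TABLESAMPLE (50 PERCENT) REPEATABLE (42);

    DROP TABLE #Test;
    ```

    Without REPEATABLE, each execution draws a new sample, which would explain why running the SELECT on its own can give a different result from running the whole batch.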

    ww; Raghu

  • Evgeny Garaev (9/18/2012)


    The percentage is very low for that sample data; it will likely never return any row. To be fair, change the query like this:

    create table #Test (ID int primary key);

    insert into #Test values (2);

    insert into #Test values (1);

    select top 1 ID from #Test tablesample (50 percent)

    drop table #Test

    I could have used 50 percent, but the correct answers would be the same. Now the query will return a single row approximately 10% of the times the code is run.
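
    A rough way to check that frequency, assuming the question sampled at 10 percent (as this reply implies) and that each execution inside the loop draws a fresh sample. The variable names and the run count of 1000 are made up for illustration:

    ```sql
    CREATE TABLE #Test (ID int PRIMARY KEY);
    INSERT INTO #Test VALUES (2);
    INSERT INTO #Test VALUES (1);

    DECLARE @run int = 0,      -- loop counter
            @hits int = 0,     -- runs in which the sample contained rows
            @sampled int;      -- rows in the current sample

    WHILE @run < 1000
    BEGIN
        -- Both rows sit on one page, so a sample holds either all rows or none
        SELECT @sampled = COUNT(*) FROM #Test TABLESAMPLE (10 PERCENT);

        IF @sampled > 0
            SET @hits = @hits + 1;

        SET @run = @run + 1;
    END;

    SELECT @hits AS runs_with_rows, @run AS total_runs;

    DROP TABLE #Test;
    ```

    The ratio of runs_with_rows to total_runs gives an empirical estimate to compare against the stated 10%; the exact count will differ from run to run.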



    Ole Kristian Velstadbråten Bangås - Virinco - Facebook - Twitter


  • Nice question. I almost got it wrong, because I had first overlooked the TOP clause.

    However, the official correct answer is still a bit questionable. The result of a TOP without ORDER BY is officially undocumented and undefined. It is true that all combinations of SQL Server version and hardware you and I and others have tested this on always produce the same execution plan, and hence the same result. But from that you cannot infer that this is guaranteed behaviour. For all we know, there may be a critical hotfix being pushed out through Windows Update right now that changes this behaviour.

    I know this may seem nitpickish, as this would be an extremely unlikely change, but I think it's really important for everyone to know that they should never rely on undocumented behaviour, no matter how repeatable and how safe it appears. People have been bitten by trusting undocumented behaviour (*), and I'm afraid that this will continue.

    (*) The two best examples of this that I recall are:

    (1) The introduction of new query plan operators for grouping, I think in SQL 7.0. Before that, sorting was the only option for the optimizer, and many people wrote their GROUP BY queries without an ORDER BY clause. Big surprise for them when, after upgrade, the optimizer suddenly chose to do a hash group by!

    (2) The optimizer change that caused "TOP 100 PERCENT" to be ignored (wasn't that in SQL 2005?), breaking code for people who had added that and an ORDER BY clause to their view, thinking that they now didn't need to have an ORDER BY in their queries anymore. And to my utter amazement, people who have been bitten by this still insist on "fixing" this with just a different undocumented "trick". And to my even utterer (is that a word?) amazement, Microsoft's own tool (the view designer in SSMS!) still generates this non-working clause, even in SQL Server 2012 (though it now does pop up a warning message when you try to save the view, a slight improvement over 2008R2).

    </soapbox>


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • Bummer. Missed the TOP 1.

    Always. Read. The. Full. Question.

  • Hugo Kornelis (9/19/2012)


    Nice question. I almost got it wrong, because I had first overlooked the TOP clause.

    However, the official correct answer is still a bit questionable. The result of a TOP without ORDER BY is officially undocumented and undefined...

    Hello Hugo,

    (A bit confused.) Just trying to understand: is "the official correct answer" questionable with respect to the behaviour of TOP with TABLESAMPLE, or with respect to the output of the query (which has to be consistent)?

    The Person.Person table contains 19,972 rows.

    The following statement returns approximately 10 percent of the rows.

    The number of rows returned usually changes every time that the statement is executed.

    As BOL states that the output usually changes, my understanding is that TOP works on the result set that TABLESAMPLE returns, just to limit the final output to the top 'n' rows, and since it uses the primary key constraint, ORDER BY might not need to be considered here?

    Thank you for your contribution (there is always something to learn) :-)

    ww; Raghu

  • Raghavendra Mudugal (9/19/2012)


    to my understanding the TOP (...) uses the primary key constraint so ORDER BY might not be considered here?

    Sorry for the mangled quote, but I did that so that I could reply: that is the bit I was referring to.

    The only thing documented about TOP without ORDER BY is that it will return at most the specified number of rows. There is nothing that says it will use the primary key. To be more precise:

    1. In the case of the code in this specific QotD, the execution plan happens to use the clustered index - but there is no guarantee of that.

    2. This specific QotD relies on the rows within a single clustered index page being processed in their logical order - for an unordered clustered index scan (as used in the execution plan of this query), that behaviour is not documented or guaranteed either. (If you check the actual contents of the page with DBCC PAGE, you'll see that due to the order of the INSERT statements, the actual order of the data in the page is in fact different from the logical order imposed by the clustered index).

    3. Finally, the query optimizer is free to process TOP in whatever way it wants to. It may make little sense to process two rows and return the second when a TOP(1) is present, but it would not be an incorrect result.

    It is not very likely that the behaviour described in the first or second point will change, but it COULD happen. For the third point, a behaviour change is a mere theoretic possibility; I don't expect that to ever change (but I'm sure there are people who said the same about pre-7.0 GROUP BY behaviour).

    Anyway, my point is that one should make a clear distinction between behaviour that may be observed 100% of the time but is still undocumented and unguaranteed, and behaviour that follows from documentation. The first can always change and should hence never be relied upon.
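
    To illustrate the distinction, here is a minimal sketch of the variant whose result follows from documented behaviour, using the 50 percent version of the query posted earlier in this thread with an explicit ORDER BY added:

    ```sql
    CREATE TABLE #Test (ID int PRIMARY KEY);
    INSERT INTO #Test VALUES (2);
    INSERT INTO #Test VALUES (1);

    -- TABLESAMPLE still decides whether any rows are available at all,
    -- but ORDER BY now defines which row TOP (1) picks from the sample:
    -- the lowest ID present, regardless of the plan the optimizer chooses.
    SELECT TOP (1) ID
    FROM #Test TABLESAMPLE (50 PERCENT)
    ORDER BY ID;

    DROP TABLE #Test;
    ```

    With the ORDER BY in place, the choice of row no longer depends on the clustered index scan order or on how the optimizer happens to process TOP; only the sampling itself remains random.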


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)


  • I got it wrong. I looked at it and realised that there are in theory 3 correct answers there: the query will always return no more than 1 row, the query may return value 1, and the query may return value 2 (TOP without an ORDER BY clause isn't constrained to use the clustered index order). So I had to pick 2 out of the 3 possibilities, and picked the two values because I thought maybe that was the point of the question and "at most one row" had got in there by mistake. Of course I don't think that picking value 1 and value 2 is right, but neither is picking at most one row and value 1 (the official "correct" answer that reinforces a potentially dangerous misconception about some imaginary inherent ordering) and neither of course is picking at most one row and value 2 (because that would depend on the same fallacious concept of an inherent order).

    So maybe a good question (it's on a topic not seen before in QotD, I think, and that at least is good) - but maybe not (reinforcing the inherent order myth is NOT good). I haven't got at SQL 2012 yet, maybe the optimiser there can pick a different order? Or the next release, or the one after that - the SQL community has been burnt before by assuming rules which don't exist (as pointed out by Hugo).

    The row in one of our forums about using quirky update (where the problem is in fact the same inherent order assumption) makes it clear that people expect that there's an unacceptably high risk that a new release will blow that myth out of the water. Jeff wrapped up his use of quirky update with conditions designed to ensure that there was a high probability of the mythical inherent order being used by the optimiser, and insisted that everything must be carefully tested and checked in the real environment and not just in testbeds. Paul proposed some checks which were fairly certain to ensure that there was an extremely high probability that an error would be signalled if the optimiser chose a different order, and I refined those checks to bring that probability to 100%. Even so, a lot of people still claim that it is catastrophically dangerous to assume that order when quirky update is used; surely it's just as dangerous to assume that order when TOP is used without an ORDER BY clause, or even more dangerous, since no one has suggested any checks to detect the order not being used and signal it when it happens.

    edit: English & spelling

    Tom

  • Nice question, thanks.

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • Executed the select many times and never got a return. Oh well.
