Shorten query by using Top predicate

  • I have situations when I only want to know if I have one record, or more than one record. For instance, a query like this:SELECT Top 2 AkcesAutoID FROM Podrobnosti Group By AkcesAutoIDI would like to stop as soon as it sees that it has a second record. But when I look at the query plan, the first step, an Index Scan passes (in this case) 85 rows, from an estimated 32,376 rows to the Stream Aggregate function, which then passes only two to the Top function, which then passes only two to the Select function. This is a simple query, but real ones will be considerably more complex, with multiple joins and conditionals.

    The gist is that I want to know whether all the records matching my criteria have the same value in a particular field, in this case AkcesAutoID. If they are all the same, I can deal with the result set in a particular (simplified) way. If there are more, I don't (at this point) care how many more, I already know that the app has to do something a different way.

    Reading on the subject indicates that the Top predicate is always applied only to the result set. Is there any way to tell the query engine that I want it to stop working on the problem AS SOON AS it has the second record that meets my criteria, instead of gathering them all and then discarding all but two?

  • No that is how it works. If you use the TOP predicate you generally should also use an order by clause. Without the order by there is no way of knowing which row(s) you will have returned.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Sean Lange (8/22/2013)


    No that is how it works. If you use the TOP predicate you generally should also use an order by clause. Without the order by there is no way of knowing which row(s) you will have returned.

    That's my point, though. Did you read the details of my original post at all? In this situation, I DON'T CARE which rows are being returned, and in fact, I don't even want the rows. All I want is to know whether there is one row, or more than one row, and I want the engine to stop working on the problem as soon as it can answer that question, i.e, as soon as it sees a second row. EVERYTHING it does after discovering that there is another row besides the first one is completely wasted effort. Adding an Order By clause would simply result in even more pointless work.

    I understand that the Top predicate is not the way to restrict the activity, but I'm wondering if there IS some way to do so.

  • Maybe you would be better off using:

    SELECT

    CASE (SELECT COUNT(1) FROM table WHERE...)

    WHEN 0 THEN 'none'

    WHEN 1 THEN 'single'

    ELSE 'multiple'

    END

    and then using your simplified or complex query depending on the results.

    ** Don't mistake the ‘stupidity of the crowd’ for the ‘wisdom of the group’! **
  • pdanes (8/22/2013)


    Sean Lange (8/22/2013)


    No that is how it works. If you use the TOP predicate you generally should also use an order by clause. Without the order by there is no way of knowing which row(s) you will have returned.

    That's my point, though. Did you read the details of my original post at all? In this situation, I DON'T CARE which rows are being returned, and in fact, I don't even want the rows. All I want is to know whether there is one row, or more than one row, and I want the engine to stop working on the problem as soon as it can answer that question, i.e, as soon as it sees a second row. EVERYTHING it does after discovering that there is another row besides the first one is completely wasted effort. Adding an Order By clause would simply result in even more pointless work.

    I understand that the Top predicate is not the way to restrict the activity, but I'm wondering if there IS some way to do so.

    Are you always this snarky to people who are honestly trying to help? I tried to explain how TOP works. Maybe TOP is not the most efficient way to return the results you are looking for. I can't tell because your post was so vague. I hope you figure out a way to get what you after.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Obviously there's no way to avoid a complete scan when all the values are the same.

    It's possible to have something which takes quadratic time when all the values are the same, but it does stop as soon as a second value is hit:-

    select case

    when not exists (select 1 from MyTable A, MyTable B where A.MyField <> B.MyField)

    then 1

    else 0

    end as singleValue -- 1 = true, 0 = false

    Of course the performance cost in the single value case is large; but so is the cost of the sort to do your group by operation, so this may be better.

    Tom

  • Why not use EXISTS and something like t1.FK<>t2.FK to check for the existence of two records?

    By definition, EXISTS stops as soon as it evaluates to True.

    Sean's right - honey beats vinegar every time.

  • L' Eomot Inversé (8/22/2013)


    Obviously there's no way to avoid a complete scan when all the values are the same.

    It's possible to have something which takes quadratic time when all the values are the same, but it does stop as soon as a second value is hit:-

    select case

    when not exists (select 1 from MyTable A, MyTable B where A.MyField <> B.MyField)

    then 1

    else 0

    end as singleValue -- 1 = true, 0 = false

    Of course the performance cost in the single value case is large; but so is the cost of the sort to do your group by operation, so this may be better.

    Yes, when all values are the same, of course, there is no way to know that until all values have been examined. This looks promising - thank you. I don't think I ever would have thought of this on my own. Just tried it, and the number of rows passed from Index Scan B is 60 and Index Sacn A is 1, so it's already better than the 85 I had initially. Some pointless activity by the engine is probably unavoidable, but this seems to be the way to go.

  • Sean Lange (8/22/2013)


    Are you always this snarky to people who are honestly trying to help? I tried to explain how TOP works. Maybe TOP is not the most efficient way to return the results you are looking for. I can't tell because your post was so vague. I hope you figure out a way to get what you after.

    What snarky? Because I asked if you read the details of my original post? Sorry, but your answer gave the impression that you hadn't.

    My post stated I had read that Top worked on only result sets, and I asked if there was a way (another way) to curtail processing. You didn't explain anything, just told me that that's how it works, which I had stated that I already knew.

    I made a point of stating that I'm trying to REDUCE processing time, and that I am only interested in knowing whether or not I have more than one record, and I want to STOP processing as soon as I discover that. You responded by telling me to add a sort to a dataset which I had already stated I was trying to avoid generating in the first place.

    Apologies if I sounded curt, but I do sometimes get answers from which it's completely clear that the responder in fact DIDN'T read the post, and is only reacting to a few keywords that he noticed while giving my post a cursory glance. The most common example is "Post your SQL", when I state right in the original question that I have no SQL yet, that I'm asking a question about design principles.

  • pdanes (8/22/2013)


    Sean Lange (8/22/2013)


    Are you always this snarky to people who are honestly trying to help? I tried to explain how TOP works. Maybe TOP is not the most efficient way to return the results you are looking for. I can't tell because your post was so vague. I hope you figure out a way to get what you after.

    What snarky? Because I asked if you read the details of my original post? Sorry, but your answer gave the impression that you hadn't.

    My post stated I had read that Top worked on only result sets, and I asked if there was a way (another way) to curtail processing. You didn't explain anything, just told me that that's how it works, which I had stated that I already knew.

    I made a point of stating that I'm trying to REDUCE processing time, and that I am only interested in knowing whether or not I have more than one record, and I want to STOP processing as soon as I discover that. You responded by telling me to add a sort to a dataset which I had already stated I was trying to avoid generating in the first place.

    Apologies if I sounded curt, but I do sometimes get answers from which it's completely clear that the responder in fact DIDN'T read the post, and is only reacting to a few keywords that he noticed while giving my post a cursory glance. The most common example is "Post your SQL", when I state right in the original question that I have no SQL yet, that I'm asking a question about design principles.

    We will just have to chalk it up to a miscommunication. After seeing the details in the couple following posts I would agree that using EXISTS is likely a better alternative for what you are trying to accomplish.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply