How do you limit records using Distinct and Count - TOP and Rowcount dont work ?

  • I wouldn't waste time on clowns that are obviously not professionals. This site houses professionals helping other professionals. There are professionals with varying levels of experience and then there's the occasional twit that most likely will not stick around. It does take time to provide the necessary scripts and desired results however it is time well spent. There are a lot of people like myself who love the challenge but don't have the time to back pedal.

  • jcrawf02 (1/14/2010)


    So Gail, something that I wonder occasionally (the rest of the time I wonder about everything else :-P) - if I SELECT TOP 10 and ORDER BY the clustered index, can the optimizer figure out that it just needs to look at the first ten records in the index and work with those? Or is it always the case that it will process the query on the dataset and THEN apply the TOP filter?

    Depends on the complexity of the query.

    SELECT TOP 10 * FROM SomeTable

    This so simple that the top can be pushed right down. SQL will only fetch 10 rows from the table. It knows it doesn't need more.

    SELECT Top 10 * FROM SomeTable Where AnIndexedColumn = @somevar

    Same here. Simple enough to be an index seek that returns only 10 records

    SELECT Top 10 SomeColumn

    FROM ATable

    WHERE AnotherColumn = @var

    Group by SomeColumn

    Having Count(*) > 10

    Order By SomeColumn

    Now, for this one, how many rows are needed from the base table to return 10 at the end? Very difficult to say so, in this case it's likely that all qualifying rows will be fetched, aggregated, filtered again and then the top applied.

    The optimiser will try really hard to push the row limitation down as far as possible, and I've seen it do things I wouldn't have expected (including pushing a row limitation down to an index seek when I was filtering on the output of Row_Number

    I think this is worthy of a blog post, it's complex.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • david.tyler (1/14/2010)


    I wouldn't waste time on clowns that are obviously not professionals. This site houses professionals helping other professionals. There are professionals with varying levels of experience and then there's the occasional twit that most likely will not stick around. It does take time to provide the necessary scripts and desired results however it is time well spent. There are a lot of people like myself who love the challenge but don't have the time to back pedal.

    If you are so 'professional', why do you resort to name calling?

  • Ifila, thanks for posting the information that was being requested. This topic has shown that SSC can help and will help when challenged. I hope the results of the problem being resolved will keep you involved at SSC.

    It is good that your issue is resolved.

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw[/url]
    Learn Extended Events

  • GilaMonster (1/14/2010)

    I think this is worthy of a blog post, it's complex.

    woohoo! 😀

    ---------------------------------------------------------
    How best to post your question[/url]
    How to post performance problems[/url]
    Tally Table:What it is and how it replaces a loop[/url]

    "stewsterl 80804 (10/16/2009)I guess when you stop and try to understand the solution provided you not only learn, but save yourself some headaches when you need to make any slight changes."

  • After 25 years of dealing with IT people, I guess sometimes one can get cantankerous.

    Thanks to all the constructive contributors!

  • ifila (1/14/2010)


    david.tyler (1/14/2010)


    I wouldn't waste time on clowns that are obviously not professionals. This site houses professionals helping other professionals. There are professionals with varying levels of experience and then there's the occasional twit that most likely will not stick around. It does take time to provide the necessary scripts and desired results however it is time well spent. There are a lot of people like myself who love the challenge but don't have the time to back pedal.

    If you are so 'professional', why do you resort to name calling?

    Careful now, you didn't start out so well either but we persevered in an attempt to help you solve your partiular issue, even if some of us may have gotten a bit sarcastic at times.

    Bottom line, we took a challange and developed a solution and provided a recommendation on indexes as well. And, it didn't cost you any $$.

  • GilaMonster (1/14/2010)I think this is worthy of a blog post, it's complex.

    In the meantime, anyone interested in how TOP and OPTION (FAST n) affect the choice of plan made by the optimizer could look at the following two blogs by the SQL Server Query Optimization Team:

    Row Goals In Action

    FAST hint guidance

    Paul

  • Thanks Paul.

    Gail - can't wait to see that Blog post.

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw[/url]
    Learn Extended Events

  • ifila (1/14/2010)


    After 25 years of dealing with IT people, I guess sometimes one can get cantankerous.

    Thanks to all the constructive contributors!

    Guilty of cantankerism as charged - but hey, it didn't work out so bad in the end.

    Ifila, have you considered putting the row restriction on the innermost part of the query? Using TOP 1000 will give you the top 1000 resumes which will of course result in perhaps twice as many email addresses, but it does mean that the slowest, most expensive part of the query has a limiter on it. You could use a variable for your limit value and apply it to both the innermost query, to improve performance, and to your outermost i.e. the pivot, to return a consistent rowcount.

    Cheers

    ChrisM

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Chris Morris-439714 (1/15/2010)


    ifila (1/14/2010)


    After 25 years of dealing with IT people, I guess sometimes one can get cantankerous.

    Thanks to all the constructive contributors!

    Guilty of cantankerism as charged - but hey, it didn't work out so bad in the end.

    Ifila, have you considered putting the row restriction on the innermost part of the query? Using TOP 1000 will give you the top 1000 resumes which will of course result in perhaps twice as many email addresses, but it does mean that the slowest, most expensive part of the query has a limiter on it. You could use a variable for your limit value and apply it to both the innermost query, to improve performance, and to your outermost i.e. the pivot, to return a consistent rowcount.

    Cheers

    ChrisM

    Thanks for the suggestion, but i dont understand what you mean:unsure:

  • This bit...

    SELECT Email1,Email2,Email3, Email4,Email5,Email6,Email7,Email8,Email9,Email10 FROM outputresume3

    WHERE ((contains (originalresume, '"j2ee" and "java"'))

    AND (currentdateout BETWEEN '2000-01-01' AND '2010-01-06'))

    change it to

    SELECT top 1000 Email1,Email2,Email3, Email4,Email5,Email6,Email7,Email8,Email9,Email10 FROM outputresume3

    WHERE ((contains (originalresume, '"j2ee" and "java"'))

    AND (currentdateout BETWEEN '2000-01-01' AND '2010-01-06'))

    You will want to put an ORDER BY on that. If you want it random, you could use ORDER BY NEWID(), but you might want it ordered by recency.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Chris Morris-439714 (1/15/2010)


    This bit...

    SELECT Email1,Email2,Email3, Email4,Email5,Email6,Email7,Email8,Email9,Email10 FROM outputresume3

    WHERE ((contains (originalresume, '"j2ee" and "java"'))

    AND (currentdateout BETWEEN '2000-01-01' AND '2010-01-06'))

    change it to

    SELECT top 1000 Email1,Email2,Email3, Email4,Email5,Email6,Email7,Email8,Email9,Email10 FROM outputresume3

    WHERE ((contains (originalresume, '"j2ee" and "java"'))

    AND (currentdateout BETWEEN '2000-01-01' AND '2010-01-06'))

    You will want to put an ORDER BY on that. If you want it random, you could use ORDER BY NEWID(), but you might want it ordered by recency.

    Excluding the 'Order By' that is the way i had it working before my post on SSC.

    Thanks for the suggestion!

  • Yeah but...kinda.

    This time round, you know exactly why you would put it there, and why you will (almost) always get more results in your final output than you specified in the row limit. You know that you can select the first 1000 matching resumes, most recent first i.e. ordered by recency, then extract out all of the email addresses from them, then isolate 1000 email addresses.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

Viewing 14 posts - 91 through 103 (of 103 total)

You must be logged in to reply to this topic. Login to reply