Top N plus

  • the query that produces our 22K row ranked result set takes 150s. i was looking for some top N wizardry to cajole the query engine to skip producing most of those rows but still produce a rank. that's not possible.

  • My suggested algorithm fetches only the top 10 records based on the column that ranking is calculated on.

    I then calculate the ranking for only these 10 records.

    Then I eventually fetch 1 more record if not already present among the top 10 and calculate the ranking for that single record only.

    All this averages at 80 milliseconds for 100000 sample records.

    Is this not what you want?

    CREATE PROCEDURE dbo.uspGetMyModels

    (

    @Top TINYINT,

    @Model VARCHAR(20)

    )

    AS

    SET NOCOUNT ON

    -- Stage query result

    DECLARE @Stage TABLE

    (

    Model VARCHAR(20),

    Metric INT,

    Ranking INT

    )

    -- Populate stagin table

    INSERT@Stage

    (

    Model,

    Metric

    )

    SELECTTOP (@Top)

    Model,

    Metric1

    FROM{YourTableNameHere}

    ORDER BYMetric1 DESC

    -- Update ranking for top n records

    UPDATEs

    SETs.Ranking = 1 + (SELECT COUNT(*) FROM @Stage AS x WHERE x.Metric > s.Metric)

    FROM@Stage AS s

    -- If user supplied model is not among the top n records, add it

    IF NOT EXISTS (SELECT * FROM @Stage WHERE Model = @Model)

    BEGIN

    INSERT@Stage

    (

    Model,

    Metric

    )

    SELECTModel,

    Metric1

    FROM{YourTableNameHere}

    WHEREModel = @Model

    UPDATEs

    SETs.Ranking = 1 + (SELECT COUNT(*) FROM {YourTableNameHere} AS x WHERE x.Metric1 > s.Metric)

    FROM@Stage AS s

    WHEREs.Model = @Model

    END

    -- Show the expected result

    SELECTModel,

    Metric

    FROM@Stage


    N 56°04'39.16"
    E 12°55'05.25"

  • Peso (2/5/2008)


    My suggested algorithm fetches only the top 10 records based on the column that ranking is calculated on.

    I then calculate the ranking for only these 10 records.

    Then I eventually fetch 1 more record if not already present among the top 10 and calculate the ranking for that single record only.

    Sounds about like the same idea that I put out there, both of which would at face value should be quite a bit less work than the "plain RANK()" solution. At this point - sounds like you might have some things to test - but you seem unwilling to try them out. Is there something we're not catching, or that we haven't been told?

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Peso (2/5/2008)


    My suggested algorithm fetches only the top 10 records based on the column that ranking is calculated on.

    I then calculate the ranking for only these 10 records.

    Then I eventually fetch 1 more record if not already present among the top 10 and calculate the ranking for that single record only.

    quote]

    Peso, your technique is still dealing with a 22K row result set that took 150s to produce. Any technique that focuses on getting data out of that result set faster is pointless since it only takes 70ms to extract the handful of rows we want. We were looking for a way to somehow cut down that 22K row result before it's even returned and still be able to calculate a proper rank.

    Like Matt said, TOP (N) WITH TIES has big performance benefits compared to TOP (N). We also know that TOP (N) is doing something more than just the equivalent of SET ROWCOUNT N. So we were hoping that it (or something similar) could be applied in the WHERE clause rather than as post-processing.

  • antonio.collins (2/5/2008)


    Peso, your technique is still dealing with a 22K row result set that took 150s to produce. Any technique that focuses on getting data out of that result set faster is pointless since it only takes 70ms to extract the handful of rows we want. We were looking for a way to somehow cut down that 22K row result before it's even returned and still be able to calculate a proper rank.

    Like Matt said, TOP (N) WITH TIES has big performance benefits compared to TOP (N). We also know that TOP (N) is doing something more than just the equivalent of SET ROWCOUNT N. So we were hoping that it (or something similar) could be applied in the WHERE clause rather than as post-processing.

    Do you even bother to test my suggestions? I am NOT dealing with 22k records.

    Are you using TOP 22000 in your query? I am not...

    I fetch TOP 10 records ORDER BY Metric DESC.

    For this 10 records, I rank them individually. Not 22k records.

    Are you stuck with my first suggestion? I have posted other methods as well. Have you tried them?

    I even posted a complete STORED PROCEDURE for you to test with!

    Please test that stored procedure named "dbo.uspGetMyModels" and post back the issues you have with the algorithm and I will guide you.

    If the user supplied model is not found among the 10 records I add that one too, and calculate the Ranking for this record only, not 22k.

    When I added 100000 sample records to test with, the procedure posted above took an average of 80 milliseconds to finish!

    How many sample records do you want me to test with?


    N 56°04'39.16"
    E 12°55'05.25"

  • antonio.collins (2/5/2008)


    the query that produces our 22K row ranked result set takes 150s. i was looking for some top N wizardry to cajole the query engine to skip producing most of those rows but still produce a rank. that's not possible.

    What query is that? Is that the query that calculates the metric values? And later is used for pulling the TOP 10 records out?

    We haven't seen another query than the request for a query that should pull TOP 10 records out and possible an 11th one.


    N 56°04'39.16"
    E 12°55'05.25"

  • antonio.collins (2/5/2008)


    the query that produces our 22K row ranked result set takes 150s. i was looking for some top N wizardry to cajole the query engine to skip producing most of those rows but still produce a rank. that's not possible.

    And in our own ways - both Peso and I are telling you you do not need to touch all 22K records at all in most cases. In my case, not unless the "camry" result you wish happens to be the dead last result in the ranking of the entire candidate set. You should see large gains in performance by cutting your candidate set into the top 20, and then figuring out just the "camry" ranking (which in my case involves scanning rows from #1 to "camry" in the order of the metric to use).

    I can't quite tell if you caught that or tested either solution. In my script's scenario, AT WORSE, you end up with something close to processing the whole set, but in most cases, it will be a LOT faster than the "straight" rank(). Peso took on trying to build his own RANK() functionality, which might very well be faster than the "real" one, though I haven't had a chance to test (not quite sure it returns the same numbers as RANK()). Again - it's a matter of testing

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Peso (2/5/2008)


    antonio.collins (2/5/2008)


    the query that produces our 22K row ranked result set takes 150s. i was looking for some top N wizardry to cajole the query engine to skip producing most of those rows but still produce a rank. that's not possible.

    What query is that? Is that the query that calculates the metric values? And later is used for pulling the TOP 10 records out?

    We haven't seen another query than the request for a query that should pull TOP 10 records out and possible an 11th one.

    Peso, this is part of your code:

    INSERT @Stage ( Model, Metric1 )

    SELECT TOP (@Top) Model, Metric1

    FROM{YourTableNameHere} ORDER BY Metric1 DESC

    {YourTableNameHere} is not a table, it is the result of a query that runs in 150s. Your technique is post-processing that entire result set after it has completed. We were looking for a TOP N'ish technique or strategy that could reduce the volume of data that query considers and returns since out of 22K records produced we typically only need a handful.

    Thanks for your input, but this is a dead topic.

  • antonio.collins (2/5/2008)


    {YourTableNameHere} is not a table, it is the result of a query that runs in 150s. Your technique is post-processing that entire result set after it has completed. We were looking for a TOP N'ish technique or strategy that could reduce the volume of data that query considers and returns since out of 22K records produced we typically only need a handful.

    Thanks for your input, but this is a dead topic.

    Derailed maybe, but not dead.

    How about YOU post the query used and today is taking 150s to run? If you don't post the real problem to us, how can you expect us to assist you?

    Maybe we can spot something in that query and make that run faster?


    N 56°04'39.16"
    E 12°55'05.25"

  • for the last time: the 150s query is not a problem! it is not optimized (it's actually our day/week query) but it already runs faster than the quarterly reports that it will replace which took 50 minutes to run. the whole point of this topic was to figure out a way to produce a proper N out of 22,000 rank without wading through 22,000 rows of summarized data. like gsquared said, it can't be done.

  • I don't remember him writing so, but I think I will believe you.

    It's your report, not mine.


    N 56°04'39.16"
    E 12°55'05.25"

  • You can try one of those tools like SiSense Prism. I think they do parameterized ranking filters. They may also make the hidden SQL available behind the report - but I'm not certain.

    The site: http://www.sisense.com

Viewing 12 posts - 31 through 41 (of 41 total)

You must be logged in to reply to this topic. Login to reply