how to GROUP BY with shortest Distance by AccountNumber

  • Given the following example;

    declare @CustIfno table (AccountNumber int, StoreID int, Distance decimal(14,10))

    insert into @CustIfno values ('1','44','2.145223'),('1','45','4.567834'),

    ('1','46','8.4325654'),('2','44','7.8754345'),('2','45','1.54654323'),

    ('2','46','11.5436543'), ('3','44','9.145223'),('3','45','8.567834'),

    ('3','46','17.4325654'),('4','44','7.8754345'),('4','45','1.54654323'),

    ('4','46','11.5436543')

    How can I show the shortest Distance by AccountID and StoreID. Results would look like this;

    AccountNumberStoreID Distance

    1 44 2.1452230000

    2 45 1.5465432300

    3 45 8.5678340000

    4 45 1.5465432300

  • btw, I already came up with

    ;

    with CTE as (

    select *, ROW_NUMBER() Over(Partition By AccountNumber Order By Distance) [Rank]

    from @CustIfno)

    select * from CTE where Rank = 1

    but it runs to slow because the real table has over 150 million rows in it....

  • DROP table #CustIfno;

    CREATE table #CustIfno (AccountNumber int, StoreID int, Distance decimal(14,10));

    insert into #CustIfno values

    ('1','44','2.145223'), --

    ('1','45','4.567834'),

    ('1','46','8.4325654'),

    ('2','44','7.8754345'),

    ('2','45','1.54654323'), --

    ('2','46','11.5436543'),

    ('3','44','9.145223'),

    ('3','45','8.567834'), --

    ('3','46','17.4325654'),

    ('4','44','7.8754345'),

    ('4','45','1.54654323'), --

    ('4','46','11.5436543');

    -- without index

    WITH CTE AS (

    SELECT AccountNumber, StoreID, Distance,

    ROW_NUMBER() OVER(PARTITION BY AccountNumber ORDER BY Distance) [rn]

    FROM #CustIfno)

    SELECT * FROM CTE WHERE rn = 1;

    CREATE INDEX ix_helper ON #CustIfno (AccountNumber,Distance) INCLUDE (StoreID);

    -- with index

    WITH CTE AS (

    SELECT AccountNumber, StoreID, Distance,

    ROW_NUMBER() OVER(PARTITION BY AccountNumber ORDER BY Distance) [rn]

    FROM #CustIfno)

    SELECT * FROM CTE WHERE rn = 1;

    -- alternative query

    SELECT d.AccountNumber, c.StoreID, d.Distance

    FROM (

    SELECT AccountNumber, Distance = MIN(Distance)

    FROM #CustIfno

    GROUP BY AccountNumber

    ) d

    INNER JOIN #CustIfno c

    ON c.AccountNumber = d.AccountNumber AND c.Distance = d.Distance

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • thanks Chris!

  • You're welcome Geoff. Which part helped?

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • the last one. The Alternate query.

    but it is still giving me some issues. my table variable example was a little simplistic. the actual way I get the 3 column result set is from another join.

    it looks like this;

    Select a.AccountNumber,

    s.StoreId3,

    a.LocationPoint.STDistance(s.LocationPoint) / 1609.344 as Distance

    From

    prod_Accuzip a

    Join StoreListMaster s on (s.St = a.[State])

    Where

    a.LocationPoint is not null

    I am just struggling now to incorporate a query that does not require a CTE or table variable.

  • It does go around the houses a bit doesn't it. This is probably worth a try:

    Start with a preaggregate

    SELECT

    a.AccountNumber,

    s.StoreId3,

    Distance = MIN(a.LocationPoint.STDistance(s.LocationPoint) / 1609.344)

    FROM prod_Accuzip a

    INNER Join StoreListMaster s ON (s.St = a.[State])

    WHERE a.LocationPoint IS NOT NULL

    GROUP BY a.AccountNumber, s.StoreId3

    Run it into a #temp table because it will have quite a few rows still (temp variables are great for a small number of rows ONLY) then run a second aggregate from it - you could try the row_number trick too.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • It's ashame that you aren't using 2012 as it supports nearest neighbour queries that use a spatial index. So the following query would work reasonably well.

    Select

    a.AccountNumber,

    c.StoreID3,

    c.Distance

    From

    prod_Accuzip a

    CROSS APPLY (

    Select TOP 1

    s.StoreId3,

    a.LocationPoint.STDistance(s.LocationPoint) / 1609.344 as Distance

    From StoreListMaster s

    Where (s.St = a.[State]) and a.LocationPoint.STDistance(s.LocationPoint) is not null

    order by a.LocationPoint.STDistance(s.LocationPoint)

    ) c

    Where

    a.LocationPoint is not null

    Chris's suggestion of preaggregation is your best bet on 2008.

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply