More efficient way to find last entry?

  • hi 

    i have a query i want to improve speed of, the problem part of the query is

    left join
                (select a.SalesOrder,TrnTime,a.SalesOrderLine,MStockCode,MWarehouse,
                row_number() over(partition by a.SalesOrder Order By TrnTime desc)as rn from SysproCompanyW.dbo.SorAdditions a
        join SysproCompanyW.dbo.SorDetail d on d.SalesOrder = a.SalesOrder and d.SalesOrderLine = a.SalesOrderLine)as d
                                    on d.SalesOrder = s.SalesOrder and rn = 1

    basically i need to get the last entry per sales order in the table SorAdditions, the first way i learn to do this was by using above method of partitioning the data and assigning a rownumber per record, but i am guessing there is a cleaner way to achieve this?
    anyone any tips for me?
    mal

  • Could you provide a full SELECT statement and consumable Sample data please (normal drill). That JOIN on it's own doesn't really mean a lot to us.

    At a guess, however, this might be the kind of thing you are after though:
    CREATE TABLE #Customer
      (CustomerID int IDENTITY(1,1),
      CustomerName varchar(20));
    GO

    CREATE TABLE #Sale
      (SaleID int IDENTITY(1,1),
      CustomerID int,
      ItemName varchar(20),
      SaleDateTime datetime);
    GO

    INSERT INTO #Customer
    VALUES
      ('Thom'),
      ('Steve'),
      ('Jane');
    GO

    INSERT INTO #Sale
    VALUES
      (1, 'Banana', '20170401 12:43:00'),
      (1, 'Cheese', '20170401 17:19:00'),
      (2, 'Bacon', '20170402 07:49:21'),
      (2, 'Bacon', '20170405 07:43:02'),
      (2, 'Bacon', '20170407 08:21:59'),
      (3, 'Milk', '20170403 09:49:18');
    GO
    WITH CTE AS (
      SELECT C.CustomerID, S.SaleID,
            C.CustomerName,
            S.ItemName, S.SaleDateTime,
            ROW_NUMBER() OVER (PARTITION BY C.CustomerID
                                ORDER BY S.SaleDateTime DESC) AS RN    
      FROM #Customer C
           JOIN #Sale S ON C.CustomerID = S.CustomerID)
    SELECT CustomerName, ItemName, SaleDateTime
    FROM CTE
    WHERE RN = 1;

    GO
    DROP TABLE #Sale;
    DROP TABLE #Customer;

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk

  • Hi Thom

    full query is

    select distinct s.Customer
    ,(cASE when OrderType <> 'E' then OrderType else ProductGroup end),d.MWarehouse
    ,OrderDate
    ,left(datename(dw,s.OrderDate),3)
    ,left (TrnTime, len (TrnTime)-4) , Cut_Off_Day,Cut_Off_Time,
    ISNULL(datediff(day,OrderDate,ReqShipDate),0),ReqShipDate,
    Forward_Del_Day,
    s.SalesOrder

                from SysproCompanyW.dbo.SorMaster s

         left join
                (select a.SalesOrder,TrnTime,a.SalesOrderLine,MStockCode,MWarehouse,
                row_number() over(partition by a.SalesOrder Order By TrnTime desc)as rn from SysproCompanyW.dbo.SorAdditions a
        join SysproCompanyW.dbo.SorDetail d on d.SalesOrder = a.SalesOrder and d.SalesOrderLine = a.SalesOrderLine)as d
                                    on d.SalesOrder = s.SalesOrder and rn = 1
        left join SysproCompanyW.dbo.InvMaster i on i.StockCode = d.MStockCode
         join mal.dbo.Delivery_SLAs da on da.Customer = s.Customer and da.Division = (cASE when OrderType <> 'E' then OrderType else ProductGroup end) and da.Warehouse = d.MWarehouse

    where
    OrderStatus in ('1','F','2') and
    CancelledFlag <> 'Y'
    and OrderDate > = '2015-01-01'
    --and convert(varchar,OrderDate,111) = convert(varchar,getdate(),111)
    and not exists
        (select KeyField from AdmFormData ad where FormType = 'ORD' AND FieldName = 'SLA001'
            and ad.KeyField = s.SalesOrder)
    and (MWarehouse in ('AF','PE')

    or SentToMessagedb = '1')

  • Consumable sample data, expected output? Does my above SQL not help? 🙂

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk

  • dopydb - Friday, April 7, 2017 9:50 AM

    hi 

    i have a query i want to improve speed of, the problem part of the query is

    left join
                (select a.SalesOrder,TrnTime,a.SalesOrderLine,MStockCode,MWarehouse,
                row_number() over(partition by a.SalesOrder Order By TrnTime desc)as rn from SysproCompanyW.dbo.SorAdditions a
        join SysproCompanyW.dbo.SorDetail d on d.SalesOrder = a.SalesOrder and d.SalesOrderLine = a.SalesOrderLine)as d
                                    on d.SalesOrder = s.SalesOrder and rn = 1

    basically i need to get the last entry per sales order in the table SorAdditions, the first way i learn to do this was by using above method of partitioning the data and assigning a rownumber per record, but i am guessing there is a cleaner way to achieve this?
    anyone any tips for me?
    mal

    The ROW_NUMBER() approach is usually fairly efficient, so I'm curious how you determined that that was where your problems lay.

    If your data is dense enough (lots of TrnTime values for each SalesOrder) and there is a supporting index, the following MAY perform better
    OUTER APPLY
    (
        select TOP(1) a.SalesOrder,TrnTime,a.SalesOrderLine,MStockCode,MWarehouse
        from SysproCompanyW.dbo.SorAdditions a
        join SysproCompanyW.dbo.SorDetail d on d.SalesOrder = a.SalesOrder and d.SalesOrderLine = a.SalesOrderLine
        WHERE d.SalesOrder = s.SalesOrder
        ORDER BY TrnTime DESC
    )as d

    Just from the field names, I suspect that your data won't be dense enough.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • Sorry thom obviously didnt read your request very well! Will try your suggestion when back in the office

    Drew, i am basing purely off execution plan, i think 80% cost was here. Not sure what classifies as dense, but i could have anything from 1 to 30 records in general in trntime.

  • Generally speaking, you're going to take a performance hit when applying ranking at runtime, especially if you're joining on a ranked sub-query. Optimal indexing helps, but I'd suggest adding a new column (let's call it TopAdditionRank for now) to [SorAdditions] that indicates something like a value of (1) for the most recent row that needs to be joined with the related sales order. Once done, you can do away with the sub-query on  [SorAdditions] and just straight join on SalesOrder and TopAdditionRank = 1. This column can perhaps be populated when the sales order is first created, or if it needs to be more dynamic (ex: line detail is added after the sales order is created), then have a job to update the indicator column across all sales orders on a daily or hourly schedule. To reduce the load, you would perhaps only need to update [SorAdditions] for sales orders that havn't been finalized, etc.

    Alternatively, if this ranking really needs to still be performed at runtime, or you can't modify the existing schema, then experiment with this alternate approach.


    select SalesOrder, max(TrnTime)MaxTrnTime
    into #SorAdd
    from SysproCompanyW.dbo.SorAdditions
    group by SalesOrder;
    ...
    ...
    join SysproCompanyW.dbo.SorAdditions a
    join SysproCompanyW.dbo.SorDetail d
     on d.SalesOrder = a.SalesOrder
     and d.SalesOrderLine = a.SalesOrderLine
    join #SorAdd on #SorAdd.MaxTrnTime = d.TrnTime
    ...
    ...

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • dopydb - Friday, April 7, 2017 1:15 PM

    Sorry thom obviously didnt read your request very well! Will try your suggestion when back in the officeDrew, i am basing purely off execution plan, i think 80% cost was here. Not sure what classifies as dense, but i could have anything from 1 to 30 records in general in trntime.

    The only really way to be sure is to test both approaches and to see which one works better.  The idea is that the ROW_NUMBER is likely to do an index scan and that the CROSS APPLY is likely to do a bunch of index seeks.  The index scan will be roughly linear, but overall the index seeks will cost approximately n*LOG(n), so a better approximation of the density is COUNT(*)/(COUNT(DISTINCT SalesOrder) * LOG(COUNT(DISTINCT SalesOrder)).  The larger this number is, the more likely the CROSS APPLY approach is going to work.

    Drew

    PS: I have only a passing knowledge of how indexes work behind the scenes.  My analysis may be completely off.

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply