More efficient way to find last entry?

Question

More efficient way to find last entry?

dopydb

SSCertifiable

Points: 6687
More actions
April 7, 2017 at 9:50 am

#150573

hi
i have a query i want to improve speed of, the problem part of the query is
left join (select a.SalesOrder,TrnTime,a.SalesOrderLine,MStockCode,MWarehouse, row_number() over(partition by a.SalesOrder Order By TrnTime desc)as rn from SysproCompanyW.dbo.SorAdditions a join SysproCompanyW.dbo.SorDetail d on d.SalesOrder = a.SalesOrder and d.SalesOrderLine = a.SalesOrderLine)as d on d.SalesOrder = s.SalesOrder and rn = 1
basically i need to get the last entry per sales order in the table SorAdditions, the first way i learn to do this was by using above method of partitioning the data and assigning a rownumber per record, but i am guessing there is a cleaner way to achieve this?
anyone any tips for me?
mal

Viewing 8 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply

Thom A SSC Guru Points: 99141 More actions · Answer 1

Could you provide a full SELECT statement and consumable Sample data please (normal drill). That JOIN on it's own doesn't really mean a lot to us.

At a guess, however, this might be the kind of thing you are after though:
CREATE TABLE #Customer (CustomerID int IDENTITY(1,1), CustomerName varchar(20)); GO

CREATE TABLE #Sale (SaleID int IDENTITY(1,1), CustomerID int, ItemName varchar(20), SaleDateTime datetime); GO

INSERT INTO #Customer VALUES ('Thom'), ('Steve'), ('Jane'); GO

INSERT INTO #Sale VALUES (1, 'Banana', '20170401 12:43:00'), (1, 'Cheese', '20170401 17:19:00'), (2, 'Bacon', '20170402 07:49:21'), (2, 'Bacon', '20170405 07:43:02'), (2, 'Bacon', '20170407 08:21:59'), (3, 'Milk', '20170403 09:49:18'); GO WITH CTE AS ( SELECT C.CustomerID, S.SaleID, C.CustomerName, S.ItemName, S.SaleDateTime, ROW_NUMBER() OVER (PARTITION BY C.CustomerID ORDER BY S.SaleDateTime DESC) AS RN FROM #Customer C JOIN #Sale S ON C.CustomerID = S.CustomerID) SELECT CustomerName, ItemName, SaleDateTime FROM CTE WHERE RN = 1;

GO DROP TABLE #Sale; DROP TABLE #Customer;

Thom~

Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
Larnu.uk

dopydb SSCertifiable Points: 6687 More actions · Answer 2

Hi Thom

full query is

select distinct s.Customer ,(cASE when OrderType <> 'E' then OrderType else ProductGroup end),d.MWarehouse ,OrderDate ,left(datename(dw,s.OrderDate),3) ,left (TrnTime, len (TrnTime)-4) , Cut_Off_Day,Cut_Off_Time, ISNULL(datediff(day,OrderDate,ReqShipDate),0),ReqShipDate, Forward_Del_Day, s.SalesOrder

from SysproCompanyW.dbo.SorMaster s

left join (select a.SalesOrder,TrnTime,a.SalesOrderLine,MStockCode,MWarehouse, row_number() over(partition by a.SalesOrder Order By TrnTime desc)as rn from SysproCompanyW.dbo.SorAdditions a join SysproCompanyW.dbo.SorDetail d on d.SalesOrder = a.SalesOrder and d.SalesOrderLine = a.SalesOrderLine)as d on d.SalesOrder = s.SalesOrder and rn = 1 left join SysproCompanyW.dbo.InvMaster i on i.StockCode = d.MStockCode join mal.dbo.Delivery_SLAs da on da.Customer = s.Customer and da.Division = (cASE when OrderType <> 'E' then OrderType else ProductGroup end) and da.Warehouse = d.MWarehouse

where OrderStatus in ('1','F','2') and CancelledFlag <> 'Y' and OrderDate > = '2015-01-01' --and convert(varchar,OrderDate,111) = convert(varchar,getdate(),111) and not exists (select KeyField from AdmFormData ad where FormType = 'ORD' AND FieldName = 'SLA001' and ad.KeyField = s.SalesOrder) and (MWarehouse in ('AF','PE')

or SentToMessagedb = '1')

Thom A SSC Guru Points: 99141 More actions · Answer 3

Consumable sample data, expected output? Does my above SQL not help? 🙂

Thom~

Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
Larnu.uk

drew.allen SSC Guru Points: 77012 More actions · Answer 4

dopydb - Friday, April 7, 2017 9:50 AM
hi
i have a query i want to improve speed of, the problem part of the query is
left join (select a.SalesOrder,TrnTime,a.SalesOrderLine,MStockCode,MWarehouse, row_number() over(partition by a.SalesOrder Order By TrnTime desc)as rn from SysproCompanyW.dbo.SorAdditions a join SysproCompanyW.dbo.SorDetail d on d.SalesOrder = a.SalesOrder and d.SalesOrderLine = a.SalesOrderLine)as d on d.SalesOrder = s.SalesOrder and rn = 1
basically i need to get the last entry per sales order in the table SorAdditions, the first way i learn to do this was by using above method of partitioning the data and assigning a rownumber per record, but i am guessing there is a cleaner way to achieve this?
anyone any tips for me?
mal

The ROW_NUMBER() approach is usually fairly efficient, so I'm curious how you determined that that was where your problems lay.

If your data is dense enough (lots of TrnTime values for each SalesOrder) and there is a supporting index, the following MAY perform better
OUTER APPLY ( select TOP(1) a.SalesOrder,TrnTime,a.SalesOrderLine,MStockCode,MWarehouse from SysproCompanyW.dbo.SorAdditions a join SysproCompanyW.dbo.SorDetail d on d.SalesOrder = a.SalesOrder and d.SalesOrderLine = a.SalesOrderLine WHERE d.SalesOrder = s.SalesOrder ORDER BY TrnTime DESC )as d

Just from the field names, I suspect that your data won't be dense enough.

Drew

J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA

dopydb SSCertifiable Points: 6687 More actions · Answer 5

Sorry thom obviously didnt read your request very well! Will try your suggestion when back in the office

Drew, i am basing purely off execution plan, i think 80% cost was here. Not sure what classifies as dense, but i could have anything from 1 to 30 records in general in trntime.

Eric M Russell SSC Guru Points: 125627 More actions · Answer 6

Generally speaking, you're going to take a performance hit when applying ranking at runtime, especially if you're joining on a ranked sub-query. Optimal indexing helps, but I'd suggest adding a new column (let's call it TopAdditionRank for now) to [SorAdditions] that indicates something like a value of (1) for the most recent row that needs to be joined with the related sales order. Once done, you can do away with the sub-query on [SorAdditions] and just straight join on SalesOrder and TopAdditionRank = 1. This column can perhaps be populated when the sales order is first created, or if it needs to be more dynamic (ex: line detail is added after the sales order is created), then have a job to update the indicator column across all sales orders on a daily or hourly schedule. To reduce the load, you would perhaps only need to update [SorAdditions] for sales orders that havn't been finalized, etc.

Alternatively, if this ranking really needs to still be performed at runtime, or you can't modify the existing schema, then experiment with this alternate approach.

select SalesOrder, max(TrnTime)MaxTrnTime into #SorAdd from SysproCompanyW.dbo.SorAdditions group by SalesOrder; ... ... join SysproCompanyW.dbo.SorAdditions a join SysproCompanyW.dbo.SorDetail d on d.SalesOrder = a.SalesOrder and d.SalesOrderLine = a.SalesOrderLine join #SorAdd on #SorAdd.MaxTrnTime = d.TrnTime ... ...

"Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

drew.allen SSC Guru Points: 77012 More actions · Answer 7

dopydb - Friday, April 7, 2017 1:15 PM
Sorry thom obviously didnt read your request very well! Will try your suggestion when back in the officeDrew, i am basing purely off execution plan, i think 80% cost was here. Not sure what classifies as dense, but i could have anything from 1 to 30 records in general in trntime.

The only really way to be sure is to test both approaches and to see which one works better. The idea is that the ROW_NUMBER is likely to do an index scan and that the CROSS APPLY is likely to do a bunch of index seeks. The index scan will be roughly linear, but overall the index seeks will cost approximately n*LOG(n), so a better approximation of the density is COUNT(*)/(COUNT(DISTINCT SalesOrder) * LOG(COUNT(DISTINCT SalesOrder)). The larger this number is, the more likely the CROSS APPLY approach is going to work.

Drew

PS: I have only a passing knowledge of how indexes work behind the scenes. My analysis may be completely off.

J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA