Issue with max() and aggregate, missunderstanding

Question

Post reply

Issue with max() and aggregate, missunderstanding

frederico_fonseca

SSCoach

Points: 16309

August 27, 2019 at 8:25 am

Answer

#3674627

Sorry if my previous reply sounded a bit harsh.

Had a chance to review your replies and now I think the following will do what you need.

two options - both should yield same result but performance may vary. testing on your side required and if either/both are slow we will need actual execution explain plan files so we can have a look at them.

Note that I have changed your table references to use alias - this makes your code clearer and is the approach you should follow when coding.

I also removed the order by - this should not be used on the view and should instead be done as part of the select from the view you create e.g. select ... from myview mv order by mv.Priority desc, mv.PromisedDeliveryDate

With a outer apply and order by - subquery will return the most recent record for a given DocumentNo/JobNo

create view myview
as
select soor.Priority
     , soor.PromisedDeliveryDate
     , soor.DocumentNo
     , slca.CustomerAccountNumber
     , soor.CustomerDocumentNo
     , soor.RequestedDeliveryDate
     , soor.AnalysisCode3
     , soor.InvoiceCreditStatusID
     , slca.AccountIsOnHold
     , soor.CancelledStatusID
     , soor.DocumentStatusID
     , slca.CustomerAccountName
     , soor.DocumentTypeID
     , soor.DespatchReceiptStatusID
     , sc.Dept
     , sc.ScanDate
from dbo.SOPOrderReturn soor
inner join dbo.SLCustomerAccount slca
    on soor.CustomerID = slca.SLCustomerAccountID
outer apply (select top 1 sc.Dept
                        , sc.ScanDate
             from dbo.Scans sc
             where soor.DocumentNo = sc.JobNo
             order by sc.ScanDate desc
            ) sc
where (soor.InvoiceCreditStatusID = 0)
    and (not (soor.DocumentStatusID = '2'))
    and (soor.DocumentTypeID = '0'
    or soor.DocumentTypeID = '2')
    and (not (soor.DespatchReceiptStatusID = 2))

With a sub-query with a row_number - records are assigned a sequence number which is reset per each group based on "Partition By" clause ordered by ScanDate descending.

For each "Partition By" set of columns there will be a unique number 1 which is the one we are interested e.g. the most recent scan for those columns (JobNo on this case)

create view myview
as
select soor.Priority
     , soor.PromisedDeliveryDate
     , soor.DocumentNo
     , slca.CustomerAccountNumber
     , soor.CustomerDocumentNo
     , soor.RequestedDeliveryDate
     , soor.AnalysisCode3
     , soor.InvoiceCreditStatusID
     , slca.AccountIsOnHold
     , soor.CancelledStatusID
     , soor.DocumentStatusID
     , slca.CustomerAccountName
     , soor.DocumentTypeID
     , soor.DespatchReceiptStatusID
     , sc.Dept
     , sc.ScanDate
from dbo.SOPOrderReturn soor
inner join dbo.SLCustomerAccount slca
    on soor.CustomerID = slca.SLCustomerAccountID
left outer join (select sc.JobNo
                      , sc.Dept
                      , sc.ScanDate
                      , row_number() over (partition by sc.JobNo
                                            order by sc.ScanDate desc
                                          ) rownum
                 from dbo.Scans sc
                ) sc
    on soor.DocumentNo = sc.JobNo
    and sc.rownum = 1
where (soor.InvoiceCreditStatusID = 0)
    and (not (soor.DocumentStatusID = '2'))
    and (soor.DocumentTypeID = '0'
    or soor.DocumentTypeID = '2')
    and (not (soor.DespatchReceiptStatusID = 2))

Viewing 5 posts - 16 through 20 (of 20 total)

You must be logged in to reply to this topic. Login to reply

insomnius Old Hand Points: 335 More actions · Answer 1

Hi Frederico, I did type a nice reply here but it seems to have vanished...

I just wanted to thank you for posting above, and I do appreciate that a simple mis-read post can make one thing look entirely like another. No worries from me, I am very appreciative of all help wherever or however it arrives!

I understand why you have re-referenced/aliased the columns, it makes sense even to me, thank you for the pointer there. I have also removed the ORDER BY from anything I already have applied, again, adding ORDER to the SELECT after creating a view makes sense.

Out of curiosity, what are: execution explain plan files ?

insomnius Old Hand Points: 335 More actions · Answer 2

I had my Eureka moment!!! I think I understand how your first set of code works 🙂

jcelko212 32090 SSCrazy Eights Points: 9306 More actions · Answer 3

Could you please start posting talents and venting your own language? Now we have to rewrite everything. Why do you think that the proprietary identity table property could ever be a relational key? Remember that the key by definition is a subset of the columns of a table that uniquely identify a row. It has nothing to do with the physical insertion order on a physical disk on one machine. Identity is basically the old UNIX sequential file record number that Sybase used implement their first versions of the product so many decades ago.

Also, while it's not technically wrong, why do you have varying length columns? They make printing paper forms or screens a pain in the butt. If you look at the ISO standards find very very very few of them are varying length varying length columns?

CREATE TABLE Scans

(job_nbr CHAR(10) NOT NULL

CHECK (job_nbr LIKE '[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'),

dept_name VARCHAR(20) NOT NULL,

scan_timestamp DATETIME2(0) DEFAULT CURRENT_TIMESTAMP NOT NULL,

PRIMARY KEY (job_nbr, dept_name, scan_timestamp));

INSERT INTO Scans

VALUES

('0000115717', 'Office', '2019-08-24 22:51:00),

('0000115717', 'Office', '2019-08-24 22:52:00'),

('0000115717', 'CAD', '2019-08-25 00:13:44'),

('0000115717', 'Despatch', '2019-08-26 16:30:26'),

('0000115717', 'Despatch', '2019-08-26 16:31:29'),

('0000115847', 'CAD', '2019-08-25 10:44:14'),

('0000115847', 'Laser', '2019-08-25 13:09:47'),

('0000115847', 'Eyeletting', '2019-08-26 16:36:09'),

('0000115847', 'Spirals', '2019-08-26 16:40:05'),

('0000119378', 'Machining', '2019-08-25 14:57:54'),

('0000126563', 'Office', '2019-08-25 00:12:30'),

('0000127652', 'Inspection', '2019-08-22 16:06:49'),

('0000127754', 'Office', '2019-08-22 15:50:51'),

('0000127754', 'Office', '2019-08-22 15:58:35');

WITH Last_Scan

AS

(SELECT job_nbr, dept_name,

MAX(scan_timestamp) OVER (PARTITION BY job_nbr, dept_name) AS last_scan_timestamp

FROM Scans)

SELECT S.job_nbr, S.dept_name, S.scan_timestamp

FROM Scans AS S, Last_Scan AS LS

WHERE S.scan_timestamp = LS.last_scan_timestamp;

Please post DDL and follow ANSI/ISO standards when asking for help.

ScottPletcher SSC Guru Points: 101238 More actions · Answer 4

I recommend one simplification to make your SQL coding and maintenance easier: use "*" in the inner query using ROW_NUMBER(). SQL will still only actually retrieve the data that is required by the outer query; you can confirm this by looking at the query plan. The difference is that when you add a column to the outer query, you don't have to make any change to the inner query.

left outer join (select *
                      , row_number() over (partition by sc.JobNo
                                            order by sc.ScanDate desc
                                          ) rownum
                 from dbo.Scans sc
                ) sc
    on soor.DocumentNo = sc.JobNo
    and sc.rownum = 1

SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".