Issue with max() and aggregate, missunderstanding

Question

Post reply

Issue with max() and aggregate, missunderstanding

insomnius

Old Hand

Points: 335
More actions
August 25, 2019 at 1:09 pm
Go to Answer
#3674248
Hi all, this is my first post here and I've been looking and trying to understand this for hours the various ways people have tried to get around this issue.
My trouble is as follows. I have a database with three tables linked together by 'DocumentNo'. Two main linked tables which output all rows with a full outer join. These records are all unique and are never the same.
The third table is a child table if you will, and this has many rows of it's own, made up of 'Dept' and 'ScanDate'.
What I need to do, is output unique rows showing only the latest 'scan date' for each unique 'DocumentNo'. I have tried max([ScanDate]) with GROUP BY [DocumentNo] but i'm fairly sure you can guess the error message I am receiving.
Are there any hinters, or pointers out there that might help with this one, and it would be most appreciated if it could be done in a way that I can get my eureka moment and understand what is going on...
```
SELECT TOP (1000)
[Priority]
,[PromisedDeliveryDate]
,[DocumentNo]
,[CustomerAccountNumber]
,[CustomerDocumentNo]
,[RequestedDeliveryDate]
,[AnalysisCode3]
,[InvoiceCreditStatusID]
,[AccountIsOnHold]
,[CancelledStatusID]
,[DocumentStatusID]
,[CustomerAccountName]
,[DocumentTypeID]
,[DespatchReceiptStatusID]
,[Dept]
,[ScanDate]
FROM [mydb].[dbo].[myview]
```
- This topic was modified 6 years, 11 months ago by insomnius.

Viewing 15 posts - 1 through 15 (of 20 total)

You must be logged in to reply to this topic. Login to reply

frederico_fonseca SSCoach Points: 16338 More actions · Answer 1

get used to post full ddl and sample data - hard to give correct help without it.

2 possible ways to do it (more exist) - which one is better depends on volumes of underlying tables

-- using outer apply 
select *
from mastertable mt
outer apply (select max(ScanDate) as ScanDate
             from childtable ct 
             where ct.document = mt.document
             ) ct

-- using subquery
select *
from mastertable mt
left outer join (select ct.document, max(ScanDate) as ScanDate
                 from childtable ct 
                 group by ct.document
             ) ct
on ct.document = mt.document

insomnius Old Hand Points: 335 More actions · Answer 2

Hi, I can see what you have done there, from the look of it, it is breaking up the 1st/2nd table from the third table, applying the max value to that and joining the remaining row to the 1st/2nd table? (if i'm wrong please tell me...)

What I am trying to get to, is to have the information in a view that a web page will look at as a single source of information, but I have tried using your code to get a separate view, and SQL Server doesn't like it.

pietlinden SSC Guru Points: 63686 More actions · Answer 3

So you just slap a CREATE VIEW on the front and list the columns you want to return. Only catch is that you can't use ORDER BY in a view.

CREATE VIEW MyView
AS
select <field list>
from mastertable mt
outer apply (select max(ScanDate) as ScanDate
             from childtable ct 
             where ct.document = mt.document
             ) ct

jcelko212 32090 SSCrazy Eights Points: 9312 More actions · Answer 4

>> I have a database with three tables linked [sic] together by 'DocumentNo'. Two main linked [sic] tables which output all rows with a full outer join. These records [sic] are all unique and are never the same. <<

You're obviously coming to SQL for background in older filesystems or pre-relational databases. The term "link" refers to pointer chains that existed in a network and hierarchical databases; SQL and RDBMS use references which are far more abstract.

>> The third table is a child [sic] table if you will, and this has many rows of it's own, made up of 'Dept' and 'ScanDate'. <<

The term "child" and "parent" are also inherited from the old network databases. We have referenced and referencing tables and RDBMS.

>> What I need to do is output unique rows showing only the latest 'scan date' for each unique 'DocumentNo'. I have tried max([ScanDate]) with GROUP BY [DocumentNo] but I'm fairly sure you can guess the error message I am receiving. <<

Guess your error messages? With a complete lack of information you posted here, we can even guess what your tables look like! For over 30 years posting DDL has been standard netiquette on SQL forums. If you want to get help can you please play by the rules?

The data element name you posted in your sample query is also disasters. I don't know if you're old enough to remember why we used to put commas at the front of each punchcard, but it let us rearrange the deck of cards and reuse them easily. But it makes maintenance and reading the code hard -- about 8 to 12% harder. For the last 20 or 30 years, programmers have had pretty printers (code formatting tools) that require the push of a button to arrange them properly. Postfix is like "_nbr" , "_date", "_code" etc are called attribute properties. You can look at ISO standard 11179 or the metadata committee guides.

These fundamental design rules tell us that crap like "dispatch_receipt_status_id" is a bad data element name that has no business in the schema. Either the data element is a status of some kind or it is an identifier, but it is never both. This is like putting a string of adjectives on the noun in English. One of my favorite T-shirts has the slogan "on a scale from 1 to 10, what color is your favorite letter of the alphabet?" Because it demonstrates mixed datatypes and metadata confusion all in one phrase. But when people read it, they actually try to answer it!

Want to follow the rules and try again?

Please post DDL and follow ANSI/ISO standards when asking for help.

insomnius Old Hand Points: 335 More actions · Answer 5

jcelko212 32090 wrote:

Want to follow the rules and try again?

Thank you for using a giant sized hand to slap this new fellow down.

I have sadly run into the issue where I don't actually know the question to ask to get the answer I need or to be able to formulate a proper search offered on a forum. It is also the reason I despise blanket statements such as 'ask Google'. You're looking on a forum for answers or at least a pointer in the right direction. I am quite happy to look myself when I know what question to ask, but alas, I get stuck every now and then and I had a thought that a forum such as this one might be the best place to start asking questions.

I have inherited a project which I want to get my hands dirty on. I don't want anyone to do the job for me, which is why when I asked my original question, I stated I wanted to understand what was going on. Willingness to learn and all that.

I have worked a little bit with MS Access. That is all. My terminology is limited because I inherited that workload too, so have bit by bit learned a little more to finish the task. I can't even comment on whether or not that task was completed to any particular standard or to a specification.

The program the office works with is Sage 200c On-Premise. The data is sat on MS Server 2016 hosting SQL Server 2016.

As for presuming that the professionals here will probably know the error message I am receiving, you're right I shouldn't presume.

The error is as follows:

Column 'Sage200_mydb.dbo.ScansVw.Priority' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.

Now, I have done a little digging into this, but from a very limited understanding, you can either make the other columns a max, min, etc or add the column name into the group by clause. Another post suggested that this was the wrong way to do this. Again, a lack of understanding on my part and wanting to understand how to use this Max and Group By properly means I wanted to get the best help and pointers possible so I don't spend another 24 hours scratching my head.

The code I posted in my first message was a copy/paste straight from SQL Server showing me the top 1000 rows for the view I mentioned in my earlier post. If it was wrong for SQL Server to provide me with the query in that format with the commas on each new line, then I will adjust it around next time I post code.

I fully apologize for my lack of awareness on the Netiquette of SQL Forums, that has been around for 30 years. This was my first ever post in this forum, actually in any forum on SQL. I will read up on what you are expecting from me in future.

On that note, could you tell me please what DDL is and where I might find the ANSI/ISO specification for asking for help?

The current View (copied from the 'design') is as follows:

SELECT        TOP (100) PERCENT dbo.SOPOrderReturn.Priority, dbo.SOPOrderReturn.PromisedDeliveryDate, dbo.SOPOrderReturn.DocumentNo, dbo.SLCustomerAccount.CustomerAccountNumber, 
                         dbo.SOPOrderReturn.CustomerDocumentNo, dbo.SOPOrderReturn.RequestedDeliveryDate, dbo.SOPOrderReturn.AnalysisCode3, dbo.SOPOrderReturn.InvoiceCreditStatusID, 
                         dbo.SLCustomerAccount.AccountIsOnHold, dbo.SOPOrderReturn.CancelledStatusID, dbo.SOPOrderReturn.DocumentStatusID, dbo.SLCustomerAccount.CustomerAccountName, 
                         dbo.SOPOrderReturn.DocumentTypeID, dbo.SOPOrderReturn.DespatchReceiptStatusID, dbo.Scans.Dept, dbo.Scans.ScanDate
FROM            dbo.SOPOrderReturn INNER JOIN
                         dbo.SLCustomerAccount ON dbo.SOPOrderReturn.CustomerID = dbo.SLCustomerAccount.SLCustomerAccountID FULL OUTER JOIN
                         dbo.Scans ON dbo.SOPOrderReturn.DocumentNo = dbo.Scans.JobNo
WHERE        (dbo.SOPOrderReturn.InvoiceCreditStatusID = 0) AND (NOT (dbo.SOPOrderReturn.DocumentStatusID = '2')) AND (dbo.SOPOrderReturn.DocumentTypeID = '0' OR
                         dbo.SOPOrderReturn.DocumentTypeID = '2') AND (NOT (dbo.SOPOrderReturn.DespatchReceiptStatusID = 2))
ORDER BY dbo.SOPOrderReturn.Priority DESC, dbo.SOPOrderReturn.PromisedDeliveryDate

Sample output of the query is as follows: (edited as this holds actual customer information)

PriorityPromisedDeliveryDateDocumentNoCustomerAccountNumberCustomerDocumentNoRequestedDeliveryDateAnalysisCode3InvoiceCreditStatusIDAccountIsOnHoldCancelledStatusIDDocumentStatusIDCustomerAccountNameDocumentTypeIDDespatchReceiptStatusIDDeptScanDate
2020-03-10 00:00:00.0000000115717ACC00300000000012020-03-10 00:00:00.0000000Cust 100Office2019-08-24 22:51:00.0000000
2020-03-10 00:00:00.0000000115717ACC00300000000012020-03-10 00:00:00.0000000Cust 100Office2019-08-24 22:52:00.0000000
2020-03-10 00:00:00.0000000115717ACC00300000000012020-03-10 00:00:00.0000000Cust 100CAD2019-08-25 00:13:44.0000000
2020-03-03 00:00:00.0000000115847ACC00300000000022020-03-03 00:00:00.0000000Cust 100CAD2019-08-25 10:44:14.0000000
2020-03-03 00:00:00.0000000115847ACC00300000000022020-03-03 00:00:00.0000000Cust 100Laser2019-08-25 13:09:47.0000000

I hope I have provided enough information now to get me started?

in the results section above, I have two unique numbers (which are actually stored as text in the database, I don't know why Sage chose to go this route) under DocumentNo, this is the main column for the whole thing I am working on. I am wanting to show single DocumentNo that refers to the latest ScanDate. This is why my initial thought was to go straight to Max([ScanDate]) and then Group By [DocumentNo]. In my mind it made sense, but of course I am missing something. If I add [Dept] into the Group By, I get all the columns anyway so was unsure how to proceed further. If I leave it out, I get the error mentioned above.

Kindest regards

This reply was modified 6 years, 11 months ago by insomnius.
This reply was modified 6 years, 11 months ago by insomnius.
This reply was modified 6 years, 11 months ago by insomnius.
This reply was modified 6 years, 11 months ago by insomnius.
This reply was modified 6 years, 11 months ago by insomnius.

Lynn Pettis SSC Guru Points: 442462 More actions · Answer 6

insomnius wrote:

jcelko212 32090 wrote:
>> I have a database with three tables linked [sic] together by 'DocumentNo'. Two main linked [sic] tables which output all rows with a full outer join. These records [sic] are all unique and are never the same. <<
You're obviously coming to SQL for background in older filesystems or pre-relational databases. The term "link" refers to pointer chains that existed in a network and hierarchical databases; SQL and RDBMS use references which are far more abstract.
>> The third table is a child [sic] table if you will, and this has many rows of it's own, made up of 'Dept' and 'ScanDate'. <<
The term "child" and "parent" are also inherited from the old network databases. We have referenced and referencing tables and RDBMS.
>> What I need to do is output unique rows showing only the latest 'scan date' for each unique 'DocumentNo'. I have tried max([ScanDate]) with GROUP BY [DocumentNo] but I'm fairly sure you can guess the error message I am receiving. <<
Guess your error messages? With a complete lack of information you posted here, we can even guess what your tables look like! For over 30 years posting DDL has been standard netiquette on SQL forums. If you want to get help can you please play by the rules?
The data element name you posted in your sample query is also disasters. I don't know if you're old enough to remember why we used to put commas at the front of each punchcard, but it let us rearrange the deck of cards and reuse them easily. But it makes maintenance and reading the code hard -- about 8 to 12% harder. For the last 20 or 30 years, programmers have had pretty printers (code formatting tools) that require the push of a button to arrange them properly. Postfix is like "_nbr" , "_date", "_code" etc are called attribute properties. You can look at ISO standard 11179 or the metadata committee guides.
These fundamental design rules tell us that crap like "dispatch_receipt_status_id" is a bad data element name that has no business in the schema. Either the data element is a status of some kind or it is an identifier, but it is never both. This is like putting a string of adjectives on the noun in English. One of my favorite T-shirts has the slogan "on a scale from 1 to 10, what color is your favorite letter of the alphabet?" Because it demonstrates mixed datatypes and metadata confusion all in one phrase. But when people read it, they actually try to answer it!
Want to follow the rules and try again?
Thank you for using a giant sized hand to slap this new fellow down.
I have sadly run into the issue where I don't actually know the question to ask to get the answer I need or to be able to formulate a proper search offered on a forum. It is also the reason I despise blanket statements such as 'ask Google'. You're looking on a forum for answers or at least a pointer in the right direction. I am quite happy to look myself when I know what question to ask, but alas, I get stuck every now and then and I had a thought that a forum such as this one might be the best place to start asking questions.
I have inherited a project which I want to get my hands dirty on. I don't want anyone to do the job for me, which is why when I asked my original question, I stated I wanted to understand what was going on. Willingness to learn and all that.
I have worked a little bit with MS Access. That is all. My terminology is limited because I inherited that workload too, so have bit by bit learned a little more to finish the task. I can't even comment on whether or not that task was completed to any particular standard or to a specification.
The program the office works with is Sage 200c On-Premise. The data is sat on MS Server 2016 hosting SQL Server 2016.
As for presuming that the professionals here will probably know the error message I am receiving, you're right I shouldn't presume.
The error is as follows:
Column 'Sage200_mydb.dbo.ScansVw.Priority' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
Now, I have done a little digging into this, but from a very limited understanding, you can either make the other columns a max, min, etc or add the column name into the group by clause. Another post suggested that this was the wrong way to do this. Again, a lack of understanding on my part and wanting to understand how to use this Max and Group By properly means I wanted to get the best help and pointers possible so I don't spend another 24 hours scratching my head.
The code I posted in my first message was a copy/paste straight from SQL Server showing me the top 1000 rows for the view I mentioned in my earlier post. If it was wrong for SQL Server to provide me with the query in that format with the commas on each new line, then I will adjust it around next time I post code.
I fully apologize for my lack of awareness on the Netiquette of SQL Forums, that has been around for 30 years. This was my first ever post in this forum, actually in any forum on SQL. I will read up on what you are expecting from me in future.
On that note, could you tell me please what DDL is and where I might find the ANSI/ISO specification for asking for help?
The current View (copied from the 'design') is as follows:
SELECT        TOP (100) PERCENT dbo.SOPOrderReturn.Priority, dbo.SOPOrderReturn.PromisedDeliveryDate, dbo.SOPOrderReturn.DocumentNo, dbo.SLCustomerAccount.CustomerAccountNumber, 
                         dbo.SOPOrderReturn.CustomerDocumentNo, dbo.SOPOrderReturn.RequestedDeliveryDate, dbo.SOPOrderReturn.AnalysisCode3, dbo.SOPOrderReturn.InvoiceCreditStatusID, 
                         dbo.SLCustomerAccount.AccountIsOnHold, dbo.SOPOrderReturn.CancelledStatusID, dbo.SOPOrderReturn.DocumentStatusID, dbo.SLCustomerAccount.CustomerAccountName, 
                         dbo.SOPOrderReturn.DocumentTypeID, dbo.SOPOrderReturn.DespatchReceiptStatusID, dbo.Scans.Dept, dbo.Scans.ScanDate
FROM            dbo.SOPOrderReturn INNER JOIN
                         dbo.SLCustomerAccount ON dbo.SOPOrderReturn.CustomerID = dbo.SLCustomerAccount.SLCustomerAccountID FULL OUTER JOIN
                         dbo.Scans ON dbo.SOPOrderReturn.DocumentNo = dbo.Scans.JobNo
WHERE        (dbo.SOPOrderReturn.InvoiceCreditStatusID = 0) AND (NOT (dbo.SOPOrderReturn.DocumentStatusID = '2')) AND (dbo.SOPOrderReturn.DocumentTypeID = '0' OR
                         dbo.SOPOrderReturn.DocumentTypeID = '2') AND (NOT (dbo.SOPOrderReturn.DespatchReceiptStatusID = 2))
ORDER BY dbo.SOPOrderReturn.Priority DESC, dbo.SOPOrderReturn.PromisedDeliveryDate
Sample output of the query is as follows: (edited as this holds actual customer information)
PriorityPromisedDeliveryDateDocumentNoCustomerAccountNumberCustomerDocumentNoRequestedDeliveryDateAnalysisCode3InvoiceCreditStatusIDAccountIsOnHoldCancelledStatusIDDocumentStatusIDCustomerAccountNameDocumentTypeIDDespatchReceiptStatusIDDeptScanDate
2020-03-10 00:00:00.0000000115717ACC00300000000012020-03-10 00:00:00.0000000Cust 100Office2019-08-24 22:51:00.0000000
2020-03-10 00:00:00.0000000115717ACC00300000000012020-03-10 00:00:00.0000000Cust 100Office2019-08-24 22:52:00.0000000
2020-03-10 00:00:00.0000000115717ACC00300000000012020-03-10 00:00:00.0000000Cust 100CAD2019-08-25 00:13:44.0000000
2020-03-03 00:00:00.0000000115847ACC00300000000022020-03-03 00:00:00.0000000Cust 100CAD2019-08-25 10:44:14.0000000
2020-03-03 00:00:00.0000000115847ACC00300000000022020-03-03 00:00:00.0000000Cust 100Laser2019-08-25 13:09:47.0000000
I hope I have provided enough information now to get me started?
in the results section above, I have two unique numbers (which are actually stored as text in the database, I don't know why Sage chose to go this route) under DocumentNo, this is the main column for the whole thing I am working on. I am wanting to show single DocumentNo that refers to the latest ScanDate. This is why my initial thought was to go straight to Max([ScanDate]) and then Group By [DocumentNo]. In my mind it made sense, but of course I am missing something. If I add [Dept] into the Group By, I get all the columns anyway so was unsure how to proceed further. If I leave it out, I get the error mentioned above.
Kindest regards

I haven't read your entire reply, as it is quite long, but I did read the opening. When it comes to Mr. Celko, ignore him. He believes that bashing people for using the wrong terminology, blaming people for the inaccuracies with which they are forced to work with (inherited designs), etc. is the best way to teach people. He has absolutely no tact and has even been know to post code that doesn't work.

insomnius Old Hand Points: 335 More actions · Answer 7

I appreciate the response Lynn. I have to admit, I knew coming in that I wasn't best equipped with enough knowledge to be able to outright ask for what I wanted, I was expecting a little resistance.

I 'believe' I have posted what was asked for, if someone would kindly have a look at me and nudge me in the right direction that would be very much appreciated.

Kindest regards

Inso

jcelko212 32090 SSCrazy Eights Points: 9312 More actions · Answer 8

>> I have sadly run into the issue where I don't actually know the question to ask to get the answer I need or to be able to formulate a proper search offered on a forum. It is also the reason I despise blanket statements such as 'ask Google'. <<

Before you can ask any kind of question, you have to have some basics. And very often the answer is you need to go to school for a long time.

>> I have inherited a project which I want to get my hands dirty on. I don't want anyone to do the job for me, which is why when I asked my original question, I stated I wanted to understand what was going on. Willingness to learn and all that. <<

Are you familiar with the Kruger-Dunning effect? 🙂 Trying to learn by trial and error as opposed to actually getting an education on basics is very often not a good idea.

>> I have worked a little bit with MS Access. That is all. My terminology is limited because I inherited that workload too, so have bit by bit learned a little more to finish the task. I can't even comment on whether or not that task was completed to any particular standard or to a specification. <<

I was at Comdex at the Microsoft show where they premiered access so long ago. I VIEW it as a total disaster. It's not really SQL in the first release of it sorted dates alphabetically instead of temporally. Ugh!

>> Column 'Sage200_mydb.ScansVw.Priority' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause. <<

This has to do with the scoping rules of the language. But we need to se DDL.

>> On that note, could you tell me please what DDL is and where I might find the ANSI/ISO specification for asking for help? <<

DDL stands for "Data Declaration Language" and it is one of three sub-languages in SQL. It's the stuff that creates schema objects, like tables, VIEWs, triggers, sequences, etc. Since SQL is declarative, 80 to 90% of the work is done in the DDL. The other two sub-languages are DCL (data control language, which does the security and access – – think grant and revoke privileges) and DML (data manipulation language) which is where we do queries with select, change the contents of the tables we declared in the DDL with insert update and delete statements, etc.

>> The current VIEW (copied from the 'design') is as follows: <<

Don't ever put in ORDER BY clause on a VIEW. It's meaningless because a table has no order. Ordering makes sense only when you convert a table into a cursor and a view is always a virtual table. The optimizer just throws it out but looked sort of silly. I think I already criticized the violations of naming rules and basic data modeling in the base tables.

I will be honest about this. I am one of the few people who actually sits down and reads ANSI/ISO standards. They are incredibly dull, and you need to know the language its kind of like trying to read a legal document. I had to do it for the 10 years I was on the ANSI X3H2 committee and somehow I kept up the habit after I left. My wife is right I don't get out enough.

>> in the results section above, I have two unique numbers (which are actually stored as text in the database, I don't know why Sage chose to go this route) <<

In a word, COBOL! Since this was the first business language was widely used, people tended to move old COBOL systems into SQL. Since COBOL keeps everything in string pictures, you find crap like dates and times coded in strings, things that should be numeric because are used for calculations also encoded as strings, etc. SQL has a wide range of available data types and we went out of our way to see that are well data types matched into all of the major mainframe data types in the 1980s. It's also amazing how the legacy of punch cards still lives in the 21st century.

I noticed another thing that you get the same data element different names when it comes from different tables. No, a data element I have one and only one name throughout the entire schema. Ideally, a data element you have one and only one well-understood name in the entire universe; this is why a VIN has that name and definition in every insurance company, every car manufacturer, etc. on earth.

Please post DDL and follow ANSI/ISO standards when asking for help.

drew.allen SSC Guru Points: 77012 More actions · Answer 9

jcelko212 32090 wrote:

I will be honest about this. I am one of the few people who actually sits down and reads ANSI/ISO standards. They are incredibly dull, and you need to know the language its kind of like trying to read a legal document. I had to do it for the 10 years I was on the ANSI X3H2 committee and somehow I kept up the habit after I left. My wife is right I don't get out enough.

People don't read them, because they charge an exorbitant amount of money to get them. I, for one, would read them if they were available for free online.

Drew

J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA

jcelko212 32090 SSCrazy Eights Points: 9312 More actions · Answer 10

People don't read them, because they charge an exorbitant amount of money to get them. I, for one, would read them if they were available for free online.

The problems have always been there two finance models in the standards world. ISO get ll of its income from selling copies of the standards so it does a lot to protect them. ANSI gets its income from assessing membership fees and has wanted to do exactly what you said – put them online for free.

Then we get into national standards groups who have their own agendas. JIS (Japanese industrial standards) would like to have Japanese language documents, but they're polite about it. The French, on the other hand, hate anything that has to do with the English language. They've actually blocked standards on this issue. They failed to admit that French is a slowly dying language (fewer native and secondary speakers every year). But it still required on the ISO website. Did you ever wonder why we call the organization ISO when the English language name for it is "International Organization for Standards"? Likewise, we have UTC which is supposed to be "universal coordinated time" in English?

Please post DDL and follow ANSI/ISO standards when asking for help.

Lynn Pettis SSC Guru Points: 442462 More actions · Answer 11

jcelko212 32090 wrote:

>> I have sadly run into the issue where I don't actually know the question to ask to get the answer I need or to be able to formulate a proper search offered on a forum. It is also the reason I despise blanket statements such as 'ask Google'. <<
Before you can ask any kind of question, you have to have some basics. And very often the answer is you need to go to school for a long time.
>> I have inherited a project which I want to get my hands dirty on. I don't want anyone to do the job for me, which is why when I asked my original question, I stated I wanted to understand what was going on. Willingness to learn and all that. <<
Are you familiar with the Kruger-Dunning effect? 🙂 Trying to learn by trial and error as opposed to actually getting an education on basics is very often not a good idea.
>> I have worked a little bit with MS Access. That is all. My terminology is limited because I inherited that workload too, so have bit by bit learned a little more to finish the task. I can't even comment on whether or not that task was completed to any particular standard or to a specification. <<
I was at Comdex at the Microsoft show where they premiered access so long ago. I VIEW it as a total disaster. It's not really SQL in the first release of it sorted dates alphabetically instead of temporally. Ugh!
>> Column 'Sage200_mydb.ScansVw.Priority' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause. <<
This has to do with the scoping rules of the language. But we need to se DDL.
>> On that note, could you tell me please what DDL is and where I might find the ANSI/ISO specification for asking for help? <<
DDL stands for "Data Declaration Language" and it is one of three sub-languages in SQL. It's the stuff that creates schema objects, like tables, VIEWs, triggers, sequences, etc. Since SQL is declarative, 80 to 90% of the work is done in the DDL. The other two sub-languages are DCL (data control language, which does the security and access – – think grant and revoke privileges) and DML (data manipulation language) which is where we do queries with select, change the contents of the tables we declared in the DDL with insert update and delete statements, etc.
>> The current VIEW (copied from the 'design') is as follows: <<
Don't ever put in ORDER BY clause on a VIEW. It's meaningless because a table has no order. Ordering makes sense only when you convert a table into a cursor and a view is always a virtual table. The optimizer just throws it out but looked sort of silly. I think I already criticized the violations of naming rules and basic data modeling in the base tables.
I will be honest about this. I am one of the few people who actually sits down and reads ANSI/ISO standards. They are incredibly dull, and you need to know the language its kind of like trying to read a legal document. I had to do it for the 10 years I was on the ANSI X3H2 committee and somehow I kept up the habit after I left. My wife is right I don't get out enough.
>> in the results section above, I have two unique numbers (which are actually stored as text in the database, I don't know why Sage chose to go this route) <<
In a word, COBOL! Since this was the first business language was widely used, people tended to move old COBOL systems into SQL. Since COBOL keeps everything in string pictures, you find crap like dates and times coded in strings, things that should be numeric because are used for calculations also encoded as strings, etc. SQL has a wide range of available data types and we went out of our way to see that are well data types matched into all of the major mainframe data types in the 1980s. It's also amazing how the legacy of punch cards still lives in the 21st century.
I noticed another thing that you get the same data element different names when it comes from different tables. No, a data element I have one and only one name throughout the entire schema. Ideally, a data element you have one and only one well-understood name in the entire universe; this is why a VIN has that name and definition in every insurance company, every car manufacturer, etc. on earth.

Enough bashing of COBOL and those who developed in COBOL. You obviously hate the language considering how much you detest those of us who made a living working with it. Please, come into the 21st century. Many modern languages have many adherents who abuse the storing of values in data types that are inappropriate.

insomnius Old Hand Points: 335 More actions · Answer 12

Apologies, just in a small attempt to bring this back to the question at hand, I have been experimenting with something this afternoon as I had a thought, but i'm not sure if this is the right way to go about it:

The table which contains the extra data is set up like this:

dbo.scans
ID (INT, primary key, identity, no nulls)
JobNo (Varchar(20), no nulls)
Dept (nVarchar(20), no nulls)
ScanDate (datetime2) no nulls)

When I use:

SELECT * FROM scans
ORDER BY JobNo

I get the following results:

IDJobNoDeptScanDate
300000115717Office2019-08-24 22:51:00.0000000
310000115717Office2019-08-24 22:52:00.0000000
330000115717CAD2019-08-25 00:13:44.0000000
370000115717Despatch2019-08-26 16:30:26.0000000
380000115717Despatch2019-08-26 16:31:29.0000000
340000115847CAD2019-08-25 10:44:14.0000000
350000115847Laser2019-08-25 13:09:47.0000000
390000115847Eyeletting2019-08-26 16:36:09.0000000
400000115847Spirals2019-08-26 16:40:05.0000000
360000119378Machining2019-08-25 14:57:54.0000000
320000126563Office2019-08-25 00:12:30.0000000
290000127652Inspection2019-08-22 16:06:49.0000000
270000127754Office2019-08-22 15:50:51.0000000
280000127754Office2019-08-22 15:58:35.0000000

But what I am after is this:

IDJobNoDeptScanDate
380000115717Despatch2019-08-26 16:31:29.0000000
400000115847Spirals2019-08-26 16:40:05.0000000
360000119378Machining2019-08-25 14:57:54.0000000
320000126563Office2019-08-25 00:12:30.0000000
290000127652Inspection2019-08-22 16:06:49.0000000
280000127754Office2019-08-22 15:58:35.0000000

Now, when I try:

SELECT x.JobNo AS JobRef, max(x.ScanDate) AS Scanned
FROM Scans x
GROUP BY x.JobNo

I get the right limitation to the max ScanDate value per JobRef but I am missing the Department:

JobRefScanned
00001157172019-08-26 16:31:29.0000000
00001158472019-08-26 16:40:05.0000000
00001193782019-08-25 14:57:54.0000000
00001265632019-08-25 00:12:30.0000000
00001276522019-08-22 16:06:49.0000000
00001277542019-08-22 15:58:35.0000000

So I figured, maybe I have to do this in two steps, so I used an attempt at a sub-query method above and tried this:

SELECT JobNo, Dept, ScanDate
FROM Scans y
RIGHT OUTER JOIN (
SELECT x.JobNo as JobRef, max(x.ScanDate) AS Scanned
FROM Scans x
GROUP BY x.JobNo
) x
ON x.Scanned = y.ScanDate
ORDER BY JobNo

And this seemed to do 'the?' trick.

JobNoDeptScanDate
0000115717Despatch2019-08-26 16:31:29.0000000
0000115847Spirals2019-08-26 16:40:05.0000000
0000119378Machining2019-08-25 14:57:54.0000000
0000126563Office2019-08-25 00:12:30.0000000
0000127652Inspection2019-08-22 16:06:49.0000000
0000127754Office2019-08-22 15:58:35.0000000

Now this is where I hit my limit tonight and ask the experts opinion on how best to proceed. I am positive there is a more elegant solution to what I am trying to achieve, I'm happy I got this far with it but there must be a better way to do it.

The thing is, I actually need to get this tied to another table which will be relational by the JobNo on another table (SOPOrdrReturns mentioned in previous posts).

As always, any advice, hints or tips would be great. (On day 2 of trying to figure this out...)

This reply was modified 6 years, 11 months ago by insomnius.

frederico_fonseca SSCoach Points: 16338 More actions · Answer 13

that code you just posted, asking the "experts" if it is a good option is what I gave you on the first reply to you.

yes it is one of the ways to do it - not the only way as I pointed out, neither the best or worst as this depends, as I mentioned, on other factors like volumes, indexes, machine spec, SQL instance definitions.

it should be a left outer join, not a right outer based on your original question.

Based on your current code, and as you are joining the table to itself (aggregated) then a inner join may perform better and is on this particular case correct

insomnius Old Hand Points: 335 More actions · Answer 14

Hi Frederico, thank you for your kind input to my question. Actually, I tried both of your options but I was missing the Dept each time which is what I needed.

After thinking and scratching my head for a while, I wondered if I could use the way you had suggested in your answer, and get that much needed Dept name and managed it with the above.

My question was whether or not what I arrived at was the best way based on the information I had provided.

Based on your message here, I wonder if I am missing something, does your way bring in the Dept and I am just not seeing it?