Issue with max() and aggregate, missunderstanding

  • insomnius

    Old Hand

    Points: 335

    Hi all, this is my first post here and I've been looking and trying to understand this for hours the various ways people have tried to get around this issue.

    My trouble is as follows. I have a database with three tables linked together by 'DocumentNo'. Two main linked tables which output all rows with a full outer join. These records are all unique and are never the same.

    The third table is a child table if you will, and this has many rows of it's own, made up of 'Dept' and 'ScanDate'.

    What I need to do, is output unique rows showing only the latest 'scan date' for each unique 'DocumentNo'. I have tried max([ScanDate]) with GROUP BY [DocumentNo] but i'm fairly sure you can guess the error message I am receiving.

    Are there any hinters, or pointers out there that might help with this one, and it would be most appreciated if it could be done in a way that I can get my eureka moment and understand what is going on...

     

    SELECT TOP (1000)
    [Priority]
    ,[PromisedDeliveryDate]
    ,[DocumentNo]
    ,[CustomerAccountNumber]
    ,[CustomerDocumentNo]
    ,[RequestedDeliveryDate]
    ,[AnalysisCode3]
    ,[InvoiceCreditStatusID]
    ,[AccountIsOnHold]
    ,[CancelledStatusID]
    ,[DocumentStatusID]
    ,[CustomerAccountName]
    ,[DocumentTypeID]
    ,[DespatchReceiptStatusID]
    ,[Dept]
    ,[ScanDate]
    FROM [mydb].[dbo].[myview]

    • This topic was modified 1 month, 3 weeks ago by  insomnius.
  • frederico_fonseca

    SSChampion

    Points: 14173

    get used to post full ddl and sample data - hard to give correct help without it.

     

    2 possible ways to do it (more exist) - which one is better depends on volumes of underlying tables

    -- using outer apply 
    select *
    from mastertable mt
    outer apply (select max(ScanDate) as ScanDate
    from childtable ct
    where ct.document = mt.document
    ) ct

    -- using subquery
    select *
    from mastertable mt
    left outer join (select ct.document, max(ScanDate) as ScanDate
    from childtable ct
    group by ct.document
    ) ct
    on ct.document = mt.document

     

  • insomnius

    Old Hand

    Points: 335

    Hi, I can see what you have done there, from the look of it, it is breaking up the 1st/2nd table from the third table, applying the max value to that and joining the remaining row to the 1st/2nd table? (if i'm wrong please tell me...)

    What I am trying to get to, is to have the information in a view that a web page will look at as a single source of information, but I have tried using your code to get a separate view, and SQL Server doesn't like it.

  • pietlinden

    SSC Guru

    Points: 62456

    So you just slap a CREATE VIEW on the front and list the columns you want to return. Only catch is that you can't use ORDER BY in a view.

    CREATE VIEW MyView
    AS
    select <field list>
    from mastertable mt
    outer apply (select max(ScanDate) as ScanDate
                 from childtable ct
                 where ct.document = mt.document
                 ) ct
  • jcelko212 32090

    SSCrazy Eights

    Points: 8854

    >> I have a database with three tables linked [sic] together by 'DocumentNo'. Two main linked [sic] tables which output all rows with a full outer join. These records [sic] are all unique and are never the same. <<

    You're obviously coming to SQL for background in older filesystems or pre-relational databases. The term "link" refers to pointer chains that existed in a network and hierarchical databases; SQL and RDBMS use references which are far more abstract.

    >> The third table is a child [sic] table if you will, and this has many rows of it's own, made up of 'Dept' and 'ScanDate'. <<

    The term "child" and "parent" are also inherited from the old network databases. We have referenced and referencing tables and RDBMS.

    >> What I need to do is output unique rows showing only the latest 'scan date' for each unique 'DocumentNo'. I have tried max([ScanDate]) with GROUP BY [DocumentNo] but I'm fairly sure you can guess the error message I am receiving. <<

    Guess your error messages? With a complete lack of information you posted here, we can even guess what your tables look like! For over 30 years posting DDL has been standard netiquette on SQL forums. If you want to get help can you please play by the rules?

    The data element name you posted in your sample query is also disasters. I don't know if you're old enough to remember why we used to put commas at the front of each punchcard, but it let us rearrange the deck of cards and reuse them easily. But it makes maintenance and reading the code hard -- about 8 to 12% harder. For the last 20 or 30 years, programmers have had pretty printers (code formatting tools) that require the push of a button to arrange them properly. Postfix is like "_nbr" , "_date", "_code" etc are called attribute properties. You can look at ISO standard 11179 or the metadata committee guides.

    These fundamental design rules tell us that crap like "dispatch_receipt_status_id" is a bad data element name that has no business in the schema. Either the data element is a status of some kind or it is an identifier, but it is never both. This is like putting a string of adjectives on the noun in English. One of my favorite T-shirts has the slogan "on a scale from 1 to 10, what color is your favorite letter of the alphabet?" Because it demonstrates mixed datatypes and metadata confusion all in one phrase. But when people read it, they actually try to answer it!

    Want to follow the rules and try again?

    Please post DDL and follow ANSI/ISO standards when asking for help. 

  • insomnius

    Old Hand

    Points: 335

    jcelko212 32090 wrote:

    Want to follow the rules and try again?

    Thank you for using a giant sized hand to slap this new fellow down.

    I have sadly run into the issue where I don't actually know the question to ask to get the answer I need or to be able to formulate a proper search offered on a forum. It is also the reason I despise blanket statements such as 'ask Google'. You're looking on a forum for answers or at least a pointer in the right direction. I am quite happy to look myself when I know what question to ask, but alas, I get stuck every now and then and I had a thought that a forum such as this one might be the best place to start asking questions.

    I have inherited a project which I want to get my hands dirty on. I don't want anyone to do the job for me, which is why when I asked my original question, I stated I wanted to understand what was going on. Willingness to learn and all that.

    I have worked a little bit with MS Access. That is all. My terminology is limited because I inherited that workload too, so have bit by bit learned a little more to finish the task. I can't even comment on whether or not that task was completed to any particular standard or to a specification.

    The program the office works with is Sage 200c On-Premise. The data is sat on MS Server 2016 hosting SQL Server 2016.

    As for presuming that the professionals here will probably know the error message I am receiving, you're right I shouldn't presume.

    The error is as follows:

    Column 'Sage200_mydb.dbo.ScansVw.Priority' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.

    Now, I have done a little digging into this, but from a very limited understanding, you can either make the other columns a max, min, etc or add the column name into the group by clause. Another post suggested that this was the wrong way to do this. Again, a lack of understanding on my part and wanting to understand how to use this Max and Group By properly means I wanted to get the best help and pointers possible so I don't spend another 24 hours scratching my head.

    The code I posted in my first message was a copy/paste straight from SQL Server showing me the top 1000 rows for the view I mentioned in my earlier post. If it was wrong for SQL Server to provide me with the query in that format with the commas on each new line, then I will adjust it around next time I post code.

    I fully apologize for my lack of awareness on the Netiquette of SQL Forums, that has been around for 30 years. This was my first ever post in this forum, actually in any forum on SQL. I will read up on what you are expecting from me in future.

    On that note, could you tell me please what DDL is and where I might find the ANSI/ISO specification for asking for help?

    The current View (copied from the 'design') is as follows:

    SELECT        TOP (100) PERCENT dbo.SOPOrderReturn.Priority, dbo.SOPOrderReturn.PromisedDeliveryDate, dbo.SOPOrderReturn.DocumentNo, dbo.SLCustomerAccount.CustomerAccountNumber, 
    dbo.SOPOrderReturn.CustomerDocumentNo, dbo.SOPOrderReturn.RequestedDeliveryDate, dbo.SOPOrderReturn.AnalysisCode3, dbo.SOPOrderReturn.InvoiceCreditStatusID,
    dbo.SLCustomerAccount.AccountIsOnHold, dbo.SOPOrderReturn.CancelledStatusID, dbo.SOPOrderReturn.DocumentStatusID, dbo.SLCustomerAccount.CustomerAccountName,
    dbo.SOPOrderReturn.DocumentTypeID, dbo.SOPOrderReturn.DespatchReceiptStatusID, dbo.Scans.Dept, dbo.Scans.ScanDate
    FROM dbo.SOPOrderReturn INNER JOIN
    dbo.SLCustomerAccount ON dbo.SOPOrderReturn.CustomerID = dbo.SLCustomerAccount.SLCustomerAccountID FULL OUTER JOIN
    dbo.Scans ON dbo.SOPOrderReturn.DocumentNo = dbo.Scans.JobNo
    WHERE (dbo.SOPOrderReturn.InvoiceCreditStatusID = 0) AND (NOT (dbo.SOPOrderReturn.DocumentStatusID = '2')) AND (dbo.SOPOrderReturn.DocumentTypeID = '0' OR
    dbo.SOPOrderReturn.DocumentTypeID = '2') AND (NOT (dbo.SOPOrderReturn.DespatchReceiptStatusID = 2))
    ORDER BY dbo.SOPOrderReturn.Priority DESC, dbo.SOPOrderReturn.PromisedDeliveryDate

    Sample output of the query is as follows: (edited as this holds actual customer information)

    Priority	PromisedDeliveryDate	DocumentNo	CustomerAccountNumber	CustomerDocumentNo	RequestedDeliveryDate		AnalysisCode3	InvoiceCreditStatusID	AccountIsOnHold	CancelledStatusID	DocumentStatusID	CustomerAccountName	DocumentTypeID	DespatchReceiptStatusID	Dept	ScanDate
    2020-03-10 00:00:00.000 0000115717 ACC003 0000000001 2020-03-10 00:00:00.000 0 0 0 0 Cust 1 0 0 Office 2019-08-24 22:51:00.0000000
    2020-03-10 00:00:00.000 0000115717 ACC003 0000000001 2020-03-10 00:00:00.000 0 0 0 0 Cust 1 0 0 Office 2019-08-24 22:52:00.0000000
    2020-03-10 00:00:00.000 0000115717 ACC003 0000000001 2020-03-10 00:00:00.000 0 0 0 0 Cust 1 0 0 CAD 2019-08-25 00:13:44.0000000
    2020-03-03 00:00:00.000 0000115847 ACC003 0000000002 2020-03-03 00:00:00.000 0 0 0 0 Cust 1 0 0 CAD 2019-08-25 10:44:14.0000000
    2020-03-03 00:00:00.000 0000115847 ACC003 0000000002 2020-03-03 00:00:00.000 0 0 0 0 Cust 1 0 0 Laser 2019-08-25 13:09:47.0000000

    I hope I have provided enough information now to get me started?

    in the results section above, I have two unique numbers (which are actually stored as text in the database, I don't know why Sage chose to go this route) under DocumentNo, this is the main column for the whole thing I am working on. I am wanting to show single DocumentNo that refers to the latest ScanDate. This is why my initial thought was to go straight to Max([ScanDate]) and then Group By [DocumentNo]. In my mind it made sense, but of course I am missing something. If I add [Dept] into the Group By, I get all the columns anyway so was unsure how to proceed further. If I leave it out, I get the error mentioned above.

    Kindest regards

    • This reply was modified 1 month, 3 weeks ago by  insomnius.
    • This reply was modified 1 month, 3 weeks ago by  insomnius.
    • This reply was modified 1 month, 3 weeks ago by  insomnius.
    • This reply was modified 1 month, 3 weeks ago by  insomnius.
    • This reply was modified 1 month, 3 weeks ago by  insomnius.
  • Lynn Pettis

    SSC Guru

    Points: 442144

    insomnius wrote:

    jcelko212 32090 wrote:

    >> I have a database with three tables linked [sic] together by 'DocumentNo'. Two main linked [sic] tables which output all rows with a full outer join. These records [sic] are all unique and are never the same. <<

    You're obviously coming to SQL for background in older filesystems or pre-relational databases. The term "link" refers to pointer chains that existed in a network and hierarchical databases; SQL and RDBMS use references which are far more abstract.

    >> The third table is a child [sic] table if you will, and this has many rows of it's own, made up of 'Dept' and 'ScanDate'. <<

    The term "child" and "parent" are also inherited from the old network databases. We have referenced and referencing tables and RDBMS.

    >> What I need to do is output unique rows showing only the latest 'scan date' for each unique 'DocumentNo'. I have tried max([ScanDate]) with GROUP BY [DocumentNo] but I'm fairly sure you can guess the error message I am receiving. <<

    Guess your error messages? With a complete lack of information you posted here, we can even guess what your tables look like! For over 30 years posting DDL has been standard netiquette on SQL forums. If you want to get help can you please play by the rules?

    The data element name you posted in your sample query is also disasters. I don't know if you're old enough to remember why we used to put commas at the front of each punchcard, but it let us rearrange the deck of cards and reuse them easily. But it makes maintenance and reading the code hard -- about 8 to 12% harder. For the last 20 or 30 years, programmers have had pretty printers (code formatting tools) that require the push of a button to arrange them properly. Postfix is like "_nbr" , "_date", "_code" etc are called attribute properties. You can look at ISO standard 11179 or the metadata committee guides.

    These fundamental design rules tell us that crap like "dispatch_receipt_status_id" is a bad data element name that has no business in the schema. Either the data element is a status of some kind or it is an identifier, but it is never both. This is like putting a string of adjectives on the noun in English. One of my favorite T-shirts has the slogan "on a scale from 1 to 10, what color is your favorite letter of the alphabet?" Because it demonstrates mixed datatypes and metadata confusion all in one phrase. But when people read it, they actually try to answer it!

    Want to follow the rules and try again?

    Thank you for using a giant sized hand to slap this new fellow down.

    I have sadly run into the issue where I don't actually know the question to ask to get the answer I need or to be able to formulate a proper search offered on a forum. It is also the reason I despise blanket statements such as 'ask Google'. You're looking on a forum for answers or at least a pointer in the right direction. I am quite happy to look myself when I know what question to ask, but alas, I get stuck every now and then and I had a thought that a forum such as this one might be the best place to start asking questions.

    I have inherited a project which I want to get my hands dirty on. I don't want anyone to do the job for me, which is why when I asked my original question, I stated I wanted to understand what was going on. Willingness to learn and all that.

    I have worked a little bit with MS Access. That is all. My terminology is limited because I inherited that workload too, so have bit by bit learned a little more to finish the task. I can't even comment on whether or not that task was completed to any particular standard or to a specification.

    The program the office works with is Sage 200c On-Premise. The data is sat on MS Server 2016 hosting SQL Server 2016.

    As for presuming that the professionals here will probably know the error message I am receiving, you're right I shouldn't presume.

    The error is as follows:

    Column 'Sage200_mydb.dbo.ScansVw.Priority' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.

    Now, I have done a little digging into this, but from a very limited understanding, you can either make the other columns a max, min, etc or add the column name into the group by clause. Another post suggested that this was the wrong way to do this. Again, a lack of understanding on my part and wanting to understand how to use this Max and Group By properly means I wanted to get the best help and pointers possible so I don't spend another 24 hours scratching my head.

    The code I posted in my first message was a copy/paste straight from SQL Server showing me the top 1000 rows for the view I mentioned in my earlier post. If it was wrong for SQL Server to provide me with the query in that format with the commas on each new line, then I will adjust it around next time I post code.

    I fully apologize for my lack of awareness on the Netiquette of SQL Forums, that has been around for 30 years. This was my first ever post in this forum, actually in any forum on SQL. I will read up on what you are expecting from me in future.

    On that note, could you tell me please what DDL is and where I might find the ANSI/ISO specification for asking for help?

    The current View (copied from the 'design') is as follows:

    SELECT        TOP (100) PERCENT dbo.SOPOrderReturn.Priority, dbo.SOPOrderReturn.PromisedDeliveryDate, dbo.SOPOrderReturn.DocumentNo, dbo.SLCustomerAccount.CustomerAccountNumber, 
    dbo.SOPOrderReturn.CustomerDocumentNo, dbo.SOPOrderReturn.RequestedDeliveryDate, dbo.SOPOrderReturn.AnalysisCode3, dbo.SOPOrderReturn.InvoiceCreditStatusID,
    dbo.SLCustomerAccount.AccountIsOnHold, dbo.SOPOrderReturn.CancelledStatusID, dbo.SOPOrderReturn.DocumentStatusID, dbo.SLCustomerAccount.CustomerAccountName,
    dbo.SOPOrderReturn.DocumentTypeID, dbo.SOPOrderReturn.DespatchReceiptStatusID, dbo.Scans.Dept, dbo.Scans.ScanDate
    FROM dbo.SOPOrderReturn INNER JOIN
    dbo.SLCustomerAccount ON dbo.SOPOrderReturn.CustomerID = dbo.SLCustomerAccount.SLCustomerAccountID FULL OUTER JOIN
    dbo.Scans ON dbo.SOPOrderReturn.DocumentNo = dbo.Scans.JobNo
    WHERE (dbo.SOPOrderReturn.InvoiceCreditStatusID = 0) AND (NOT (dbo.SOPOrderReturn.DocumentStatusID = '2')) AND (dbo.SOPOrderReturn.DocumentTypeID = '0' OR
    dbo.SOPOrderReturn.DocumentTypeID = '2') AND (NOT (dbo.SOPOrderReturn.DespatchReceiptStatusID = 2))
    ORDER BY dbo.SOPOrderReturn.Priority DESC, dbo.SOPOrderReturn.PromisedDeliveryDate

    Sample output of the query is as follows: (edited as this holds actual customer information)

    Priority	PromisedDeliveryDate	DocumentNo	CustomerAccountNumber	CustomerDocumentNo	RequestedDeliveryDate		AnalysisCode3	InvoiceCreditStatusID	AccountIsOnHold	CancelledStatusID	DocumentStatusID	CustomerAccountName	DocumentTypeID	DespatchReceiptStatusID	Dept	ScanDate
    2020-03-10 00:00:00.000 0000115717 ACC003 0000000001 2020-03-10 00:00:00.000 0 0 0 0 Cust 1 0 0 Office 2019-08-24 22:51:00.0000000
    2020-03-10 00:00:00.000 0000115717 ACC003 0000000001 2020-03-10 00:00:00.000 0 0 0 0 Cust 1 0 0 Office 2019-08-24 22:52:00.0000000
    2020-03-10 00:00:00.000 0000115717 ACC003 0000000001 2020-03-10 00:00:00.000 0 0 0 0 Cust 1 0 0 CAD 2019-08-25 00:13:44.0000000
    2020-03-03 00:00:00.000 0000115847 ACC003 0000000002 2020-03-03 00:00:00.000 0 0 0 0 Cust 1 0 0 CAD 2019-08-25 10:44:14.0000000
    2020-03-03 00:00:00.000 0000115847 ACC003 0000000002 2020-03-03 00:00:00.000 0 0 0 0 Cust 1 0 0 Laser 2019-08-25 13:09:47.0000000

    I hope I have provided enough information now to get me started?

    in the results section above, I have two unique numbers (which are actually stored as text in the database, I don't know why Sage chose to go this route) under DocumentNo, this is the main column for the whole thing I am working on. I am wanting to show single DocumentNo that refers to the latest ScanDate. This is why my initial thought was to go straight to Max([ScanDate]) and then Group By [DocumentNo]. In my mind it made sense, but of course I am missing something. If I add [Dept] into the Group By, I get all the columns anyway so was unsure how to proceed further. If I leave it out, I get the error mentioned above.

    Kindest regards

     

    I haven't read your entire reply, as it is quite long, but I did read the opening.  When it comes to Mr. Celko, ignore him.  He believes that bashing people for using the wrong terminology,  blaming people for the inaccuracies with which they are forced to work with (inherited designs), etc. is the best way to teach people.  He has absolutely no tact and has even been know to post code that doesn't work.

  • insomnius

    Old Hand

    Points: 335

    I appreciate the response Lynn. I have to admit, I knew coming in that I wasn't best equipped with enough knowledge to be able to outright ask for what I wanted, I was expecting a little resistance.

    I 'believe' I have posted what was asked for, if someone would kindly have a look at me and nudge me in the right direction that would be very much appreciated.

    Kindest regards

    Inso

  • jcelko212 32090

    SSCrazy Eights

    Points: 8854

    >> I have sadly run into the issue where I don't actually know the question to ask to get the answer I need or to be able to formulate a proper search offered on a forum. It is also the reason I despise blanket statements such as 'ask Google'. <<

    Before you can ask any kind of question, you have to have some basics. And very often the answer is you need to go to school for a long time.

    >> I have inherited a project which I want to get my hands dirty on. I don't want anyone to do the job for me, which is why when I asked my original question, I stated I wanted to understand what was going on. Willingness to learn and all that. <<

    Are you familiar with the Kruger-Dunning effect? 🙂 Trying to learn by trial and error as opposed to actually getting an education on basics is very often not a good idea.

    >> I have worked a little bit with MS Access. That is all. My terminology is limited because I inherited that workload too, so have bit by bit learned a little more to finish the task. I can't even comment on whether or not that task was completed to any particular standard or to a specification. <<

    I was at Comdex at the Microsoft show where they premiered access so long ago. I VIEW it as a total disaster. It's not really SQL in the first release of it sorted dates alphabetically instead of temporally. Ugh!

    >> Column 'Sage200_mydb.ScansVw.Priority' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause. <<

    This has to do with the scoping rules of the language. But we need to se DDL.

    >> On that note, could you tell me please what DDL is and where I might find the ANSI/ISO specification for asking for help? <<

    DDL stands for "Data Declaration Language" and it is one of three sub-languages in SQL. It's the stuff that creates schema objects, like tables, VIEWs, triggers, sequences, etc. Since SQL is declarative, 80 to 90% of the work is done in the DDL. The other two sub-languages are DCL (data control language, which does the security and access – – think grant and revoke privileges) and DML (data manipulation language) which is where we do queries with select, change the contents of the tables we declared in the DDL with insert update and delete statements, etc.

    >> The current VIEW (copied from the 'design') is as follows: <<

    Don't ever put in ORDER BY clause on a VIEW. It's meaningless because a table has no order. Ordering makes sense only when you convert a table into a cursor and a view is always a virtual table. The optimizer just throws it out but looked sort of silly. I think I already criticized the violations of naming rules and basic data modeling in the base tables.

    I will be honest about this. I am one of the few people who actually sits down and reads ANSI/ISO standards. They are incredibly dull, and you need to know the language its kind of like trying to read a legal document. I had to do it for the 10 years I was on the ANSI X3H2 committee and somehow I kept up the habit after I left. My wife is right I don't get out enough.

    >> in the results section above, I have two unique numbers (which are actually stored as text in the database, I don't know why Sage chose to go this route) <<

    In a word, COBOL! Since this was the first business language was widely used, people tended to move old COBOL systems into SQL. Since COBOL keeps everything in string pictures, you find crap like dates and times coded in strings, things that should be numeric because are used for calculations also encoded as strings, etc. SQL has a wide range of available data types and we went out of our way to see that are well data types matched into all of the major mainframe data types in the 1980s. It's also amazing how the legacy of punch cards still lives in the 21st century.

    I noticed another thing that you get the same data element different names when it comes from different tables. No, a data element I have one and only one name throughout the entire schema. Ideally, a data element you have one and only one well-understood name in the entire universe; this is why a VIN has that name and definition in every insurance company, every car manufacturer, etc. on earth.

    Please post DDL and follow ANSI/ISO standards when asking for help. 

  • drew.allen

    SSC Guru

    Points: 76595

    jcelko212 32090 wrote:

    I will be honest about this. I am one of the few people who actually sits down and reads ANSI/ISO standards. They are incredibly dull, and you need to know the language its kind of like trying to read a legal document. I had to do it for the 10 years I was on the ANSI X3H2 committee and somehow I kept up the habit after I left. My wife is right I don't get out enough.

    People don't read them, because they charge an exorbitant amount of money to get them.  I, for one, would read them if they were available for free online.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • jcelko212 32090

    SSCrazy Eights

    Points: 8854

    People don't read them, because they charge an exorbitant amount of money to get them. I, for one, would read them if they were available for free online.

    The problems have always been there two finance models in the standards world. ISO get ll of its income from selling copies of the standards so it does a lot to protect them. ANSI gets its income from assessing membership fees and has wanted to do exactly what you said – put them online for free.

    Then we get into national standards groups who have their own agendas. JIS (Japanese industrial standards) would like to have Japanese language documents, but they're polite about it. The French, on the other hand, hate anything that has to do with the English language. They've actually blocked standards on this issue. They failed to admit that French is a slowly dying language (fewer native and secondary speakers every year). But it still required on the ISO website. Did you ever wonder why we call the organization ISO when the English language name for it is "International Organization for Standards"? Likewise, we have UTC which is supposed to be "universal coordinated time" in English?

    Please post DDL and follow ANSI/ISO standards when asking for help. 

  • Lynn Pettis

    SSC Guru

    Points: 442144

    jcelko212 32090 wrote:

    >> I have sadly run into the issue where I don't actually know the question to ask to get the answer I need or to be able to formulate a proper search offered on a forum. It is also the reason I despise blanket statements such as 'ask Google'. <<

    Before you can ask any kind of question, you have to have some basics. And very often the answer is you need to go to school for a long time.

    >> I have inherited a project which I want to get my hands dirty on. I don't want anyone to do the job for me, which is why when I asked my original question, I stated I wanted to understand what was going on. Willingness to learn and all that. <<

    Are you familiar with the Kruger-Dunning effect? 🙂 Trying to learn by trial and error as opposed to actually getting an education on basics is very often not a good idea.

    >> I have worked a little bit with MS Access. That is all. My terminology is limited because I inherited that workload too, so have bit by bit learned a little more to finish the task. I can't even comment on whether or not that task was completed to any particular standard or to a specification. <<

    I was at Comdex at the Microsoft show where they premiered access so long ago. I VIEW it as a total disaster. It's not really SQL in the first release of it sorted dates alphabetically instead of temporally. Ugh!

    >> Column 'Sage200_mydb.ScansVw.Priority' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause. <<

    This has to do with the scoping rules of the language. But we need to se DDL.

    >> On that note, could you tell me please what DDL is and where I might find the ANSI/ISO specification for asking for help? <<

    DDL stands for "Data Declaration Language" and it is one of three sub-languages in SQL. It's the stuff that creates schema objects, like tables, VIEWs, triggers, sequences, etc. Since SQL is declarative, 80 to 90% of the work is done in the DDL. The other two sub-languages are DCL (data control language, which does the security and access – – think grant and revoke privileges) and DML (data manipulation language) which is where we do queries with select, change the contents of the tables we declared in the DDL with insert update and delete statements, etc.

    >> The current VIEW (copied from the 'design') is as follows: <<

    Don't ever put in ORDER BY clause on a VIEW. It's meaningless because a table has no order. Ordering makes sense only when you convert a table into a cursor and a view is always a virtual table. The optimizer just throws it out but looked sort of silly. I think I already criticized the violations of naming rules and basic data modeling in the base tables.

    I will be honest about this. I am one of the few people who actually sits down and reads ANSI/ISO standards. They are incredibly dull, and you need to know the language its kind of like trying to read a legal document. I had to do it for the 10 years I was on the ANSI X3H2 committee and somehow I kept up the habit after I left. My wife is right I don't get out enough.

    >> in the results section above, I have two unique numbers (which are actually stored as text in the database, I don't know why Sage chose to go this route) <<

    In a word, COBOL! Since this was the first business language was widely used, people tended to move old COBOL systems into SQL. Since COBOL keeps everything in string pictures, you find crap like dates and times coded in strings, things that should be numeric because are used for calculations also encoded as strings, etc. SQL has a wide range of available data types and we went out of our way to see that are well data types matched into all of the major mainframe data types in the 1980s. It's also amazing how the legacy of punch cards still lives in the 21st century.

    I noticed another thing that you get the same data element different names when it comes from different tables. No, a data element I have one and only one name throughout the entire schema. Ideally, a data element you have one and only one well-understood name in the entire universe; this is why a VIN has that name and definition in every insurance company, every car manufacturer, etc. on earth.

     

    Enough bashing of COBOL and those who developed in COBOL.  You obviously hate the language considering how much you detest those of us who made a living working with it.  Please, come into the 21st century.  Many modern languages have many adherents who abuse the storing of values in data types that are inappropriate.

     

  • insomnius

    Old Hand

    Points: 335

    Apologies, just in a small attempt to bring this back to the question at hand, I have been experimenting with something this afternoon as I had a thought, but i'm not sure if this is the right way to go about it:

    The table which contains the extra data is set up like this:

    dbo.scans
    ID (INT, primary key, identity, no nulls)
    JobNo (Varchar(20), no nulls)
    Dept (nVarchar(20), no nulls)
    ScanDate (datetime2) no nulls)

    When I use:

    SELECT * FROM scans
    ORDER BY JobNo

    I get the following results:

    ID	JobNo		Dept		ScanDate
    30 0000115717 Office 2019-08-24 22:51:00.0000000
    31 0000115717 Office 2019-08-24 22:52:00.0000000
    33 0000115717 CAD 2019-08-25 00:13:44.0000000
    37 0000115717 Despatch 2019-08-26 16:30:26.0000000
    38 0000115717 Despatch 2019-08-26 16:31:29.0000000
    34 0000115847 CAD 2019-08-25 10:44:14.0000000
    35 0000115847 Laser 2019-08-25 13:09:47.0000000
    39 0000115847 Eyeletting 2019-08-26 16:36:09.0000000
    40 0000115847 Spirals 2019-08-26 16:40:05.0000000
    36 0000119378 Machining 2019-08-25 14:57:54.0000000
    32 0000126563 Office 2019-08-25 00:12:30.0000000
    29 0000127652 Inspection 2019-08-22 16:06:49.0000000
    27 0000127754 Office 2019-08-22 15:50:51.0000000
    28 0000127754 Office 2019-08-22 15:58:35.0000000

    But what I am after is this:

    ID	JobNo		Dept		ScanDate
    38 0000115717 Despatch 2019-08-26 16:31:29.0000000
    40 0000115847 Spirals 2019-08-26 16:40:05.0000000
    36 0000119378 Machining 2019-08-25 14:57:54.0000000
    32 0000126563 Office 2019-08-25 00:12:30.0000000
    29 0000127652 Inspection 2019-08-22 16:06:49.0000000
    28 0000127754 Office 2019-08-22 15:58:35.0000000

    Now, when I try:

    SELECT x.JobNo AS JobRef, max(x.ScanDate) AS Scanned
    FROM Scans x
    GROUP BY x.JobNo

    I get the right limitation to the max ScanDate value per JobRef but I am missing the Department:

    JobRef		Scanned
    0000115717 2019-08-26 16:31:29.0000000
    0000115847 2019-08-26 16:40:05.0000000
    0000119378 2019-08-25 14:57:54.0000000
    0000126563 2019-08-25 00:12:30.0000000
    0000127652 2019-08-22 16:06:49.0000000
    0000127754 2019-08-22 15:58:35.0000000

    So I figured, maybe I have to do this in two steps, so I used an attempt at a sub-query method above and tried this:

    SELECT JobNo, Dept, ScanDate
    FROM Scans y
    RIGHT OUTER JOIN (
    SELECT x.JobNo as JobRef, max(x.ScanDate) AS Scanned
    FROM Scans x
    GROUP BY x.JobNo
    ) x
    ON x.Scanned = y.ScanDate
    ORDER BY JobNo

    And this seemed to do 'the?' trick.

    JobNo		Dept		ScanDate
    0000115717 Despatch 2019-08-26 16:31:29.0000000
    0000115847 Spirals 2019-08-26 16:40:05.0000000
    0000119378 Machining 2019-08-25 14:57:54.0000000
    0000126563 Office 2019-08-25 00:12:30.0000000
    0000127652 Inspection 2019-08-22 16:06:49.0000000
    0000127754 Office 2019-08-22 15:58:35.0000000

    Now this is where I hit my limit tonight and ask the experts opinion on how best to proceed. I am positive there is a more elegant solution to what I am trying to achieve, I'm happy I got this far with it but there must be a better way to do it.

    The thing is, I actually need to get this tied to another table which will be relational by the JobNo on another table (SOPOrdrReturns mentioned in previous posts).

    As always, any advice, hints or tips would be great. (On day 2 of trying to figure this out...)

    • This reply was modified 1 month, 3 weeks ago by  insomnius.
  • frederico_fonseca

    SSChampion

    Points: 14173

    that code you just posted, asking the "experts" if it is a good option is what I gave you on the first reply to you.

    yes it is one of the ways to do it - not the only way as I pointed out, neither the best or worst as this depends, as I mentioned, on other factors like volumes, indexes, machine spec, SQL instance definitions.

    it should be a left outer join, not a right outer based on your original question.

    Based on your current code, and as you are joining the table to itself (aggregated) then a inner join may perform better and is on this particular case correct

     

  • insomnius

    Old Hand

    Points: 335

    Hi Frederico, thank you for your kind input to my question. Actually, I tried both of your options but I was missing the Dept each time which is what I needed.

    After thinking and scratching my head for a while, I wondered if I could use the way you had suggested in your answer, and get that much needed Dept name and managed it with the above.

    My question was whether or not what I arrived at was the best way based on the information I had provided.

    Based on your message here, I wonder if I am missing something, does your way bring in the Dept and I am just not seeing it?

Viewing 15 posts - 1 through 15 (of 20 total)

You must be logged in to reply to this topic. Login to reply