Simple GROUP BY Query

  • Hello:

    I have this query.  I am trying to understand why I have to GROUP by all of the fields in my SELECT Statement.  Also, the query is returning the sum for all of the jmoJobID fields, when I want this to be summed by the jmaPartID field.

    SELECT DISTINCT JobOperations.jmoJobID, JobAssemblies.jmaPartID, JobOperations.jmoWorkCenterID, JobOperations.jmoProcessID, JobOperations.jmoOperationQuantity, SUM(JobOperations.jmoCompletedProductionHours) AS SUM_CompletedProductionHours

    FROM JobOperations INNER JOIN

    JobAssemblies ON JobOperations.jmoJobID = JobAssemblies.jmaJobID

    WHERE jmoJobID = '201739-01'

    GROUP BY JobOperations.jmoJobID, JobOperations.jmoWorkCenterID, JobAssemblies.jmaPartID, JobOperations.jmoProcessID, JobOperations.jmoOperationQuantity

    ORDER BY JobOperations.jmoWorkCenterID

     

    Thank you!

    Steve Anderson

  • Here's my best guess at what you're trying to do.

    SELECT 
    JO.jmoJobID, JA.jmaPartID, JO.jmoWorkCenterID,
    JO.jmoProcessID, JO.jmoOperationQuantity,
    JA.SUM_CompletedProductionHours

    FROM JobOperations JO INNER JOIN

    (
    SELECT jmaJobID, jmaPartID, SUM(jmaCompletedProductionHours) AS SUM_CompletedProductionHours
    FROM JobAssemblies
    GROUP BY jmaJobID, jmaPartID
    ) AS JA ON
    JO.jmoJobID = JA.jmaJobID

    WHERE jmoJobID = '201739-01'

    ORDER BY JO.jmoWorkCenterID

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • It is because of your aggregate function (SUM).  Once you put in an aggregate function, you NEED to group by all non-aggregate columns that are in use.  Otherwise SQL has no way to "group" the data to be used by the SUM in any meaningful manner.

    Now the fun trick comes in when you want to SUM but NOT group by all of the columns (as is your case) - that is where Windowing functions come in.  So your SUM statement would be something like:

    SUM(JobOperations.jmoCompletedProductionHours) OVER (PARTITION BY jmoJobID) AS SUM_CompletedProductionHours

    Then remove the GROUP BY clause completely.

    The "OVER" keyword tells SQL "this is a windowing function".  The PARTITION BY clause tells SQL "I want to group by this/these column(s)".  In this case, it would group it by jmoJobID and ONLY jmoJobID.  You can add "ORDER BY" to a windowing function as well, but that clause is useless in a SUM operation.  Now, if you were doing ROW_NUMBER() (for example), then the ORDER BY would be helpful; but SUM and COUNT don't benefit from the data being in any specific order.

    The above is all just my opinion on what you should do. 
    As with all advice you find on a random internet forum - you shouldn't blindly follow it.  Always test on a test server to see if there is negative side effects before making changes to live!
    I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.

  • Thanks for your reply.

    I actually tried this.  It is summing the hours not by the PartID in the Job, but by the job, so it appears.  Every row for the last column has the same value.

    SELECT DISTINCT JobOperations.jmoJobID, JobAssemblies.jmaPartID, JobOperations.jmoWorkCenterID, JobOperations.jmoProcessID, JobOperations.jmoOperationQuantity, SUM(JobOperations.jmoCompletedProductionHours) OVER (PARTITION BY JobAssemblies.jmaPartID) AS SUM_CompletedProductionHours

    FROM JobOperations INNER JOIN

    JobAssemblies ON JobOperations.jmoJobID = JobAssemblies.jmaJobID

    WHERE jmoJobID = '201739-01'

    ORDER BY JobOperations.jmoWorkCenterID

    Steve Anderson

  • Scott:

    Thank you for your reply.  What did I miss?

    Invalid column name 'jmoCompletedProductionHours'.

    Steve Anderson

  • Sorry, thought I changed it to jma.... (which I'll now do in my orig post as well).  But I'm assuming that such a column exists just because or your comments in the OP:

    SUM(jmaCompletedProductionHours) AS SUM_CompletedProductionHours

    Now that I think about it more, there's likely not such a column in that table.

    Keep in mind, we know NOTHING about your tables.  Especially as you didn't provide DDL for the tables, so I had to guess.

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • >> I am trying to understand why I have to GROUP by all of the fields[sic: columns] in my SELECT Statement. Also, the query is returning the sum for all of the jmoJob_id fields [sic], when I want this to be summed by the jmaPart_id field [sic]. <<

    The ANSI/ISO standards use "field" to a part of the column that has some meaning in itself. For example, a DATE value has the fields {year, month, day}, which have some meaning but not a complete meaning in themselves. I think you meant column.

    Your use of camel case is also somewhat problematic, because so many ISO standards have different conventions about the meaning of upper and lowercase and identifiers. When I worked at AIRMICS, we also found that a person's eyes jump to an uppercase letter, so the text actually gets harder to read when you mix cases. This is why the body of your books, newspapers, etc. don't do that.

    Also, putting the table name as a prefix is a really bad idea. For example, that the job_id is a totally different attribute when it appears in job operations and in job assemblies; if this is the same data element, then it should have the same consistent name of your entire schema and actually for your entire enterprise. This convention is a leftover from the 1960s filesystems, when each file was completely separate from the others and not part of a schema

    It would also help if you would follow netiquette and post DDL for us. In over 30 years of writing SQL, I have very rarely had to use the SELECT DISTINCT because I usually have a key and the key guarantees me uniqueness. I also recommend not using an ORDER BY in a query. This turns the query into a cursor, and the display options for cursor should be done in the presentation layer. Never in the database layer.

    SELECT JO.job_id, JA.part_id, JO.workcenter_id, JO.process_id, JO.operation_quantity,

    SUM (JO.completed_production_hours) AS completed_production_hours_sum

    FROM JobOperations. AS JO,

    JobAssemblies AS JA

    WHERE JO.job_id = JA.job_id

    AND JO. job_id = '201739-01'

    GROUP BY JO.job_id, JA.part_id, JO.workcenter_id, JO.process_id, JO.operation_quantity;

    Now to answer your original question. Think about it, what single value would you use for, say, the workcenter_id when you have 100 of them?

    Please post DDL and follow ANSI/ISO standards when asking for help. 

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply