Query Help to Match Multiple Results and Average

  • I work for a school district and need some help with a new requirement we were just given for scoring some student scores.

    Everything will currently be in one table to keep track of students test scores for various things. This table will contain students information and a student will be shown more than once in the table. The Student ID is what we will key off of to find the multiple instances of the student. The table contains the following columns: studentName, StudentId, teacherName, focus1, controllingIdeas1, reading1, development1, organization1, conventions1, and contentUnderstanding1. All of the columns with a 1 at the end will be numeric values with possible decimal values.

    What we need to be able to do is some how perform a search for these multiple entries of each student and when found, average the 2 scores for each 7 test categories. The result needs to be a single line for each student that gives the student name, student id, and the 7 test category averages exported to an csv file.

    Any help of where to start and how to accomplish this task is greatly appreciated.

  • Could you please post DDL for table, some sample set and the expected O/P for the same ? This will help us understand and to work on the problem better.

  • I have attached the DDL for the database.

    I do not have an example of the output data, but I'll see if I can explain it a little easier now that the DDL has been created.

    We need to do a query for all duplicate studentID values. Once found, we need to then average all focus1, controlling1, reading1, etc. values for those duplicate studentIDs. Once the averages are done we then need to spit out the studentName, StudentID, focus1Average, controlling1Average, reading1Average, etc. to a csv file. I believe the export to a csv is easy after the rest of the query is done since I can just right click and save as a csv. It's the query that I am struggling with.

  • Can you post some dummy data so we can see what your data looks like? Are you importing this from somewhere or entering it manually? The reason I ask is that the structure is somewhat puzzling. Why are all the columns defined as floats? (Even a StudentID? Normally, that's an INT).

    Once you normalize, querying/summarizing this thing would be a walk in the park. Any reason you can't do that? What if you need to add a third set of values for each student? When I see structures like this, I worry.

    Do you have control over the structure of the database? I get the feeling it could use some normalization. Then querying it for answers should get a lot easier.

  • Attached is some dummy data. We are importing the data using the SQL Data Import utility from excel spreadsheets. Normalizing is something we might do later down the road, but because of the way we are receiving the data and our time restraint to get the averages out, and not being SQL experts, we are not that concerned about it right now. The data types that are setup are what the import utility set them to, so to make things easier right now we decided to stick with them. There very well could be more than 2 results for the same student. We do have control over the database as we are hosting it locally.

  • In SSMS, go to Help... View Help.

    This will pull up Books On Line.

    Typing Average in the upper right search box will bring up relevant topics.

    If time is of one of the main factors, a pivot table directly in Excel can be added in less than a minute.

    Note in your sample data, 2 of the Students have no ID - which is the key to the Student.

    Although I am glad you are trying to learn, something just seems odd.

    Something as important as grades, and this is such a basic query question.

    And a school district managing some of it in Excel?

    Do they have any openings? 😀

  • IT wasn't notified of the project until last minute. An excel spreadsheet was already distributed out to teachers to submit their scores with. It's not the ideal situation, but it's what we have and are trying to work with it. Next year we hope to have a better solution in place. The scores are not for normal grades. I go into the long history of why we are at this point, but that doesn't help solve our issue at hand. I am aware that the data is not clean yet. We will be doing that once we have all of the data imported into SQL.

    I understand how to use the AVG() function in SQL. The part that I am having problems with is finding all duplicate studentID's and then after averaging the results, spitting it out so it only lists the student information once with the averages.

  • bsmith 63193 (3/10/2014)


    IT wasn't notified of the project until last minute. An excel spreadsheet was already distributed out to teachers to submit their scores with. It's not the ideal situation, but it's what we have and are trying to work with it. Next year we hope to have a better solution in place. The scores are not for normal grades. I go into the long history of why we are at this point, but that doesn't help solve our issue at hand. I am aware that the data is not clean yet. We will be doing that once we have all of the data imported into SQL.

    I understand how to use the AVG() function in SQL. The part that I am having problems with is finding all duplicate studentID's and then after averaging the results, spitting it out so it only lists the student information once with the averages.

    Using Group By student ID and name, avg() on the other columns, duplicates are averaged.

    If only 1 record, score/count of 1 is the score for that column.

    Add Count(*) as a column and validate some of the data.

    Our you can do a Count(*) on student ID and name, and use a having statement > 1 to see all the dupes.

    Maybe I'm missing something, but you seem to be overthinking some of this.

  • Attached is a spreadsheet that depicts what we have in SQL and what the output needs to be based on that information. It is a very small sample, but you can see that there are several students with multiple entries and some students that have a single entry. I hope this helps explain what we are trying to accomplish. I know I can do this in Excel as I have shown, but we would prefer to be able to accomplish this in SQL if possible.

  • maybe of some use

    with cte_names as

    (

    SELECT StudentID, MAX(StudentName) AS StudentName

    FROM lcdscores

    GROUP BY StudentID

    )

    SELECT lcdscores.StudentID

    , cte_names.StudentName

    , CAST ( AVG ( lcdscores.focus1 ) AS decimal ( 9 , 2 )) AS focus1

    , CAST ( AVG ( lcdscores.cont1 ) AS decimal ( 9 , 2 )) AS controlling1

    FROM

    lcdscores INNER JOIN cte_names ON lcdscores.StudentID = cte_names.StudentID

    GROUP BY lcdscores.StudentID

    , cte_names.StudentName;

    ________________________________________________________________
    you can lead a user to data....but you cannot make them think
    and remember....every day is a school day

  • J Livingston SQL (3/10/2014)


    maybe of some use

    with cte_names as

    (

    SELECT StudentID, MAX(StudentName) AS StudentName

    FROM lcdscores

    GROUP BY StudentID

    )

    SELECT lcdscores.StudentID

    , cte_names.StudentName

    , CAST ( AVG ( lcdscores.focus1 ) AS decimal ( 9 , 2 )) AS focus1

    , CAST ( AVG ( lcdscores.cont1 ) AS decimal ( 9 , 2 )) AS controlling1

    FROM

    lcdscores INNER JOIN cte_names ON lcdscores.StudentID = cte_names.StudentID

    GROUP BY lcdscores.StudentID

    , cte_names.StudentName;

    In the first sample data, there are a couple of NULL StudentID records. 😀

    And the Excel Example is yet another set of data.

    I refrained from writing a select statement, thinking that the poster would create one with AVG on a column.

    And understand that the Group By creates the distinct records being asked for.

    Hopefully they will understand a CTE.

  • J Livingston SQL (3/10/2014)


    maybe of some use

    with cte_names as

    (

    SELECT StudentID, MAX(StudentName) AS StudentName

    FROM lcdscores

    GROUP BY StudentID

    )

    SELECT lcdscores.StudentID

    , cte_names.StudentName

    , CAST ( AVG ( lcdscores.focus1 ) AS decimal ( 9 , 2 )) AS focus1

    , CAST ( AVG ( lcdscores.cont1 ) AS decimal ( 9 , 2 )) AS controlling1

    FROM

    lcdscores INNER JOIN cte_names ON lcdscores.StudentID = cte_names.StudentID

    GROUP BY lcdscores.StudentID

    , cte_names.StudentName;

    Thank you! Works like I charm. Although I do not know what CTE's are I will study the query you provided to learn more about them.

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply