String_Agg and Group By Error

  • Hello All,

    I am in need of your assistance.

    I am trying to use the string_agg function with a group by and keep on getting the error below.

    Column 'Integration_02_Clinical.Schedule.Source System' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.

    Could someone provide guidance on how to overcome this error?

    Below is the query I am trying to use. Everything works great until I add in the string_agg and group by.

    Basically I want to take the [MRN] column since it contains duplicate MRN's of the same patient and combine all the data into one flow. I am looking at scheduling rates by various topics for this project.

    select

    string_agg(c.[MRN],',') as [MRN]

    ,c.[Source System]

    ,c.[Source System]

    ,c.[Date of Appointment]

    ,c.[AM or PM Appointment]

    ,c.[Scheduled Appointment Type Name]

    ,c.[Scheduled Location]

    ,c.[Scheduled Location Number]

    ,c.[Scheduled Location Description]

    ,c.[Appointment Status]

    ,c.[Resource Description]

    ,p.[City]

    ,p.[County]

    ,p.[Gender Name]

    ,p.[Ethnicity]

    ,p.[Race]

    ,left(p.[Zip Code],5) as [Zip Code]

    ,datediff(YY, p.[Date of Birth],getdate()) as [Age]

    from [Integration_02_Clinical].[Schedule] as c

    INNER JOIN [Integration_02_Population].[Person] as p

    on c.[Person ID] = p.[Person ID]

    where c.[Date of Appointment] between '2021-07-01 00:00:00.0000000' and '2021-07-31 00:00:00.0000000'

    and c.[Appointment Status] IN ('No Show', 'Checked Out')

    and c.[Scheduled Location Number] = 786928

    group by c.[MRN]

    ORDER BY c.[Date of Appointment] DESC

    all the help is greatly appreciated.

    Thank you!

    -Cody

  • Any column in the SELECT result list that is NOT used in an aggregate (or windowing function) MUST be included in your GROUP BY UNLESS no aggregate operations are taking place.

    In your query above, you are grouping by c.MRN but no other columns, yet including multiple columns in the SELECT. SQL has no way to know how to group the remaining columns in the above with respect to the aggregate, so it tells you so in the error.

    Your options to correct it are:

    1 - add all of the non-aggregate columns into the GROUP BY <-- note this MAY give inaccurate results

    2 - use a windowing function on the aggregate calculation

    3 - use a CTE or JOIN with a nested select to provide a reduced data set that contains the aggregate

    Personally, I think option 2 is going to be the easiest, but do validate the data as you may get unexpected results if your windowing function isn't accurate.  Option 1 is most likely to get rid of the error while giving you odd results.

    Basically for option 2 (assuming your group by is accurate) you would change the string_agg function to:

    string_agg(c.[MRN],',') OVER (PARTITION BY c.[MRN]) as [MRN]

    Now, I am not sure that the query is going to work the way you expect, so if you could provide some sample data, some DDL, and some expected output, someone on the forum can probably help more than my above advice.  I am fairly confident my above advice will fix the error but may not give you the expected output.

    The above is all just my opinion on what you should do. 
    As with all advice you find on a random internet forum - you shouldn't blindly follow it.  Always test on a test server to see if there is negative side effects before making changes to live!
    I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.

Viewing 2 posts - 1 through 1 (of 1 total)

You must be logged in to reply to this topic. Login to reply