Getting the top record when max date filter returns multiple rows

  • I have a query to import specific client data from an unnormalized set of tables brought over from mainframe data. What I want is to have only one record per client ChildID value and have developed the below SQL which does a pretty good job of filtering down several million records down to 49,329. However, there are only 48,902 unique ChildID records. The additional records are included because getting the max date field from the joined table, pulls duplicate records for the same date because the mainframe dates do not include time part. I need to know what I need to change in the following SQL to only get only the Top record from the CFTVCM9200 join table when the max LIVAR_DATE field has duplicate values?

    SELECTc.CASE_ID as ChildID,

    c.LAST_NAME as ChildLastName,

    c.FIRST_NAME as ChildFirstName,

    c.MI_NAME as MI,

    c.CASE_RSF as ChildRSF,

    cc.Sex_Code as Sex,

    cc.Brth_Date as BirthDate,

    cp.PRV_ID as ProviderCode,

    cp.Type_Serv_Code as TypeServiceCode

    INTOClaimsStatement.dbo.Clients2

    FROM CFTVCM9000 as c

    left join CFTVCM9200 as cp ON c.Case_ID = cp.Case_ID

    left join CFTVCR9000 as cc ON c.Case_ID = cc.cli_ID

    WHERE (c.CASE_STAT = 'O') AND (c.CASE_TYPE = 'C') AND cp.LIVAR_DATE IN (SELECT max(LIVAR_DATE) FROM CFTVCM9200 WHERE CASE_ID = c.CASE_ID)

    ORDER BY c.case_id, c.Last_Name, c.First_Name

    Ron K.

    "Any fool can write code that a computer can understand. Good programmers write code that humans can understand." -- Martin Fowler

  • Without sample data it's difficult to say what the best approach is, but there are two general approaches: Row_Number() or CROSS/OUTER APPLY. If you want tested code, you really should supply sample data.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • What would be better yet if you provided the DDL and sample Insert Statement.;-)

    For better, quicker answers on T-SQL questions, click on the following...
    http://www.sqlservercentral.com/articles/Best+Practices/61537/

    For better answers on performance questions, click on the following...
    http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • See attached .doc file with DDL of the three tables in query. The CFTVCM9200 table has a seperate LIVAR_TIME two char field to denote the hour which would be useful in limiting the returned record to the most recent Client record, however I was unsure how to do a MAX(...) for two seperate fields as it seems to only accept one field or ALL and cannot be repeated in the statement.

    Ron K.

    "Any fool can write code that a computer can understand. Good programmers write code that humans can understand." -- Martin Fowler

  • Why not just do a GROUP BY on all of the columns?

    If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.

  • I do not see how a GROUP BY will help as I have done that several times. What I need to do a MAX(LIVAR_DATE) with a MAX(LIVAR_TIME) in the same statement, but T-SQL does not allow that. Perhaps you could illustrate, starting with my query above, of how a GROUP BY will resolve my dilemma?

    Ron K.

    "Any fool can write code that a computer can understand. Good programmers write code that humans can understand." -- Martin Fowler

  • Difficult to be sure without sample data.

    First off, I do not recommend repeated use of SELECT INTO - you've got the overhead of deleting and rebuilding the table every time. Let's assume that the table exists and you simply want to repopulate it every time the routine runs. Here is my first stab at how that would work (untested & no guarantees):

    use ClaimsStatement

    Go

    truncate table dbo.Clients2

    Go

    ;with clients as (

    SELECT c.CASE_ID as ChildID,

    c.LAST_NAME as ChildLastName,

    c.FIRST_NAME as ChildFirstName,

    c.MI_NAME as MI,

    c.CASE_RSF as ChildRSF,

    cc.Sex_Code as Sex,

    cc.Brth_Date as BirthDate,

    cp.PRV_ID as ProviderCode,

    cp.Type_Serv_Code as TypeServiceCode

    FROM CFTVCM9000 as c

    left join CFTVCM9200 as cp ON c.Case_ID = cp.Case_ID

    left join CFTVCR9000 as cc ON c.Case_ID = cc.cli_ID

    WHERE (c.CASE_STAT = 'O') AND (c.CASE_TYPE = 'C') AND cp.LIVAR_DATE IN (SELECT max(LIVAR_DATE) FROM CFTVCM9200 WHERE CASE_ID = c.CASE_ID)

    GROUP BY c.CASE_ID,

    c.LAST_NAME,

    c.FIRST_NAME,

    c.MI_NAME,

    c.CASE_RSF,

    cc.Sex_Code,

    cc.Brth_Date,

    cp.PRV_ID,

    cp.Type_Serv_Code

    )

    INSERT INTO ClaimsStatement.dbo.Clients2(ChildID, ChildLastName, ChildFirstName, MI, ChildRSF, Sex, BirthDate, ProviderCode, TypeServiceCode)

    SELECT ChildID, ChildLastName, ChildFirstName, MI, ChildRSF, Sex, BirthDate, ProviderCode, TypeServiceCode

    FROM clients

    If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.

  • I still think that CROSS APPLY is going to be your better solution, but, again, without sample data there is no way to test the code. Based on Phil's code, I've made modifications to use CROSS APPLY.

    SELECT c.CASE_ID as ChildID,

    c.LAST_NAME as ChildLastName,

    c.FIRST_NAME as ChildFirstName,

    c.MI_NAME as MI,

    c.CASE_RSF as ChildRSF,

    cc.Sex_Code as Sex,

    cc.Brth_Date as BirthDate,

    cp.PRV_ID as ProviderCode,

    cp.Type_Serv_Code as TypeServiceCode

    FROM CFTVCM9000 as c

    CROSS APPLY (

    SELECT TOP 1 PRV_ID, Type_Serv_Code

    FROM CFTVCM9200 as cp

    WHERE c.Case_ID = cp.Case_ID

    ORDER BY LIVAR_DATE DESC, LIVAR_TIME DESC

    ) AS cp

    left join CFTVCR9000 as cc ON c.Case_ID = cc.cli_ID

    WHERE (c.CASE_STAT = 'O') AND (c.CASE_TYPE = 'C')

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • THANK YOU DREW! Your Cross Apply example was exactly what I needed. I tried to research the Cross Apply when you first suggested it, but the complexity of the article I found did little in explaining its usage in a way I could understand. Your example helped immensly in understanding its usage as well as solving my current dilemma. It will definitely be going into my tool box.

    I don't think Phil understood the original question as his solution was identical to the one I had already achived with my original query, only replacing the SELECT INTO and adding a GROUP BY but still returning the same duplicate ChildID records when the same date was used more than once.

    I was just about to try to simplfy the request by including only relevant DDL and providing sample dummy data as I could not just copy and include proprietary and confidential State information; when I saw your response. Thanks again!

    Ron K.

    "Any fool can write code that a computer can understand. Good programmers write code that humans can understand." -- Martin Fowler

  • Maybe the DISTINCT clause could be useful in this case.

    SELECT DISTINCT c.CASE_ID as ChildID,

    c.LAST_NAME as ChildLastName,

    c.FIRST_NAME as ChildFirstName,

    c.MI_NAME as MI,

    c.CASE_RSF as ChildRSF,

    cc.Sex_Code as Sex,

    cc.Brth_Date as BirthDate,

    cp.PRV_ID as ProviderCode,

    cp.Type_Serv_Code as TypeServiceCode

    INTO ClaimsStatement.dbo.Clients2

    FROM CFTVCM9000 as c

    left join CFTVCM9200 as cp ON c.Case_ID = cp.Case_ID

    left join CFTVCR9000 as cc ON c.Case_ID = cc.cli_ID

    WHERE (c.CASE_STAT = 'O') AND (c.CASE_TYPE = 'C') AND cp.LIVAR_DATE IN (SELECT max(LIVAR_DATE) FROM CFTVCM9200 WHERE CASE_ID = c.CASE_ID)

    ORDER BY c.case_id, c.Last_Name, c.First_Name

  • Narud (9/14/2011)


    Maybe the DISTINCT clause could be useful in this case.

    This would require that all records with the same CaseID and Date have exactly the same ProviderCode (PRV_ID) and TypeServiceCode (Type_Serv_Code). Since the OP never provided sample data, we can't be sure of that. However, it seems likely that the reason that you even have multiple records on the same date is that you are recording different services requiring different TypeServiceCode, in which case DISTINCT would not give you the required results.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply