Simple Python question...

  • pietlinden

    SSC Guru

    Points: 62902

    I have a table in SQL Server like this:

    CREATE TABLE #SymptomData (
        PatientID INT NOT NULL,
    Symptom VARCHAR(20) NOT NULL,
    ConsentDate DATE NOT NULL,
    Severity TINYINT NOT NULL );
    GO

    INSERT INTO #SymptomData (PatientID, Symptom, Severity, ConsentDate)
    VALUES (1001, 'Diabetes', 3, '1-jan-2020'),(1001, 'Diabetes', 4, '15-jan-2020'),(1002,'Coronary Artery Disease',2,'01-Jan-2019'),(1002,'Coronary Artery Disease',3,'01-Jan-2020');

    I'm trying to do return the count per (Symptom, Severity) in Python, and it's giving me fits. In SQL it's stupid simple

    SELECT 
         PatientID,
        Symptom,
    MAX(Severity)
    FROM #Data
    GROUP BY
    PatientID,
    Symptom

    Getting the population size is trivial in both... just do a

    COUNT(DISTINCT(PatientID)) in T-SQL or

    in T-SQL or

    population_size = df.INDIVIDUAL_ID.nunique()

    but how do I do get both the count and the percent of total in the same query in Python? I can get one or the other, but not both together.

    # create a datafram with just IndividualID, LongName, delivery level
    df2 = df.loc[:, ["INDIVIDUAL_ID", "LONG_NAME", "DELIVERY_LEVEL", "CONSENT_DATE"]]

    # rename the columns (PatientID, Symptom, Grade)
    df2.columns = [
    'PatientID',
    'Symptom',
    'Grade',
    'ConsentDate'
    ]
    # get the max grade for each combination of (PatientID and symptom)
    df3 = df2.groupby(['PatientID','Symptom'], as_index=False).agg({'Grade':max})

    # use the above query and do a count, and divide by the population_size
    df3.groupby(by=['Symptom','Grade']).agg("count")/population_size

    # up to here works... but how do I combine the MaxGrade and the Count?

    In T-SQL it's stupid easy... just wrap the inner query in () and query it in the outer query. But how do I do that in Python?

    SELECT @PopulationSize = COUNT(DISTINCT(PatientID)) FROM ToxicityData;

    SELECT
    mgp.Symptom
    ,mgp.WorstGrade
    ,Frequency = COUNT(*)
    ,Relative_Frequency = COUNT(*)/@PopulationSize
    FROM
    (SELECT PatientID, Symptom, WorstGrade = MAX(Severity)
    FROM ToxicityData
    GROUP BY PatientID, Symptom) mgp
    GROUP BY mgp.Symptom, mgp.WorstGrade;

    Sorry for the daft question... just for the life of me can't figure out how to return a Frequency and Relative Frequency (divided by Population Size) in the same data frame.

  • Site Owners

    SSC Guru

    Points: 80372

    Thanks for posting your issue and hopefully someone will answer soon.

    This is an automated bump to increase visibility of your question.

Viewing 2 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply