Simple Python question...

Question

Post reply

Simple Python question...

pietlinden

SSC Guru

Points: 63632

September 29, 2020 at 5:29 am

#3791507

I have a table in SQL Server like this:

CREATE TABLE #SymptomData (
    PatientID INT NOT NULL,
    Symptom VARCHAR(20) NOT NULL,
    ConsentDate DATE NOT NULL,
    Severity TINYINT NOT NULL );
GO

INSERT INTO #SymptomData (PatientID, Symptom, Severity, ConsentDate) 
VALUES (1001, 'Diabetes', 3, '1-jan-2020'),(1001, 'Diabetes', 4, '15-jan-2020'),(1002,'Coronary Artery Disease',2,'01-Jan-2019'),(1002,'Coronary Artery Disease',3,'01-Jan-2020');

I'm trying to do return the count per (Symptom, Severity) in Python, and it's giving me fits. In SQL it's stupid simple

SELECT 
     PatientID,
     Symptom, 
     MAX(Severity)
FROM #Data
GROUP BY 
     PatientID,
     Symptom

Getting the population size is trivial in both... just do a

COUNT(DISTINCT(PatientID)) in T-SQL or

in T-SQL or

population_size = df.INDIVIDUAL_ID.nunique()

but how do I do get both the count and the percent of total in the same query in Python? I can get one or the other, but not both together.

# create a datafram with just IndividualID, LongName, delivery level
df2 = df.loc[:, ["INDIVIDUAL_ID", "LONG_NAME", "DELIVERY_LEVEL", "CONSENT_DATE"]]

# rename the columns (PatientID, Symptom, Grade)
df2.columns = [
  'PatientID',
  'Symptom',
  'Grade',
  'ConsentDate'
]
# get the max grade for each combination of (PatientID and symptom)
df3 = df2.groupby(['PatientID','Symptom'], as_index=False).agg({'Grade':max})

# use the above query and do a count, and divide by the population_size
df3.groupby(by=['Symptom','Grade']).agg("count")/population_size

# up to here works... but how do I combine the MaxGrade and the Count?

In T-SQL it's stupid easy... just wrap the inner query in () and query it in the outer query. But how do I do that in Python?

SELECT @PopulationSize = COUNT(DISTINCT(PatientID)) FROM ToxicityData;

SELECT 
        mgp.Symptom
       ,mgp.WorstGrade
       ,Frequency = COUNT(*)
       ,Relative_Frequency = COUNT(*)/@PopulationSize
FROM
      (SELECT PatientID, Symptom, WorstGrade = MAX(Severity)
       FROM ToxicityData
       GROUP BY PatientID, Symptom) mgp
GROUP BY mgp.Symptom, mgp.WorstGrade;

Sorry for the daft question... just for the life of me can't figure out how to return a Frequency and Relative Frequency (divided by Population Size) in the same data frame.

Viewing 2 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply

Site Owners SSC Guru Points: 80411 More actions · Answer 1

Thanks for posting your issue and hopefully someone will answer soon.

This is an automated bump to increase visibility of your question.