September 29, 2020 at 5:29 am
I have a table in SQL Server like this:
CREATE TABLE #SymptomData (
PatientID INT NOT NULL,
Symptom VARCHAR(20) NOT NULL,
ConsentDate DATE NOT NULL,
Severity TINYINT NOT NULL );
GO
INSERT INTO #SymptomData (PatientID, Symptom, Severity, ConsentDate)
VALUES (1001, 'Diabetes', 3, '1-jan-2020'),(1001, 'Diabetes', 4, '15-jan-2020'),(1002,'Coronary Artery Disease',2,'01-Jan-2019'),(1002,'Coronary Artery Disease',3,'01-Jan-2020');
I'm trying to do return the count per (Symptom, Severity) in Python, and it's giving me fits. In SQL it's stupid simple
SELECT
PatientID,
Symptom,
MAX(Severity)
FROM #Data
GROUP BY
PatientID,
Symptom
Getting the population size is trivial in both... just do a
COUNT(DISTINCT(PatientID)) in T-SQL or
in T-SQL or
population_size = df.INDIVIDUAL_ID.nunique()
but how do I do get both the count and the percent of total in the same query in Python? I can get one or the other, but not both together.
# create a datafram with just IndividualID, LongName, delivery level
df2 = df.loc[:, ["INDIVIDUAL_ID", "LONG_NAME", "DELIVERY_LEVEL", "CONSENT_DATE"]]
# rename the columns (PatientID, Symptom, Grade)
df2.columns = [
'PatientID',
'Symptom',
'Grade',
'ConsentDate'
]
# get the max grade for each combination of (PatientID and symptom)
df3 = df2.groupby(['PatientID','Symptom'], as_index=False).agg({'Grade':max})
# use the above query and do a count, and divide by the population_size
df3.groupby(by=['Symptom','Grade']).agg("count")/population_size
# up to here works... but how do I combine the MaxGrade and the Count?
In T-SQL it's stupid easy... just wrap the inner query in () and query it in the outer query. But how do I do that in Python?
SELECT @PopulationSize = COUNT(DISTINCT(PatientID)) FROM ToxicityData;
SELECT
mgp.Symptom
,mgp.WorstGrade
,Frequency = COUNT(*)
,Relative_Frequency = COUNT(*)/@PopulationSize
FROM
(SELECT PatientID, Symptom, WorstGrade = MAX(Severity)
FROM ToxicityData
GROUP BY PatientID, Symptom) mgp
GROUP BY mgp.Symptom, mgp.WorstGrade;
Sorry for the daft question... just for the life of me can't figure out how to return a Frequency and Relative Frequency (divided by Population Size) in the same data frame.
September 30, 2020 at 6:10 am
Thanks for posting your issue and hopefully someone will answer soon.
This is an automated bump to increase visibility of your question.
Viewing 2 posts - 1 through 2 (of 2 total)
You must be logged in to reply to this topic. Login to reply
This website stores cookies on your computer.
These cookies are used to improve your website experience and provide more personalized services to you, both on this website and through other media.
To find out more about the cookies we use, see our Privacy Policy