I have two datasets, where I would like to join and then group by a particular column
df1
version host date name purpose
pat a16 12/1/2019 pat hi
fam a16 12/1/2019 fam hi
emp a16 12/1/2019 emp hi
dan a16 12/1/2019 dan hi
df2
pc size free date
a16 5 1 12/1/2019
a40 4 3 12/1/2019
a41 3 1 12/1/2019
Desired
host date purpose pc size free
a16 12/1/2019 hi a16 5 1
Doing
SELECT
sum(size) AS size, sum(free) AS free,
df1.purpose, df1.date, df1.host
FROM df1
JOIN df2 ON
df1.host = df2.pc AND
df1.date = df2.date
GROUP BY df1.purpose, df1.date, df1.host
The problem is that this join keeps giving me this below with the sum of size being
size 20 and free 4, when in actuality, the sum of size should be 5 and free should be 1.
I am trying to reflect the above in a query. Any suggestion is appreciated.
host date purpose pc size free
a16 12/1/2019 hi a16 5 1
a16 12/1/2019 hi a16 5 1
a16 12/1/2019 hi a16 5 1
a16 12/1/2019 hi a16 5 1