I check almost everything I can think of every day.
We used to call it a daily health check.
Every day I add a new check to the job and export the list of failures to an internal website so that first-line engineers can see the issues.
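A rough sketch of how a job like that could be laid out, in Python purely for illustration. Everything here is an assumption: the `CHECKS` registry, the `check` decorator, the two example checks and the sample failure message are all made up, and a real check would query the servers instead of returning canned data. The point is only that adding tomorrow's check means adding one small function, and the export step stays the same.

```python
import json
from datetime import date
from typing import Callable, Dict, List

# Hypothetical registry: check name -> function returning a list of failure messages.
CHECKS: Dict[str, Callable[[], List[str]]] = {}


def check(name: str):
    """Register a check function under the given name."""
    def decorator(fn: Callable[[], List[str]]) -> Callable[[], List[str]]:
        CHECKS[name] = fn
        return fn
    return decorator


@check("failed_agent_jobs")
def failed_agent_jobs() -> List[str]:
    # Made-up sample result; a real check would query each server.
    return ["SERVER01: job 'Nightly backup' failed"]


@check("low_ple")
def low_ple() -> List[str]:
    # Made-up sample result: nothing to report today.
    return []


def run_checks(run_date: date) -> str:
    """Run every registered check and return all failures as a JSON document."""
    failures = []
    for name, fn in CHECKS.items():
        for message in fn():
            failures.append({"check": name, "message": message})
    return json.dumps({"date": run_date.isoformat(), "failures": failures}, indent=2)


if __name__ == "__main__":
    # The JSON could then be published wherever the first-line engineers look.
    print(run_checks(date.today()))
```

Keeping the output as plain JSON (or a table) means the internal site only has to render one list, no matter how many checks get added.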
PLE (page life expectancy), wait times, buffer cache hit ratio, etc. are all as important as failed SQL Agent jobs.
We have a lot of servers and more databases than I can count (I'd guess around 400 distinct databases), so centralising the data is key.
Enabling others to say that a server might be out of our "tolerance zone" helps hugely.
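To make the "tolerance zone" idea concrete, here is a minimal sketch, again in Python and again with assumed names. The `TOLERANCES` table, the metric names and the threshold values are illustrative only, not recommendations; the right numbers depend on the server and workload.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical tolerance zones: metric -> (minimum, maximum). None means "no bound".
# The threshold values are placeholders, not tuning advice.
TOLERANCES: Dict[str, Tuple[Optional[float], Optional[float]]] = {
    "page_life_expectancy_s": (300.0, None),
    "buffer_cache_hit_ratio_pct": (90.0, None),
    "failed_agent_jobs": (None, 0.0),
}


def out_of_tolerance(server: str, metrics: Dict[str, float]) -> List[str]:
    """Return one message per metric that falls outside its tolerance zone."""
    problems = []
    for metric, value in metrics.items():
        if metric not in TOLERANCES:
            continue  # no zone defined for this metric, so nothing to flag
        low, high = TOLERANCES[metric]
        if low is not None and value < low:
            problems.append(f"{server}: {metric}={value} is below {low}")
        elif high is not None and value > high:
            problems.append(f"{server}: {metric}={value} is above {high}")
    return problems
```

With the zones held in one central table, anyone looking at the collected data can tell whether a server is out of tolerance without knowing the history of that server.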
Needless to say, we have not got it right yet.
If I add a check every day, it makes my life easier when we add new servers or databases. Currently I'm checking for objects that don't compile (no sense breaking builds in DevOps with code that doesn't work).