On Tuesday morning I discovered that an overnight ETL process running on an Azure IaaS instance had aborted. Yesterday I was told that Microsoft had rebooted our Azure-hosted servers while applying an emergency patch. I'm guessing this fix was it. The reboot itself caused only a couple of minutes of downtime for the server, but we lost several hours of processing work downstream, because the aborted ETL run had to be started over. This is why I believe that fewer, longer maintenance windows (preferably scheduled in advance) are better than more frequent, unannounced interruptions of short duration. This particular issue was probably a rare event, though. We also need to make our ETL process more robust by adding retry logic and the ability to restart from SSIS checkpoints.
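To make the retry-and-restart idea concrete, here is a minimal sketch in Python. It is not SSIS and not our actual package; it only illustrates the general pattern of recording completed steps so that a rerun skips them. The function name `run_with_checkpoints`, the JSON checkpoint file, and the step names are all made up for illustration.

```python
import json
import os
import time


def run_with_checkpoints(steps, checkpoint_path, max_attempts=3, delay_seconds=0.0):
    """Run (name, callable) steps in order, skipping steps a previous run finished.

    Hypothetical helper: the checkpoint is just a JSON list of completed step names.
    """
    done = []
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            done = json.load(f)

    for name, step in steps:
        if name in done:
            continue  # completed in an earlier run; do not repeat the work
        for attempt in range(1, max_attempts + 1):
            try:
                step()
                break
            except Exception:
                if attempt == max_attempts:
                    raise  # give up; the checkpoint file keeps earlier progress
                time.sleep(delay_seconds)  # wait before retrying
        done.append(name)
        with open(checkpoint_path, "w") as f:
            json.dump(done, f)  # persist progress after every successful step

    os.remove(checkpoint_path)  # clean finish, so the next run starts fresh
    return done
```

With something like this, a reboot in the middle of the third step would cost a rerun of that step only, not of the whole night's work. It assumes each step is safe to run again from its beginning, which is not automatically true for steps that partially load data.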
"The universe is complicated and for the most part beyond your control, but your life is only as complicated as you choose it to be."