• The emerging data science field is already making this a reality. There are well-known statistical principles and methods that can be applied to analyze just about anything.

    These can be used not only to analyze the incoming data for errors, but also to examine various facets of the ETL process itself and alert the appropriate people to potential issues. It could be as simple as monitoring the mean processing time weighted by the amount of data processed (or other factors) to flag possible performance issues - a rough sketch of that idea follows below. Time-series analysis could be used to detect and troubleshoot those troublesome "intermittent" ETL problems that never seem to show up when someone from the IT department is watching. A time-series graph of the process could be compared against other data about the network, servers, other infrastructure, or even non-IT data, looking for correlations worth investigating (which may even help in performing a root cause analysis, or RCA) - see the second sketch below.
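
    Here is a minimal sketch of the weighted-mean idea, assuming you keep a simple history of run durations and row counts. Everything here is hypothetical: the EtlRun record, the check_run function, the 3-sigma z_threshold, and the print-based "alert" are illustrative placeholders, not any particular monitoring tool's API.

```python
import math
from dataclasses import dataclass

@dataclass
class EtlRun:
    rows_processed: int     # weight: how much data the run handled
    seconds_elapsed: float  # observed processing time

def seconds_per_row(run: EtlRun) -> float:
    """Normalize duration by volume so big and small loads are comparable."""
    return run.seconds_elapsed / max(run.rows_processed, 1)

def weighted_mean_and_std(history: list[EtlRun]) -> tuple[float, float]:
    """Volume-weighted mean and standard deviation of seconds-per-row."""
    total_w = sum(r.rows_processed for r in history)
    mean = sum(seconds_per_row(r) * r.rows_processed for r in history) / total_w
    var = sum(((seconds_per_row(r) - mean) ** 2) * r.rows_processed
              for r in history) / total_w
    return mean, math.sqrt(var)

def check_run(latest: EtlRun, history: list[EtlRun],
              z_threshold: float = 3.0) -> bool:
    """Return True (and 'alert') if the latest run is a statistical outlier."""
    mean, std = weighted_mean_and_std(history)
    if std == 0:
        return False
    z = (seconds_per_row(latest) - mean) / std
    if abs(z) > z_threshold:
        print(f"ALERT: run took {seconds_per_row(latest):.4f} s/row "
              f"(z = {z:+.1f} vs. weighted mean {mean:.4f})")
        return True
    return False

# Example: a few nightly runs of history, then one suspiciously slow run.
history = [EtlRun(1_000_000, 620.0), EtlRun(950_000, 600.0),
           EtlRun(1_100_000, 680.0), EtlRun(400_000, 250.0)]
check_run(EtlRun(1_000_000, 1_900.0), history)
```

    The weighting matters because a 30-minute run over ten million rows and a 30-minute run over ten thousand rows are very different animals; normalizing to seconds-per-row and weighting by volume keeps small loads from skewing the baseline.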

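    And a minimal sketch of the correlation idea: line up nightly ETL durations against some other series sampled at the same times (here a made-up network latency reading) and compute a plain Pearson correlation. All of the numbers below are fabricated for illustration; in practice the second series could be server CPU, backup windows, or even non-IT data, as mentioned above.

```python
import math

def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical nightly ETL durations (minutes) and network latency (ms):
# note the slow nights line up with the high-latency nights.
etl_minutes = [31, 30, 55, 29, 32, 58, 30, 61, 33, 57]
latency_ms  = [12, 11, 45, 13, 12, 52, 10, 49, 14, 47]

r = pearson(etl_minutes, latency_ms)
print(f"Correlation between ETL duration and latency: r = {r:.2f}")
if abs(r) > 0.7:
    print("Strong correlation - worth a look as part of an RCA.")
```

    Correlation isn't causation, of course, but a strong r value tells you which haystack to start digging in when the intermittent problem strikes again.
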
    Point being: we very often get into the "if it's not throwing an error, don't fix it" mode (which has a great deal of wisdom to it, for certain), but there is also a time and place to proactively evaluate our systems and look at ways things could improve.

    ____________
    Just my $0.02 from over here in the cheap seats of the peanut gallery - please adjust for inflation and/or your local currency.