Stored Procedure error log integration with APM tools

  • Hi all,

    We typically consolidate our logging in an APM tool (e.g., Datadog).

    The one area where we have a gap is error logging for stored procedures, which may be invoked by applications or by jobs.

    We were thinking of writing these errors out to the SQL error log and then using our APM tool agent to tail/monitor the file.

    However, most of the articles I find about stored procedure error logging recommend using a log table.  Sadly, our APM tool doesn't have the ability to tail logs from a table.  It only supports log tables as a source of metrics, which is different from ingesting the errors themselves.

    I was wondering if there are solutions for this, given the benefit of consolidating monitoring and alerting into a single APM tool.

    Thanks


  • My opinion - I would do whatever works best for the tool you are using.  BUT I would also be careful about using a tool to monitor the SQL error logs: you don't want the tool locking the log while it reads it, preventing SQL Server from writing to the log or slowing it down.

    But if you are planning on writing the errors in a stored procedure to the log, that is not that difficult to do.  The RAISERROR command will populate the log (use the WITH LOG option); just be sure you are selecting the correct severity and such.
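
    For example, a minimal sketch (the procedure name and message text are just placeholders):

        -- WITH LOG writes the message to the SQL Server error log (and the
        -- Windows Application log).  Severity 16 is a typical user error;
        -- severities 19-25 require WITH LOG plus elevated permissions.
        RAISERROR ('dbo.usp_MyProc failed: could not insert widget row.', 16, 1) WITH LOG;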

    Personally, I am not a big fan of "all in one" tools.  I often find that when a tool claims to be able to do everything, OR when I start hacking things together to have "one tool to rule them all", I end up with a tool that does everything poorly.  What I mean is: if I want to do database monitoring, I am going to buy a database monitoring tool.  If I am doing filesystem monitoring, I will buy a filesystem monitoring tool.  OS?  OS monitoring tool.

    There are benefits to consolidating monitoring and alerting into a single tool, but there are risks too.  Recently, I had a SQL Server disk go from 80% full to 96% full and our OS monitoring tool gave no alerts about it.  Thankfully our database monitoring tool could monitor disk as well, and I got an alert.  Later it was discovered that the OS monitoring tool had crashed and corrupted its own database, so it needed to be restored from backup.

    What I am saying is that consolidation is nice, as you have 1 location to view all the data, but redundancy is nice too: when your main system fails, you can still get the alerts you care about.  BUT if the tool can't handle the recommended method for capturing data, I don't want to create a hack to make it work.  I would much rather have 2 tools that work well than 1 tool that took some very specific hand-rolled steps to implement.

    Now, on top of the above: where I work, our stored procedures are called by in-house developed applications, and those applications handle the error handling and logging.  Our stored procedures have sanity checks and data validation checks, and in case an error still occurs somewhere after the sanity checks, we have TRY...CATCH blocks.  On error, an integer error number is returned to the application along with an error string indicating what went wrong, and that gets handled on the application side.  Invalid input, for example, doesn't need to be logged.  A failed insert should be logged.  But we handle the logging on the application side.
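
    A rough sketch of that pattern (the procedure, table, and column names are made up for illustration, not our actual code):

        CREATE PROCEDURE dbo.usp_UpdateWidget
            @WidgetId int,
            @NewName  nvarchar(100),
            @ErrMsg   nvarchar(4000) OUTPUT
        AS
        BEGIN
            SET NOCOUNT ON;
            SET @ErrMsg = NULL;

            -- Sanity check: invalid input is rejected, not logged.
            IF @WidgetId IS NULL OR @NewName IS NULL
            BEGIN
                SET @ErrMsg = N'WidgetId and NewName are required.';
                RETURN 1;   -- nonzero return code = error for the caller
            END

            BEGIN TRY
                UPDATE dbo.Widget
                SET    Name = @NewName
                WHERE  WidgetId = @WidgetId;

                RETURN 0;   -- success
            END TRY
            BEGIN CATCH
                -- Hand the error number and message back to the application,
                -- which owns the logging.
                SET @ErrMsg = ERROR_MESSAGE();
                RETURN ERROR_NUMBER();
            END CATCH
        END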

    Now, for stored procedures called from jobs: if a job fails for any reason, it is investigated.  A SQL job failure results in an email going out to the DBA team, and someone on that team will investigate it.  Once the problem in the job is determined, we work to correct the issue and to implement steps, processes, or changes to the stored procedure to prevent that error from recurring.  We don't really have a need for a 3rd party tool to track those errors too.
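
    If anyone reading needs to wire that up, the failure email is just a SQL Agent setting.  A sketch, assuming Database Mail is configured and an operator named "DBA Team" already exists; the job name is a placeholder:

        -- Email the DBA Team operator whenever this job fails.
        -- @notify_level_email: 0 = never, 1 = on success, 2 = on failure, 3 = always.
        EXEC msdb.dbo.sp_update_job
            @job_name                   = N'Nightly Import',   -- placeholder job name
            @notify_level_email         = 2,
            @notify_email_operator_name = N'DBA Team';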

    Now, your situation may be different and you may benefit from those logs.  If that is the case, I would still recommend following best practice and logging to a table.  Once logged to the table, you can set up a job that selects from that table on whatever frequency makes sense to you (hourly, daily, weekly) and fires off an email to whoever is responsible for that system.
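
    Something along these lines, as a minimal sketch (the table, database, and mail profile names are only placeholders, and the digest mail assumes Database Mail is configured):

        -- A simple error log table.
        CREATE TABLE dbo.ProcErrorLog
        (
            ErrorLogId    int IDENTITY(1,1) PRIMARY KEY,
            LoggedAt      datetime2(0)   NOT NULL DEFAULT SYSDATETIME(),
            ProcName      sysname        NULL,
            ErrorNumber   int            NULL,
            ErrorSeverity int            NULL,
            ErrorMessage  nvarchar(4000) NULL
        );

        -- Inside the CATCH block of each stored procedure:
        INSERT INTO dbo.ProcErrorLog (ProcName, ErrorNumber, ErrorSeverity, ErrorMessage)
        VALUES (ERROR_PROCEDURE(), ERROR_NUMBER(), ERROR_SEVERITY(), ERROR_MESSAGE());

        -- The scheduled job can then mail a digest, e.g.:
        EXEC msdb.dbo.sp_send_dbmail
            @profile_name = N'DBA Mail',                 -- placeholder mail profile
            @recipients   = N'dba-team@example.com',
            @subject      = N'Stored procedure errors (last 24 hours)',
            @query        = N'SELECT LoggedAt, ProcName, ErrorNumber, ErrorMessage
                              FROM MyDb.dbo.ProcErrorLog
                              WHERE LoggedAt >= DATEADD(day, -1, SYSDATETIME());';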


    Now, if you REALLY want to use the APM tool to review the failed stored procedures, I would follow best practice and dump the errors to a table.  Then I would use something like SSIS or xp_cmdshell or SSRS or PowerShell to pull the data from that table and dump it to a file for the APM tool to grab.  You won't get real-time data that way, but you have the failure history in an easy-to-search format, you won't run into potential issues with log rotation causing things to be missed or lost by your APM tool, AND you don't run the risk of locking the SQL log file by accident.  And in the event your dump file for the APM tool is locked, the job that was doing the dump will log that it failed and you can just re-dump the data.
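
    As one example of the dump step, a sketch using bcp via xp_cmdshell (note xp_cmdshell is disabled by default and has security implications; the server, database, and path here are placeholders, and an Agent PowerShell step would work just as well):

        -- Export recent errors to a tab-delimited file the APM agent can tail.
        -- -c = character mode (tab-delimited by default), -T = trusted connection.
        DECLARE @cmd varchar(8000) =
            'bcp "SELECT LoggedAt, ProcName, ErrorNumber, ErrorMessage '
          + 'FROM MyDb.dbo.ProcErrorLog '
          + 'WHERE LoggedAt >= DATEADD(day, -1, SYSDATETIME())" '
          + 'queryout "D:\Exports\proc_errors.txt" -c -S MyServer -T';

        EXEC master.dbo.xp_cmdshell @cmd;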

    The above is all just my opinion on what you should do. 
    As with all advice you find on a random internet forum - you shouldn't blindly follow it.  Always test on a test server to see if there are negative side effects before making changes to live!
    I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.

  • Thanks very much for your thoughtful reply.  The point about the APM agent locking up the SQL error logs was something we hadn't considered.  We have similar log files for our application logs, and the agent has never locked those up when tailing the files, but we definitely would not want to discover that the wrong way.  The APM tool already does a good job of helping us identify and alert on stored procedures that have errors when they're called from our application layer.  It's the jobs where we don't have a good solution yet.

    I am a big fan of MVPs - and to your point (and the consensus within the team), we'll end up writing the errors to a table as a first step and avoid the potential risks of using the SQL error log.  The log entries themselves won't be in our APM tool, but we can at least generate metrics showing that errors are occurring and send those to our APM tool.  Then we can alert if we hit a threshold of errors and see how well that solution works for us.  We're also working with our APM vendor to see if they can offer a better solution for stored procedure error logging, as this isn't really covered in their recipe book at this point (hence prompting my post).
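
    For anyone who finds this later, the metric will likely be something as simple as an error count per procedure over a trailing window, polled by the agent (table and column names follow the sketch above; how the agent runs a custom query depends on the APM product):

        -- Error count per procedure over the last 5 minutes, for the APM
        -- agent to poll and report as a custom metric.
        SELECT ProcName,
               COUNT(*) AS ErrorCount
        FROM   dbo.ProcErrorLog
        WHERE  LoggedAt >= DATEADD(minute, -5, SYSDATETIME())
        GROUP BY ProcName;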

    Thanks again

