Does Log Data Belong in a Table or a File?

  • Multithreaded implementations of web services with each concurrent call executed on separate threads. Logged to a flat file? Just saying.


    -- Stop your grinnin' and drop your linen...they're everywhere!!!

  • After years of corrupted flat files with highly transactional video game logs, I stick to tables if I can. But then again, it really just depends. Most of the data connectors I wrote (i.e.: API's), write to file for their logs first and SQL Server absorbs those log files as part of the ETL process (i.e.: did the data connectors download data?)

  • Of course, "it depends" in the the answer to the question posed by the article. However, when the decision is made to store logs in flat files, I know in advance that I'm going to be getting a request to bring it into the database for analysis. It isn't a matter of if it'll happen, but a matter of when it'll happen.

    They usually think ahead and datestamp their text entries, so my only wish is usually that the keeper of the logs can provide a differential - only what's new since the last time they gave me their logs.

  • I tend to store things in tables simply because of reporting and retention requirements. If a server gets tight on space it would be no surprise that someone would delete log files to free it up. In the virtual world that isn't a regular thing these days since we can just throw more space at the issue if need be, but that isn't always the right thing to do. SAN space isn't limitless and it certainly isn't free. I have one process where we are required to retain five years of log records. I could create a file system process to keep all that in order and compress older files, but if someone wants to see something from two years ago I'd have to go through some work to dig it out. And of course, they want it in Excel. Being in a table and having an SSRS front end with parameters allows the requester to get things on their own.

    Using tables allows me to have a single Agent job that maintains table retention requirements. If I have 20 processes that log things on different servers, I don't want to have to potentially touch any of them. I also don't want to have to create processes that pull text files from multiple servers and load them in to a single location because I know people will want reports, not text files. A table solves that. Our ESXi clusters have 30 day retention policies, but I pull everything in to a log table and keep it so we can know everything that has ever happened to an object, not just 30 days.

    I really can't think of a reason I would want to log things to a file unless it was a requirement.


  • It's now 2021, and one option we have now is to log errors to something like a csv file, and then mount it as an external table using PolyBase, so you get the best of both worlds: high performance and usability.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Logging process failure notifications to a Slack channel is good too, because there an existing app for receiving alerts and it's easily accessible to users across the organization.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • As Eric Russell pointed out back in 2016, I've used a hybrid approach before where the data is written to a file for performance and in case the server crashed, then periodically loaded into a table for more thorough analysis.

  • Cloud vendors have their own logging solutions and means to query those logs, even if that is providing a transparent means of querying using one of their data technologies.

    One thing I have learned is that data partitioning is important if you want to avoid astronomic bills for what tends to be scanning operations.

    • Google BigQuery charges you for the data you scan and everything is a scan.
    • AWS Athena also scans.

    As ever there is a cost in providing a facility.  Work out what you actually need rather than provide a whole bunch of stuff you think might be useful.

    If a genuine need arises then the cost to satisfy that need may be acceptable with what you have already.  When the value of the need is high that is the time to see if development costs for a more optimised solution will actually represent a saving.

Viewing 8 posts - 16 through 22 (of 22 total)

You must be logged in to reply to this topic. Login to reply