Does Log Data Belong in a Table or a File?

  • Comments posted to this topic are about the item Does Log Data Belong in a Table or a File?

  • Of course it isn't that simple!!! (But you knew that any way.)

    I worked on a system where they wanted to log "events" and get MI on performance. These were deemed separate requirements to logging for issue diagnosis. The two, diagnostic logs to file and "events" to table, differed because they were considered different classes of data. The different classes required different treatment as their requirements for backups, retention, analysis of, etc. demanded it.

    Does all log data belong in a file? Possibly.

    Does all log data only belong in a file? Probably not.

    Gaz

    -- Stop your grinnin' and drop your linen...they're everywhere!!!

  • The phrase I would use would be "choose files over database tables". That isn't slamming the door on using tables, its just saying that in most cases would be the preferred choice.

    Indexes for common use cases? That's the thing, the common use cases are un-indexable unless you count full text searches. There's a reason why people use Elastic Search, Logstash and Kibana!

    The main issue for me is that a huge percentage of what is in a log is noise. We incur IO cost to store, back up, ensure availability for logs which are bulky but with a low percentage of relevant data.

    Security. Putting stuff in log files that has security implications doesn't feel right.

    Logs tend to be semi-structured. Bunging the message in a VARCHAR(MAX) is as much as most people will do.

    If you need to track for security purposes don't call it logging, call it security tracking and design for that use case.

    I know of one project where, due to security requirements, 50gb per day of logs were captured. As long as they were captured security were happy. The fact that security didn't have the tools to query the logs didn't seem to worry them. For all they know rhe logs could be full of "Security guys are the Window of No" repeated over and over again!

    I think we have to ask ourselves what value does an RDBMS add to a particular use case? If we have finite resources would we squander those resources on this use case? Where is the ROI?

    Do I want to sweat blood keeping a system up and running because a logging system is hogging resources such as SAN space?

  • Attitudes like "X" should always be "Y" are silly and counterproductive, usually wrong, and anyone spouting them should be sent to go stand in the corner for a while. Logs should go wherever is appropriate for the data they capture. I use both tables and files for capturing such info, as I deem fit for the case at hand. Especially for later analysis, tables are much more useable. If someone is worried about dragging down their database with log data, you can always create a completely separate database strictly for logging, even on a separate machine if performance is that much of an issue.

    As with almost everything in SQL Server, the answer is 'it depends'.

  • Here's a third option to go with files and RDBMS's...

    I've successfully used NoSQL systems, like MongoDB and CouchBase, to store logs in the past. These are quick to write to and can store the log in a semi-structured format, i.e. as a piece of JSON. It's then possible to write some code which can query over the logs and try to pick up anything which looks like a problem.

    Kind of like a machine learning for logs!

    I've used files and SQL Server to store logs in the past. I would never advocate one approach is better than the other - as other posters have said, it depends - but I've found the NoSQL approach a good halfway house between the two.

  • pdanes (9/6/2016)


    As with almost everything in SQL Server, the answer is 'it depends'.

    ...and with everything else too!!!

    Gaz

    -- Stop your grinnin' and drop your linen...they're everywhere!!!

  • If you write to files and don't need to research, that's great. But if you do need to research, that's well not so great.

    If you write to tables and don't need to research, no harm done. So what if there are millions of records. Create a job to purge based on size or dates. If you do need to search you have the means to do it.

    Seems to me that putting the data into files is the smarter thing to do unless doing so is a noticeable drag on the system.

  • If we're talking about error logs, then one major issue with containing the log records within the database is that, if the database is down, then you don't have access to critical information that could diagnose what went wrong. Another issue that your persistance of event logging would be functionally dependent on the transaction log, and the process of restoring the database may even result in the loss of log events that occurred in the moments loading up to the database crash.

    Perhaps one solution would be to log events to file, and then have a scheduled job that truncates and re-loads a table with the most recent 24 hours of log records. I'm assuming that what you're dong is trapping remarkable events like application errors or failed ETL jobs in which case you would have no more than a few hundred log records inserted over a 24 hour period, so a truncate/reload should be very quick. If you're inserting thousands of events per day, mundane stuff like "user 6721 clicked on button 282" then maybe reconsider how useful it is in the first place.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • There are some things that I can do with tables that I can't easily do with files, and if I want to do any of those things with some logged data it should probably be logged in tables. So unless there is some good reason not to use tables for logging I will use tables rather than files - this is just common sense for the cases where I know I wan't to do some of those things. It also pretects me from dificulties when I discover in future that I want to do something with the log data that I hadn't envisaged doing when I made the table/file choice. When there is a good reason not to use tables, I will use files.

    Oddly enough I believe the same rule works for all data, not just for logged data, and covers more than just files and tables - depending on the set of applications there are arguments in favor of logic databases, document databases, inverted log databases, navigational (hierarchic and network) databases, and maybe even object databases, as well as arguments in favour of files and relational databases (and probably other things too - I don't know what new data storage and handling methods will emerge in the next decade, and there may well be some existing methods that I don't know about).

    Tom

  • pdanes (9/6/2016)


    Attitudes like "X" should always be "Y" are silly and counterproductive, usually wrong, and anyone spouting them should be sent ...

    Not sure where they should be sent, but definitely agree that they should be sent somewhere for some period of time. People spouting this stuff are dangerous to the organization.

    For guidance as to where one should put them, I would first consider that just about any logging library out there will easily send them to many different places should be a clue. There are just too many things that people might do with log events. If you just want them "in case" then I would certainly say send them to a file. Then again, I have worked where we started sending them to a syslog server which could then be independently monitored. I have even seen an example of log events causing help desk tickets to be opened (and then if an appropriate log event occurred close the same).

    The whole point being that deciding in a vacuum shows a combination of lack of imagination, and immaturity. And thus why I seriously believe such behavior is dangerous to an organization.

  • There are all kinds of logs. For the ones you have to keep for auditing, certainly use a file. For the ones where the output is really data, it should be a table. Here are two real-world examples:

    I have two critically important log tables that we use for troubleshooting and monitoring.

    One log is the output of sp_whoisactive. It runs every 15 minutes; if I find slow processes or somebody comes telling me they had problems, I can search that time frame and see if there were big reports or blocking issues.

    The second is a record of every failure from an Oracle to SQL replication (about 400 tables involved). This is customized so we can identify the individual record; an email goes out every day if there have been records added. We can take this to the business to fix bad data, or discover an issue within our table structures.

  • From the Article:


    As we talked about options for accomplishing this, another member of the team remarked that all log data should go in files, period.

    Heh... I'm busy. Real busy. When someone makes remarks like that (really suppressing the urge to add the adjectives of stupid, ignorant, inexperienced, and astigmatic to the word "remarks"), I just smile ear to ear and push the platter over to them with a very cheerful "No problem... it's all yours"... unless the logging is critical-to-be-safe-and-always-available, at which time I'll try to convince them otherwise with or without pork chops depending on the problem itself. If I DO disagree and I'm overridden, then I'll stick it out in the meeting to find out what they're going to do and then go back to my desk and write something to catch them (as in "protect" them and, more importantly, the data) if their method fails because they don't actually matter to me... keeping the company I work for out of trouble by safeguarding and making the right and correct data available is.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • You Sir, have said exactly what I would have said. There should never be a "one fits all" attitude; look at the situation and do what is best

  • I do agree with you Andy, that log data shouldn't all be stored in a file. I like your analysis as to when it would be more appropriate to store what, where. Where I used to work we rarely stored log data. I now think that was a mistake. Where I work now, much to their credit, they store lots of log data. Although, it depends upon the situation and whose involved.

    At the end of the day, as the saying goes, it depends upon which storage technology one uses for storing log data.

    Rod

  • Hmm.... I tend to find files more appropriate for streaming data that might not necessarily break down into clean transactions or might not relate directly back to records in the database. For example streaming output from a batch process/service or record job start/end times, that's information that doesn't necessary relate directly back to any information in the database and could very likely require some customer structure to store. It's also information I'd typically expect to just open a file and scroll to the bottom. Where as something like record level audits would make more sense to store in the database.

Viewing 15 posts - 1 through 15 (of 22 total)

You must be logged in to reply to this topic. Login to reply