Indexing and Querying a Frequently Updated Table

  • Apologies in advance if this is posted under the wrong heading - it's really a general performance-tuning question. I have a table in an Azure SQL Database that is used to log messages from a variety of applications. It is written to frequently - say an average of 10 inserts per second. I need to query this table hourly to notify users if certain messages have been logged, and less frequently to perform cleanup operations such as deleting messages older than a certain date.

    So what's the best strategy for handling this? I don't want to slow writes down with a lot of indexing. But then I'm running select queries against a large and poorly indexed table. As a result, these select queries don't execute very quickly, and that makes me worry about locking. Because of this, whenever I run select queries I'm using the WITH (NOLOCK) hint - it's not the end of the world if I get dirty reads, but it seems like there has to be a better way.

    Should I leave the logging table optimized for writes and copy data out to another table that is well indexed for querying? If that is the solution, is there a best practice for implementing that copy operation so that it's quick and doesn't impact my writes?

  • You could look at data partitioning here; in your scenario it will help.

    You can also take advantage of a sliding-window data archival approach to remove the unwanted historical data.
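
    For what it's worth, a minimal sketch of a partitioned layout that supports a sliding window - every object name here is hypothetical, and the monthly RANGE RIGHT boundaries would be added ahead of time (and merged away) by a maintenance job:

    -- Hypothetical monthly partition function and scheme for the log table.
    CREATE PARTITION FUNCTION pfLogByMonth (datetime2)
        AS RANGE RIGHT FOR VALUES ('2013-08-01', '2013-09-01', '2013-10-01');

    CREATE PARTITION SCHEME psLogByMonth
        AS PARTITION pfLogByMonth ALL TO ([PRIMARY]);

    -- Created on the scheme, each month's rows land in their own partition.
    -- The partitioning column must be part of the clustered key.
    CREATE TABLE dbo.LogMessages
    (
        LogID      bigint IDENTITY(1,1) NOT NULL,
        LogMessage nvarchar(2000)       NOT NULL,
        DateStamp  datetime2            NOT NULL,
        CONSTRAINT PK_LogMessages PRIMARY KEY CLUSTERED (DateStamp, LogID)
    ) ON psLogByMonth (DateStamp);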

    -------Bhuvnesh----------
    I work only to learn Sql Server...though my company pays me for getting their stuff done;-)

  • bcampbell-1136357 (10/15/2013)


    I'm running select queries against a large and poorly indexed table.

    And why are you living with poor indexes? That's what you need to work on first.

    -------Bhuvnesh----------
    I work only to learn Sql Server...though my company pays me for getting their stuff done;-)

  • bcampbell-1136357 (10/15/2013)


    Because of this, whenever I run select queries I'm using the WITH (NOLOCK) hint - it's not the end of the world if I get dirty reads, but it seems like there has to be a better way.

    The NOLOCK hint produces dirty reads, but it is potentially far worse than just seeing a couple of transactions that haven't yet been committed: you can (and will) get missing and/or duplicate rows. Given the frequency at which this table is being written to, this is likely to happen very often.

    http://sqlblogcasts.com/blogs/tonyrogerson/archive/2006/11/10/1280.aspx

    http://blogs.msdn.com/b/davidlean/archive/2009/04/06/sql-server-nolock-hint-other-poor-ideas.aspx

    http://www.jasonstrate.com/2012/06/the-side-effect-of-nolock/
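
    One concrete "better way": read committed snapshot isolation gives readers a consistent, committed view without blocking writers and without the missing/duplicate-row anomalies of NOLOCK. A quick sketch - the database name is a placeholder, and note that Azure SQL Database (which the OP is on) has this option enabled by default:

    -- Check whether read committed snapshot is already on
    -- (it is on by default in Azure SQL Database).
    SELECT name, is_read_committed_snapshot_on
    FROM sys.databases
    WHERE name = DB_NAME();

    -- Enable it if not; on premises this needs exclusive access to the database.
    ALTER DATABASE [YourDatabase] SET READ_COMMITTED_SNAPSHOT ON;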

    Assuming you need to query this data frequently, I would recommend copying the data you need into another table that is indexed for those queries. The copy has the potential to be slow because you do have to run a query against the raw data, but the advantage is that you can then run multiple queries against the new table; those should be fast and will have no impact on your high-frequency logging table.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Thanks for the suggestions. That is the direction I've been leaning. But that then leads to the question of what is the best method for copying out the data to another table without impacting the performance of the logging table? If I run a query hourly that copies over new data, something like...

    INSERT INTO dbo.NewTable (LogID, LogMessage, DateStamp)
    SELECT LogID, LogMessage, DateStamp
    FROM dbo.OldTable
    WHERE LogID > @MaxPreviouslyInsertedID

    ... won't that query, run without NOLOCK, negatively impact the writes on the logging table?

  • I'm not sure what you mean by "a lot of indexing"... The number of indexes you described is moderate, at least from my perspective...

    Depending on the ratio of messages relevant to the hourly notification email, it might be worth trying a filtered index limited to the relevant messages and the columns required to send the email. To avoid frequent page splits you could lower the fill factor for that index.
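
    Something along these lines, with invented names - the MessageType column and the 'ERROR' predicate are assumptions about how the relevant messages are identified:

    -- Filtered index covering only the rows the hourly notification job reads.
    -- The lower fill factor leaves free space on each page, which reduces
    -- page splits on a write-heavy table.
    CREATE NONCLUSTERED INDEX IX_LogMessages_Notify
        ON dbo.LogMessages (DateStamp)
        INCLUDE (LogMessage)
        WHERE MessageType = 'ERROR'
        WITH (FILLFACTOR = 80);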

    Assuming the insert is performed through a stored procedure, another option would be a separate "notification table" holding only the messages that require notification (again, depending on the ratio of relevant messages and the information that needs to be sent by email). After sending the email I'd delete the rows that are no longer needed, to keep this table narrow.
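
    A rough sketch of that idea - the procedure, table, and column names are all invented, and this assumes the log table has an IDENTITY key:

    -- Narrow side table holding only messages that still need a notification.
    CREATE TABLE dbo.PendingNotifications
    (
        LogID     bigint    NOT NULL PRIMARY KEY,
        DateStamp datetime2 NOT NULL
    );
    GO

    -- Hypothetical logging procedure: writes the log row, then queues a
    -- notification row only for the message types the hourly job cares about.
    CREATE PROCEDURE dbo.LogMessage
        @LogMessage  nvarchar(2000),
        @MessageType varchar(20)
    AS
    BEGIN
        DECLARE @DateStamp datetime2 = SYSUTCDATETIME();

        INSERT INTO dbo.LogMessages (LogMessage, DateStamp)
        VALUES (@LogMessage, @DateStamp);

        IF @MessageType = 'ERROR'
            INSERT INTO dbo.PendingNotifications (LogID, DateStamp)
            VALUES (SCOPE_IDENTITY(), @DateStamp);
    END;

    After the hourly job sends its email, it deletes the rows it has handled, so the table stays small.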

    Regarding the delete issue:

    Depending on the number of rows involved (in total and to be deleted), it might be worth looking into horizontal partitioning. You could then "just" switch the partition out of the original log table instead of performing the delete operation.
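
    The switch itself is a metadata-only operation, so the "delete" completes almost instantly regardless of row count. A sketch, assuming a log table partitioned by month (names are hypothetical, and the staging table must be empty with a matching structure):

    -- Staging table with the same columns and clustered key as the log table.
    CREATE TABLE dbo.LogMessages_Archive
    (
        LogID      bigint         NOT NULL,
        LogMessage nvarchar(2000) NOT NULL,
        DateStamp  datetime2      NOT NULL,
        CONSTRAINT PK_LogMessages_Archive PRIMARY KEY CLUSTERED (DateStamp, LogID)
    );

    -- Move the oldest partition out of the log table, then drop or archive
    -- the staging table at leisure.
    ALTER TABLE dbo.LogMessages SWITCH PARTITION 1 TO dbo.LogMessages_Archive;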



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question
    How to post performance related questions
    Links for Tally Table, Cross Tabs and Dynamic Cross Tabs, Delimited Split Function

  • bcampbell-1136357 (10/16/2013)


    then leads to the question of what is the best method for copying out the data to another table without impacting the performance of the logging table?

    Use a batch approach: http://www.sqlservercentral.com/articles/Top/63301/
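
    Applied to the OP's example, the batched copy might look something like this - the 5,000-row batch size is arbitrary, and deriving the watermark from the target table is an assumption about how it's tracked:

    -- Copy new rows in small batches so no single statement holds locks
    -- on the logging table for long.
    DECLARE @MaxPreviouslyInsertedID bigint, @RowsCopied int = 1;

    SELECT @MaxPreviouslyInsertedID = COALESCE(MAX(LogID), 0)
    FROM dbo.NewTable;

    WHILE @RowsCopied > 0
    BEGIN
        INSERT INTO dbo.NewTable (LogID, LogMessage, DateStamp)
        SELECT TOP (5000) LogID, LogMessage, DateStamp
        FROM dbo.OldTable
        WHERE LogID > @MaxPreviouslyInsertedID
        ORDER BY LogID;

        SET @RowsCopied = @@ROWCOUNT;

        SELECT @MaxPreviouslyInsertedID = COALESCE(MAX(LogID), @MaxPreviouslyInsertedID)
        FROM dbo.NewTable;
    END;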

    -------Bhuvnesh----------
    I work only to learn Sql Server...though my company pays me for getting their stuff done;-)

  • I think Bhuvnesh hit the mark. Optimize the indexes on the table if they are poor.

    We have some logging tables, and the development team's initial fear was not to create any indexes because of the impact on write performance! After actually testing, we found that a couple of narrow indexes did not hurt write performance any more than read performance was hurt by not having them.

    Get Red Gate Data Generator and a dev box, and populate that table indexed vs. not indexed. Test whether the fears are real before you act on them.

    Why is it that people who can't take advice always insist on giving it? - James Bond, Casino Royale

  • Lots of food for thought here, thanks. The table has existed for a while, so I certainly have data to test with - it's got about 3 million rows and is growing as we add more users to our application. I have tried carefully adding several indexes to help with reads, but my main concern is making sure the writes don't suffer. And there are occasional ad hoc queries I have to run against the table to diagnose problems, which aren't always covered by an index. Anyway, I've got some options to try out. I particularly like the idea of batching the cleanup operations, which should help with availability.
