Production DB Inserts with No Blocking

  • Hello,

    We have a 24x7 Production database running on SQL Server, and every alternate day there are customer tokens that need to be added to a few of the Production tables. The token size ranges from 500 thousand to 1 million. Since the database is 24x7, the token upload introduces some amount of blocking for the users, because customers are redeeming tokens around the clock.

    I have been asked to redesign the process to have absolutely zero blocking; no blocking is acceptable while the token uploads are happening. One method of doing this is to maintain a copy of the production tables and add the tokens to the copy. Once the upload is complete, rename the production tables as the old tables and rename the copies as the production tables.

    The last step of the process is to keep the newly renamed Production tables up to date with the token redemption status. This is done by comparing the old and the new tables and setting the token redemption flag and the TokenUsed datetime in the newly renamed tables.
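
    Roughly, the swap and the final sync would look something like this. This is only a sketch; the table and column names here are examples, not our real schema:

    -- Swap the freshly loaded copy in as the production table.
    -- Note: sp_rename takes a brief schema lock on both tables.
    BEGIN TRAN;
        EXEC sp_rename 'dbo.Tokens', 'Tokens_Old';        -- current production table becomes the old copy
        EXEC sp_rename 'dbo.Tokens_Staging', 'Tokens';    -- freshly loaded copy becomes production
    COMMIT;

    -- Carry over redemptions that happened while the copy was being loaded.
    UPDATE t
    SET    t.RedeemedFlag = o.RedeemedFlag,
           t.TokenUsed    = o.TokenUsed
    FROM   dbo.Tokens AS t
    JOIN   dbo.Tokens_Old AS o
           ON o.TokenNo = t.TokenNo
    WHERE  o.RedeemedFlag = 1
      AND  t.RedeemedFlag = 0;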

    The only problem with this approach is that it involves manual effort, which makes it less than ideal.

    Has anyone dealt with this kind of situation? Is there a way to avoid blocking completely?

    Any ideas or thoughts will be greatly appreciated.

    Thanks in advance.

    Amol

    Amol Naik

  • Blocking is required for databases to function, so no, there is no way to avoid it completely. 🙂 Your solution is definitely a possibility and probably pretty sound.

    The other option is to minimize your blocking, which is critical for any production database: not only how often it occurs, but also keeping the duration extremely short when it does. So, some tuning of your inserts could be in order, and keeping your batch sizes really small could allow you to operate normally while still doing your inserts. The only problem here would be if you had only a brief window to get all that data in and a bulk insert were the only way to accomplish that in the time available. If that is the case, then your solution could work.

    You might also consider a rolling partition using partitioned tables, loading into a staging table and switching it in (see the sketch below). Given how often this load runs, though, that could be high maintenance.
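
    Something along these lines, as a hypothetical sketch only; the table names and partition number are made up:

    -- Assumes dbo.Tokens is partitioned, and dbo.Tokens_Staging has an identical
    -- structure, lives on the same filegroup as the target partition, and has a
    -- check constraint matching that partition's boundary values.
    -- The SWITCH itself is a metadata-only change, so the lock is very brief.
    ALTER TABLE dbo.Tokens_Staging
        SWITCH TO dbo.Tokens PARTITION 2;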

    A couple of thoughts anyway.

    David

    @SQLTentmaker

    “He is no fool who gives what he cannot keep to gain that which he cannot lose” - Jim Elliot

  • Your proposed solution will have some "blocking" of sorts. First, the token status update at the end will still have to take some locks. More importantly, renaming the table will only be possible during a period in which nobody is querying it, so you will have to time the rename for some split second when nobody is running a query against the table.

    That being said, it can be made to work, but it is a lot of effort to avoid the locking mechanisms in the SQL database.

    You could use snapshot isolation. This effectively makes any query go against the version of all records in the database as of the beginning of the query. Because of this, it is free to query the database at its latest committed point and completely ignore locked records and pages. It will add overhead to your system because it has to maintain the row versions, but SQL is pretty good at this. It is pretty easy to implement and test, so it is a better direction to try first.
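
    As a rough sketch, either of these gets you row versioning. The database and table names are placeholders, and note that the READ_COMMITTED_SNAPSHOT change needs a moment with no other active connections:

    -- Option 1: make the default READ COMMITTED level use row versions,
    -- so readers stop waiting on the insert locks without any code changes.
    ALTER DATABASE YourDatabase SET READ_COMMITTED_SNAPSHOT ON;

    -- Option 2: allow full snapshot isolation and opt in per session.
    ALTER DATABASE YourDatabase SET ALLOW_SNAPSHOT_ISOLATION ON;

    SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
    SELECT TokenNo, RedeemedFlag
    FROM   dbo.Tokens
    WHERE  TokenNo = '123456789012';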

    As a general rule for dealing with blocking, keep your updates and inserts as fast as possible. If you are having a problem because you are inserting a million rows in a single transaction, see if the transaction can be broken up. It may take longer overall to do a bunch of smaller inserts, but you may be able to split your insert process into small pieces that cause very short blocks that users do not really notice.
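
    For example, a loop along these lines keeps each transaction small. All the names here are just placeholders for whatever your load process uses:

    -- Sketch of loading tokens in small committed batches from a staging table.
    DECLARE @BatchSize int;
    DECLARE @Rows int;
    SET @BatchSize = 5000;
    SET @Rows = 1;

    WHILE @Rows > 0
    BEGIN
        BEGIN TRAN;

        INSERT INTO dbo.Tokens (TokenNo)
        SELECT TOP (@BatchSize) s.TokenNo
        FROM   dbo.Tokens_Staging AS s
        WHERE  NOT EXISTS (SELECT 1 FROM dbo.Tokens AS t WHERE t.TokenNo = s.TokenNo);

        -- Each short transaction releases its locks quickly.
        SET @Rows = @@ROWCOUNT;

        COMMIT;
    END;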

  • Amol.Naik (1/13/2009)


    The token size ranges from 500 thousand to 1 million.

    Are you saying that you have 500,000 to 1,000,000 tokens to insert, or that you have several tokens that are each 500K to 1M characters long?

    (either way, I think that Matt's Snapshot Isolation idea is probably the way to go).

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • Thanks for the replies. I am already doing the INSERTs in batches of 5,000 rows using an SSIS data flow task. Should I reduce the batch size further?

    As Mike pointed out, let me also do some research on Snapshot Isolation.

    By 500K to 1M I am referring to the number of tokens in a given file. The tokens themselves are 12 characters long.

    Regards,

    Amol

    Amol Naik

  • I wouldn't necessarily reduce the batch size, but reducing the commit size may be helpful.

    It is the length of the transaction that becomes the problem. If each of your insert transactions takes less than a second, waiting processes will never have to wait long.

  • I agree with Michael about renaming tables. It can get very tricky unless you are sure that nobody is running any queries against them while you rename. There is also the overhead of maintaining an extra set of tables.

    One thing I would like to add: watch out for log file growth during the load.
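
    For example, you can keep an eye on it between batches with something like:

    -- Shows log size and percent of log space used for every database.
    DBCC SQLPERF(LOGSPACE);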

    "Keep Trying"

  • Just to take a slightly different approach, what is the clustered index on the table where the inserts are occurring? With a bit of tuning, probably including the CreateDt column as part of the clustered index, you should be able to distribute the inserts so that they don't interfere as much with the other sessions on the instance.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Thanks for your replies guys.

    Mike,

    Are you referring to the commit size in the OLE DB Destination component within a Data Flow? How big or small should this size be if the batch size is, say, 5,000? Right now I have kept the commit size at 5,000 as well. How does this work?

    Grant,

    The Primary Key clustered index is in fact defined on TokenNo VARCHAR(25). Tokens used to be 25-digit codes earlier, but now they are 12 digits. These are guaranteed to be unique and are provided to us by a vendor in a text file.

    Thanks,

    Amol

    Amol Naik

  • Amol.Naik (1/14/2009)


    Grant,

    The Primary Key clustered index is in fact defined on TokenNo VARCHAR(25). Tokens used to be 25-digit codes earlier, but now they are 12 digits. These are guaranteed to be unique and are provided to us by a vendor in a text file.

    Thanks,

    Amol

    The default behavior, and frequently the best behavior, is to have the primary key clustered. But it doesn't have to be. You could create a clustered index that distributes the data better so that you focus the inserts into a particular set of pages. It will cut down on the contention.
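
    Purely as an illustration, assuming there is a sequential column such as CreateDt, it might look like this. The index and constraint names are assumptions, and the existing primary key stays, just as a nonclustered index:

    -- Hypothetical sketch: move the clustered index to an ever-increasing column
    -- so new tokens are appended at the end instead of splitting pages all over.
    ALTER TABLE dbo.Tokens DROP CONSTRAINT PK_Tokens;   -- existing clustered PK (name assumed)

    CREATE CLUSTERED INDEX IX_Tokens_CreateDt
        ON dbo.Tokens (CreateDt);

    ALTER TABLE dbo.Tokens
        ADD CONSTRAINT PK_Tokens PRIMARY KEY NONCLUSTERED (TokenNo);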

    It's just an option, not necessarily THE solution.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • As Grant stated far more eloquently than I can, having the clustered index there may be causing you to split pages, as the token is probably not a sequential ID. A query you can use to see how effective that index is and how many splits are associated with it is as follows:

    declare @dbid int
    declare @objectid int

    select @dbid = db_id()

    /* This is to look at the utilization of indexes on a specific table.
       Comment out the following statement and the predicate in the where clause
       if you want to look at the whole database. */
    select @objectid = object_id('YourTable')

    select objectname = object_name(s.object_id)
         , indexname = i.name
         , i.index_id
         , reads = range_scan_count + singleton_lookup_count
         , 'leaf_writes' = leaf_insert_count + leaf_update_count + leaf_delete_count
         , 'leaf_page_splits' = leaf_allocation_count
         , 'nonleaf_writes' = nonleaf_insert_count + nonleaf_update_count + nonleaf_delete_count
         , 'nonleaf_page_splits' = nonleaf_allocation_count
    from sys.dm_db_index_operational_stats (@dbid, NULL, NULL, NULL) s,
         sys.indexes i
    where objectproperty(s.object_id, 'IsUserTable') = 1
      and i.object_id = s.object_id
      and i.index_id = s.index_id
      and s.object_id = @objectid
    order by reads desc
         , leaf_page_splits desc
         , leaf_writes
         , nonleaf_writes

    Run this in the database in question and change "YourTable" to your table name. The query originally came from http://blogs.msdn.com/sqlcat/archive/2006/02/13/531339.aspx

    David

    @SQLTentmaker

    “He is no fool who gives what he cannot keep to gain that which he cannot lose” - Jim Elliot
