Help with hitting a wall

  • I'm hoping someone can help me... We've built a SQL Server Database with a VB front end to scan a large archive of paper information.

    The database is living on a 1.6 TB RAID 5 array with the transaction logs on an 80GB RAID 1 drive. We have 5 book scanners attached to this which are scanning documents and storing the images as BLOBs in the SQL Server database. The server is a dual Xenon processor box with 4GB of RAM. Everything seems to run fine until the database file hits about 6.9 GB (the majority of which is the table containing the BLOB data - 6.2GB) then the system just starts to behave strangely.

    Write requests are refused, whereas read requests are processed, and any attempt to scan a new image into the system results in a total stopage and a Network Timeout error, though the machine that's stopped acan still read from the database quite happily.

    We restored a small database and began bulk testing again and everything runs fine... we're just waiting for the 6Gig wall.

    Has anyone out there had similar problems, ar have any suggestions as to a possible cause/cure.

    We're running SQL 2000 Standrard Edition with 5 concurrent users

    Thanks in advance

    Paul Anderson

  • is anything coming up in the error logs?

    is it possible to change the database to store hyperlinks to the file instead of using BLOBS?

    what recovery mode are you using? if it's full are you backing up the T-log (obvious question - sorry some of our customers get it wrong all the time)

    have you got any indexes on the table in question ??

    also have you tried splitting the database to use seperate file groups???

    MVDBA

  • To answer each in turn:

    Nothing untoward in the error logs... everything looks OK

    Unfortunately we need to stick with BLOBs, as they are later going to be mass OCR treated, then the resulting text docs stored as images and a full text search catalogue built against them

    Simple Recovery Mode - our backup strategy is Server 1 replicates the database to Server 2 (which has an identical RAID array) and this then uses 3rd party software to back the database up to DVD using differential backups. This was the strategy chosen by our client.

    One primary key based on the DOcument Identifier which is a GUID

    Not tried filegroups yet... That may be the next attempt..

    Thanks for your reply. Sorry for the delay in responding but I'm currently on-site in North Africa

    Paul Anderson

  • Forgot to mention...

    in about 10 days the main scanning project is due to begin, scanning a little over 2 million documents from the national archive, so this is something I need to crack soon...

  • hmm,

    what about having a word with the SAN engineer.

    are there any quotas on directory contents? or any restrictions outside of SQL?

    the other obvious thing to check is file size constraints in the database properties (i hope i'm not insulting you here but previously i've driven 100 miles out to site to correct this after an application vendor set a maximum size)

    i don't think seperate file groups will help unless you use partitioned tables (as far as i'm aware you can't split a tables data across 2 files) - although you can put your indexes on the other file group.

    might be worth downloading traceplus (win32) from http://www.snapfiles.com

    you can monitor the entire behaviour of your server and see what processes are occuring (in or out of SQL) when you get the problem.

    MVDBA

  • Don't worry about insulting me... I find that no matter how old she gets Granny still needs the occassional egg sucking lesson

    I've checked quotas and everything is OK. Left things running overnight with what few monitoring tools we've got (thanks for the tip on traceplus, I'm downloading it now), and of all things it looks like it may be a deadlocking problem!! Time to shoot the coder/designer.

    Will spend some time this morning nailing whether this is the problem, then se what I can do with indexes and isolation levels...

    Thanks for the help

    Paul Anderson

  • Some further info...

    After much investigation I'm now beginning to think it's nothing to do with the locking strategy.

    It seems to be after about 2 hours of having data pumped into the database it just stops accepting any further writes. You can read but not write, and there is not any option then except to stop and restart SQL. This buys you another 2 hours of data input, then you have to do the same again.

    Now I'm beginning to get confused

  • Not that I think its the cause of your problem, but I hope you haven't got a clustered index on that GUID. It could cause a large amount of data reorganisation every time a new record is added, causing temporary index page locking that may slow down other users adding records at the same time (especially if the database has to expand to do the reorganisation). Try taking the clustered and any other indexes off to see if it affects the 'wall'. If it does, use an identity field for your clustered index.

    Also check that your tempdb database is set to grow as needed.

    Just another idea to play with

    Peter

  •  What kind of SAN are you using? We had a problem with one of our DB servers, where it would intermittantly give up while reindexing some large tables, the same would work fine on other servers, connected to a different SAN. After months of different trials, including different drives, RAID combinations, we finally approached MS. They gave a hot fix Q838647, which  was for some SANs connected to Cisco switches. This fix by the way was not yet published on the MS site and you had to use password to access the fix.  This leads me to believe that there may be several other fixes out there, that MS may not publish, since they are very specific. BTW, we use IBM shark storage.

  • Nothing so fancy as a SAN I'm afraid.

    The hardware is a dual Xenon processor server with 2 2.4Ghz processors, 4GB of RAM, a pair of internal IDE 80GB disks configured as a RAID 1 mirror and a Transtec 5012 RAID 5 array of 12 160GB IDE drives connected through an Adaptec 29160 SCSI card.

    SQL is configured for the data to live on the Transtec RAID and the transaction logs on the RAID 1.

    We're on Win2K SP4 and SQL 2000 SP3a.

    The main server is running transactional replication to an identical server, which then uses Dantz Retrospec to burn DVD backups of the database from the second copy, but neither replication nor the backup are running at the moment as we're still soak testing the installation, and that is when we've hit this problem.

    I'm pretty confident that there's no deadlocking going on now, and that the database is tuned as well as it can be, but we still seem to just stop being able to write at intervals of between 20 minutes and 2 hours.

    It seems to depend on the volume of data being passed into the database. The more data (from multiple sources) the quicker it stops accepting writes.

  • Peter,

    Thanks for those suggestions. There isn't a clustered index on the GUID, though there is a secondary index on the table that is clustered. The Wall seems to have been a red herring as we're now up to 10GB and rising.

    Tempdb is set for unrestricted growth in 10% increments

    Cheers,

    Paul

  • I am beginning to think the problem may be tied to the database growth increments though. I'm going to experiment with the increment sizes and see if I can make the problem occur more often if I shrink the growth increments. Then at least I'll have a clear picture of what's going on...

  • looks like it could be growth.

    but i'd def look at your tempdb as well.

    maybe this needs a bigger growth factor.

    MVDBA

  • Are you sure you're not running out of log space? If you have a 1.6tb database I don't see how the 80gig log will be large enough! Maybe I'm missing something here. Also what kind of recovery model are you using? Maybe for the import you should change it to simple and then when done change it to full. That might help slightly on your log file growth during the import process.




    Gary Johnson
    Microsoft Natural Language Group
    DBA, Sr. DB Engineer

    This posting is provided "AS IS" with no warranties, and confers no rights. The opinions expressed in this post are my own and may not reflect that of my employer.

  • One more thing. If you have hyperthreading turned on, turn it off.




    Gary Johnson
    Microsoft Natural Language Group
    DBA, Sr. DB Engineer

    This posting is provided "AS IS" with no warranties, and confers no rights. The opinions expressed in this post are my own and may not reflect that of my employer.

Viewing 15 posts - 1 through 15 (of 18 total)

You must be logged in to reply to this topic. Login to reply