Weird error: MSDE Database stops responding

  • I have a very strange problem, I have an application that continually collects production information from the factory floor. It will run flawlessly for a few days, and then the database (MSDE 2K) will stop responding. This will last for about 2 hours and then it will start back up and work without any further issues.

    There is nothing in the server logs or event logs indicating a problem. My application error logs show this snippet:

    (format: date - time(millisecond)(section) error)

    1/18/2005 - 170029(531)(MDT Server)Error in Machine HistoryUpdate(Local), details follow:

    -2147217887 Msg: IDispatch error #3105 Description: Multiple-step OLE DB operation generated errors. Check each OLE DB status value, if available. No work was done.

    1/18/2005 - 170029(656)(Local) Error (open Local DB COnnection): -2146825023 Msg: Unknown error 0x800A0CC1 Description: Item cannot be found in the collection corresponding to the requested name or ordinal.

    SQL: select pStdCycleTime as 'cyc', pStdPartsCycle as 'ppc', pPartsCavity as 'pcav' from MachineProductDetail where pmid = '019' and pmcu like '%1002'

    1/18/2005 - 170029(734)(Local) Error (open Local DB COnnection): -2146825023 Msg: Unknown error 0x800A0CC1 Description: Item cannot be found in the collection corresponding to the requested name or ordinal.

    SQL: select * from mdtParams where pmdt = '019'

    on and on... It appears as though the db connection succeeds (I am testing this) but cannot execute the sql commands.

    The curious thing is, the application performs all of these operations during each iteration, and the data is relatively consistant. I could understand if the data was malformed or otherwise but this is not the case. The fact that it runs for upwards of a day before the problem occurs indicates something is wrong with the database.

    There are no maintenance tasks being performed, no internal stored procedures, or triggers firing. I can find no reason why this occurs.

    Can someone suggest another method of finding the problem? It is not something that I can readily replicate or anticipate (at least from what I can see).

  • Does this error occur during an update or a select?

    It looks like you are using ADO.  The one thing I hated about ADO (I use .Net now) is that errors sometimes are very misleading.  I was going to suggest looking for a trigger, but it seems you have ensured there is no trigger on the table.

    Look at field lengths, as well.  I have encountered errors when I am updating a field with data that is either larger than the column's length or not compatible with it's data type.  Also, log that actual data you are trying to update in your log file.  It may show something subtle.

    Take a look at your connection string.  Are you opening a specific catalog?  Does each iteration open and close a db connection?

  • Yes to all your questions.

    The error almost ALWAYS starts with the machinedatahistory table. I do a select on the table and then an add/update.

    I open and close the connection string after each series of operations to ensure that the connections are open and available. I even test the connection prior to doing any update. It appears as though the connection opens but fails to get the specified data for the operation (returns errors indicating a missing name or ordinal).

    All of these operations could process for thousands of iterations before the errors occur, and then stop for about 2 hours, every time it is the same length of time also. I do collect all of the data that is being processed and log it to a seperate log file. There does not appear to be anything out of the ordinary. The select statements are listed in the log files as well and they are normal, I test them afterwards in query analyzer to verify them.

    Argh!!!

    One thing to note... I set the max memory allocation size limit to 150MB. Could this be causing me problems? I have 512MB system memory available, maybe setting this to 256 would be better.

  • Check to see if you have any backup or maintenance jobs.

    I had a problem once where sometimes an update job would die at 1:30am and never finish.  Other times, it did.  I could not figure it out until I checked the task scheduler.  Apparently, the server was scheduled to reboot at 1:30am every morning.  My job was scheduled to run at 1:00am.  Sometimes it got done in time, other times it did not.  It's one of those problems you lose a lot of hair over and realize had you seen it earlier, you could have saved yourself a lot of grief.

    Make sure the server is up during these times.

    I am not sure if memory would be a cause.  I would think if an upper memory limit is hit, SQL Server is smart enough to queue connections and query requests while it cleans up memory instead of falling apart.

    Also, if you close the connection, you still must release the object from memory.  Otherwise, you will continue to create new connections (ie cn = new Connection) without releasing the previous connection.

    Are you in VB or C++?

  • All maintenance is being done on weekends when production is offline and NO jobs are scheduled during the week. I turned everything off to eliminate them as potential problems.

    The server is online 24/7, and we run production 24/6. My close function is as follows (VC++). RS1 is the recordset and RSCon is the connection. Written per the samples in MS-ADO V2.6 Help:

    bool closeDbConn(bool& running)

    {

     bool val = false;

     try{

      if (RS1->GetState() == 1){

       RS1->Close();

       running = running;

      }

      if(RSCon->GetState() == 1){

       RSCon->Close();

       conSt = NULL;

       running = running;

      }

      val = true;

      running = false;

     }

     catch(_com_error &e){

      _bstr_t bstrSource(e.Source());

      _bstr_t bs =  _bstr_t(" Error (close local DB Connection): ")

       + _bstr_t(e.Error()) + _bstr_t(" Msg: ")

       + _bstr_t(e.ErrorMessage()) + _bstr_t(" Description: ")

       + _bstr_t(e.Description());

      char errmsg[512];

      strcpy(errmsg, bs);

      writeLogData(dbSource, errmsg, "errorlog.log");

     }

     return val;

    }

  • Just a few brief thoughts shooting from the hip:

    1) is the MDAC up to date ?

    2) up the memory to 256 Mb (256 Mb is plenty for windows with MSDE and no other applications).

    3) Look into upgrading to a full version of SQL Server because the MSDE engine is 'throttled back' some from the full blown product.

    RegardsRudy KomacsarSenior Database Administrator"Ave Caesar! - Morituri te salutamus."

  • Thanks to all of you for your input on this.

    1.) Yes, we re-installed and updated all components

    2.) Did this yesterday, the problem still occurred last night.

    3.) Wish I could. If it were not for this simgular event, the MSDE has worked perfectly for the application.

    I am looking closer at the code base. I did see one irregularity after pouring through the log files last night. I am thinking it is a result of a user procedure irregularity. I must be holding a connection open somehow under a very specific condition, once I find it the problem should go away.

    Thanks to all for your help.

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply