Slow Response on Clustered Server

  • We're having horrible performance issues on one of our database servers, a clustered server dedicated to our Wise/Lawson applications (though I can't see any reason to pin the blame on Lawson just yet). Basically, the users are complaining, justifiably, about very slow response times.

    Now, this is the same server I complained about in an earlier thread that won't allow large files to be copied to other servers on the network. (We still have that problem.) I suspect this problem is related.

    Here is what we observe:

    1. The paranoid DBA wanted first to test the culpability of the database engine, so I devised something simple that could be run outside the Lawson application in Query Analyzer. There is a table on one of the databases named EMPLOYEE, which has only about 29,000 rows in it. I ran 'SELECT * FROM EMPLOYEE'. This is not a difficult query under ordinary circumstances, but it takes about a minute and a half to run this query on the server. Ouch! That same query on a different server, same table structure and contents, takes one second.

    2. We observed that the performance degradation seems to be a function of the volume of data being queried, but that, at least in text mode in Query Analyzer, the degradation isn't linear. (It runs faster in grid mode, and the times increase more linearly. Does anyone know what the difference is between the text and grid modes, and why text is so much slower? I suspect it has something to do with more data, in the form of spaces, being sent.)

    3. Caching doesn't seem to be any help here. You can run SELECT * FROM EMPLOYEE on the same session over and over and over, but your results will always appear in a minute and a half.

    4. To play with some of the data, I created a temp EMPLOYEE table in (where else?) the tempdb database on the server, identical in column structure and number of rows to the production EMPLOYEE table. I used the following SQL statement:

    SELECT * INTO tempdb..EMPLOYEE FROM EMPLOYEE

    5. Interestingly, it only took about one second to execute this statement. Think of the ramifications of this:

    a. Running SELECT * FROM EMPLOYEE, and returning rows to the calling application: 1 min 30 seconds.

    b. Creating a table, SELECTing all the rows from EMPLOYEE and INSERTING all the rows into the new table: 1 second.

    6. The database has to work a lot harder to execute 5b than 5a, so why is 5b ninety times faster?

    7. Subsequent executions of full UPDATE statements against the tempdb table also ran very quickly. UPDATE statements return no rows to a client, but this one processed just as many rows as the slow SELECT statements, and it also actually writes to disk. Reading from disk ought to be faster than writing to disk, not the other way around.

    8. My conclusions:

    a. Whatever the root issue is here, it is probably not inside the database, per se. I see little evidence that the database engine is not performing well, at least inside the DBMS itself. (Could this rule out disk controller issues?)

    b. The issue appears to be in whatever mechanism routes the data from the database to its client. (Would this implicate NIC cards?)

    c. Small amounts of data don't seem to cause problems, but when the volume reaches a certain point, performance takes a disproportionate whacking. There may also be a snowballing effect, wherein a large SELECT query issued by one user holds a lot of read locks for too long, and other users' processes have to queue up and wait for those locks to be released.

    d. I can hazard a guess as to what might be causing these symptoms, but so might a Druid, circa 200 B.C., have concluded that evil spirits explained his headache from too much mead. NIC card? Disk controller? I'm just not knowledgeable in that arena. But the thingamagiggy that moves the data to the client, if it is sufficiently feeble, may also be faking out the operating system in regard to copying data.

    e. My theory, not necessarily born of too much mead, is that there is a hardware glitch that causes data movement to clog up. I think the DBMS is serving up data to whatever hands it off to the client, but spends much of its time twiddling its thumbs waiting for a response.
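
    The comparison in items 5 and 6 can be turned into a repeatable test. What follows is only a rough sketch, not the procedure used above: it uses Python's built-in sqlite3 module as a stand-in so that it runs anywhere, and the helper names (build_demo_db, time_server_side_copy, time_client_fetch) and the EMPLOYEE_COPY table are made up for illustration. Against the real server you would swap in a SQL Server connection and the SELECT ... INTO form; the point is only to time "copy inside the engine" separately from "return every row to the caller".

```python
import sqlite3
import time


def build_demo_db(row_count=29000):
    """Create an in-memory stand-in for the EMPLOYEE table (hypothetical columns)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE EMPLOYEE (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO EMPLOYEE (id, name) VALUES (?, ?)",
        ((i, f"employee-{i}") for i in range(row_count)),
    )
    conn.commit()
    return conn


def time_server_side_copy(conn, source, target):
    """Time a copy that stays inside the engine; no rows travel to the caller.

    CREATE TABLE ... AS SELECT is the sqlite analogue of SELECT ... INTO.
    """
    start = time.perf_counter()
    conn.execute(f"CREATE TABLE {target} AS SELECT * FROM {source}")
    conn.commit()
    return time.perf_counter() - start


def time_client_fetch(conn, source):
    """Time a full SELECT *, including pulling every row back to the caller."""
    start = time.perf_counter()
    rows = conn.execute(f"SELECT * FROM {source}").fetchall()
    elapsed = time.perf_counter() - start
    return elapsed, len(rows)
```

    Calling build_demo_db(), then time_server_side_copy(conn, "EMPLOYEE", "EMPLOYEE_COPY") and time_client_fetch(conn, "EMPLOYEE"), gives the two numbers to compare. If the fetch is far slower than the copy on one server but not on another, that points at the path between engine and client rather than at the engine.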

    Any thoughts? Even idle ridicule at this point would be greatly appreciated.

    Lee

    Edited by - Lee Dise on 12/18/2003 1:34:34 PM

  • It could be the NIC. It could be the cable. It could be a switch along the way.

    Are any other servers that are physically connected to the same switch experiencing problems?

    If not, try to run your select statement from one of those other servers, so that there is only one switch involved. How does that work out?

    If performance is still bad, try swapping out the network cable during down time. Or if you happen to have fancy tools, test the cable.

    Does your NIC come with a diagnostic utility?
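
    If no diagnostic utility is handy, a raw throughput test that leaves SQL Server out of the picture entirely can help separate "network" from "database". The sketch below is an illustration only: the function names are hypothetical, and as written it sends over the loopback address just so it is runnable on one machine. To test a real link, the receiving half would have to run on the suspect server and the sending half on a client, which this sketch does not set up for you.

```python
import socket
import threading
import time


def _drain(server_sock, result):
    """Accept one connection and count every byte received until the peer closes."""
    conn, _ = server_sock.accept()
    total = 0
    with conn:
        while True:
            chunk = conn.recv(65536)
            if not chunk:
                break
            total += len(chunk)
    result["bytes"] = total


def measure_throughput(total_bytes=8 * 1024 * 1024, host="127.0.0.1"):
    """Send total_bytes over TCP and return (bytes_received, elapsed_seconds)."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind((host, 0))  # port 0 lets the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]

    result = {}
    receiver = threading.Thread(target=_drain, args=(server, result))
    receiver.start()

    payload = b"x" * 65536
    sent = 0
    start = time.perf_counter()
    with socket.create_connection((host, port)) as client:
        while sent < total_bytes:
            piece = payload[: total_bytes - sent]
            client.sendall(piece)
            sent += len(piece)
    receiver.join()  # wait until the receiver has seen the connection close
    elapsed = time.perf_counter() - start
    server.close()
    return result["bytes"], elapsed
```

    Dividing the returned byte count by the elapsed seconds gives bytes per second. Running the same measurement against a healthy server and against the slow one would show whether plain data movement, with no database involved, is already slow.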

  • Thanks for the response, jxflagg. I don't have control over my own destiny here in regard to the hardware, so I'll have to pass on your suggestions to our hardware guys. We definitely aren't experiencing such performance issues on our other servers. This is our only clustered server, though, and there have been other issues with this ugly beast.

    But it sounds like you too think this is a hardware issue, not a problem with the database engine. Am I right? Or rather, can you conceive of a reason to suspect the DBMS engine itself?

    I should add, though, that I get the same slow performance when running from the other node in the cluster, at least. If this is hardware, it would almost have to be something physically inside the cluster boxes, I would think.

    Edited by - Lee Dise on 12/18/2003 1:51:37 PM

  • What version of Lawson Insight are you running?

    K. Brian Kelley, GSEC

    http://www.truthsolutions.com/

    Author: Start to Finish Guide to SQL Server Performance Monitoring

    http://www.netimpress.com/

    K. Brian Kelley
    @kbriankelley

  • I agree with jxflagg; it seems to be a network issue. Start perfmon and monitor the Network Interface object on both the fast and slow servers while rerunning the same statements.

  • Knowing and supporting Lawson like I have, Lawson's hooks could really be the determining factor here. How many connections, etc., Lawson maintains and how it accesses the data is very dependent on the version of the Insight software as well as the version of the Apps. We have adopted a general rule with respect to our Lawson system: we ALWAYS look at Lawson first.

  • quote:


    We have adopted a general rule with respect to our Lawson system: we ALWAYS look at Lawson first.


    I agree that Lawson's operations bear considerable watching, but I ran my own benchmarks outside of Lawson, in Query Analyzer, and the response times are still very slow. It shouldn't take a minute and a half to SELECT 29,000-odd records out of a table when the same operation on a different server runs in one second.

  • Hence the question about Lawson version. For instance, the version of Lawson we're running with Employee self-service requires some 50 database connections to be maintained at all times. Reason? App is too slow starting the connections up on the fly. These take up resources.

  • quote:


    App is too slow starting the connections up on the fly. These take up resources.


    Ahh. So you're saying the reason for the poor database response may be the hogging of SQL resources by Lawson, and I'm just witnessing the symptoms of it even in Query Analyzer.

  • That's a possibility, yes.

  • quote:


    That's a possibility, yes.


    Well, now that you mention it: Julie, our other DBA here, proved that every single Lawson connection repeats each SELECT hundreds of times. We asked Lawson about this, and (of course) "It's not a bug, it's a feature." Their reason was some gobbledygook about old COBOL and ISAM files, and it went in one ear and out the other. In other words, there isn't any problem here, but it will be fixed in release 8.

    I'll get Julie to post what they told her.
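
    For anyone who wants to check the same thing on their own system, here is a rough sketch of how repeated statements could be counted once the statement text has been captured (from a trace, say). This is not how Julie did it; the function names are hypothetical and the input is assumed to be a plain list of statement strings, however you manage to export them.

```python
from collections import Counter


def normalize(statement):
    """Collapse whitespace and upper-case so trivially different copies match.

    Upper-casing also changes string literals; acceptable for grouping,
    but do not re-run the normalized text as a query.
    """
    return " ".join(statement.split()).upper()


def repeated_statements(statements, threshold=2):
    """Return (statement, count) pairs seen at least `threshold` times, most frequent first."""
    counts = Counter(normalize(s) for s in statements if s.strip())
    return [(sql, n) for sql, n in counts.most_common() if n >= threshold]
```

    Feeding it the statements issued by a single connection and looking at the top few counts would show at a glance whether the same SELECT really is being issued hundreds of times.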

  • If you're on Release 7, yes, this is exactly what it does. And it really slams the performance within the database as a result. If someone runs an HR211 screen, it's probably the worst of all the code...
