SQL 2000 SP4 issue

  • Hi there.

    I have an issue regarding one of my SQL Servers. This server was one of the last servers in our environment to be upgraded to SP4. It also has a fair amount of ugly code, that I know shouldn't be there, that pushes and pulls information over linked server connections.

    The issue is that the environment ran fine as a legacy system on SP3a. However, now that SP4 is installed, cross server query plans perform MUCH worse then before (ie, 10 minute jobs take 9 hours now).

    Has anyone else ran into this problem?

     

  • Haven't had the problem myself, but I'm curious to know if the other servers in your cross server query are SP4 or SP3.  if they are SP3 do you have the same degradation if you use all SP4 servers?

    -

  • Hi Jason.

    Thanks for the reply. The question you ask is a great question and one of the first things we experimented with. The problem is more then a simple SP4 issue. We've tried SP3 to SP3 (worked well). We've tried SP3 to SP4 (worked well). We've tried SP4 to SP4 (worked ok). We've tried SP4 to SP3 (worked ok). So, a good question would be "What's the problem then?"

    Well, that is a good question. The problem is isolated to one server and started as soon as SP4 was installed. So this isn't a specific SP4 problem, but rather an SP4 problem on one server. The query plans are distinctly different on that server vs our other servers so it is a cross server optimization issue on that server alone starting at the time that SP4 was installed up from SP3a.

    It's a head scratcher. I plan on placing a service call to MS, but if anyone has any ideas at all, I'd love to hear them.

    Thanks.

  • You can try a dbcc dbreindex (if the query plans are changing for worse performance, this sometimes helps.)

    -

  • Yeah. That's a good suggestion. I've tried everything I could think of in those lines. DBCC reindexes, using "update statistics" on all tables affected. sp_recompile on affected tables, etc. Something is happening that I can't figure out. For some reason after the SP4 upgrade, the optimizer on this server started making inefficient plans that, as far as I can tell, is only negatively affected when the queries span servers via linked server.

    Knowing the issue has to do with optimization is a starting point. Knowing what to do about it? I'm just stuck on this one. I'm also surprised that I've found no documentation on this happening to anyone else.

    The only info I have about the SP4 upgrade related to this server is that there was a database left in suspect mode (this is not because of a true problem but because it's an insignificant DB that we didn't restore after doing some physical file moving) at the time of applying the SP. A warning came up saying that the suspect DB will not have the SP applied, but then allowed the install to go smoothly. The DB mentioned doesn't come into the picture and is still simply sitting in suspect mode as it was before the SP.

    Other then that, there was nothing special about this SP4 upgrade.

    Thanks again for the suggestion.

     

  • Hi, GMan.  Couple questions.

    What is the difference in the query plans? 

    Is the data the same across all environments?

    Are your linked servers all set up identically?

     

  • Hi Jeff.

    To answer your last two questions, the answer is "yes", that was a part of the test plan. The problem was evident when code that had been working suddenly stopped working well. Hence variances in setup were not a factor in initiating the problem. However, to ensure the tests were valid, the answer is "yes" to both questions.

    I'm not sure what the best way of sharing the two plans is other then to describe this:

    The plan that doesn't work, tries to pull a lot of data from the linked server so it can work on it locally.

    The plan that does work, allows the linked server to do most of the work.

    I can't figure out why one server would come up with an entirely different plan.

     

  • That is a puzzler.  I can't think of any reason why the optimizer would choose differently if the environments are exactly the same.  Maybe a missing index?

    Can you post the results when you get it figured out?

    Thanks,

    jeff

     

  •  

    I'm with you Jeff, this is most definitely a puzzler. There are no missing indexes. Both in production and in all my tests there is one table locally (maybe 5000 rows) that has a join happening to a table from a linked server (much larger, more like a million rows). All of my tests go against the same linked table and replicating my small local table with proper structure/indexes wasn't an issue. Not to mention, suddenly, after SP4 was installed, linked server optimization simply stopped working well for everything.

    I'm fully aware that a valid response would be "change the code/process", however that's not the nature of my quest. The code was developed by developers and has been in production for years. It's evident that the code/processes could change, but that is a significant under taking. Hence I have a desire to figure this out from a technical point of view and, at minimum, get things working long enough to consider a project like rewriting the code.

    My guess is that this is "a Microsoft thing". I wouldn't be surprised if I have to do something stupid like uninstall SQL Server, reinstall SQL Server, apply SP4, and reload my system db's. However, I'm not going to do that to a production server simply out of experimentation unless I have a good reason to do so (ie, someone had the same problem and said this absolutely fixed it for them).

    If I find an answer, I'll most definitely post it. If anyone else has any ideas, I'd be happy to hear them!

     

  • Could this be the issue?  http://support.microsoft.com/kb/919165  I'm looking into a similiar problem my 2000 server is 8.0.0.2039

     

  • What does the query plan look like? Any chance that parallellism is killing performance?

     

     


    N 56°04'39.16"
    E 12°55'05.25"

  • I was wondering what factors the optimizer might use when deciding to use local or remote data in a query involving a linked server, having suffered collation problems before when linking servers, I was thinking it might possibly be a collation issue. I decided to see what i could find on the internet

    I found this which might be of use..

    http://msdn2.microsoft.com/en-us/library/aa178113(SQL.80).aspx

    Included in this article is the following:

    Collation compatibility

    • For a distributed query, the comparison semantics for all character data is defined by the character set and sort order of the local SQL Server. Microsoft SQL Server 2000 supports multiple collations, which can be different for each column; each character value has an associated collation property. SQL Server 2000 interprets the collation property of character data from a remote data source and treats it accordingly. For more information on the collation of remote columns, see Collations in Distributed Queries.

      SQL Server can delegate comparisons and ORDER BY operations on character columns to a provider only if it can determine that:

      • The underlying data source uses the collation sequence and character set of the column.
      • The character comparison semantics follow the SQL-92 (and SQL Server) standard.

    David

     

    If it ain't broke, don't fix it...

  • First of all, I'd like to thank those who replied. I really appreciate the input.

    David - Thank you very much for your note. That is a fantastic point and I thank you for it. It was, however, one of the first things I thought about looking at so I was able to rule it out early.

    I think I've mentioned this already, but interestingly enough, all of the queries that were performing poorly were queries that I would have declared against best practices. I chose not to focus too strongly on that point because, after all, it was performing acceptably for a significant time frame. Solving this problem was going so slowly that I was faced with needing an action plan, so, the offending code has either been migrated to a different server or rewritten to something that worked. At least this removed any state of crisis some were feeling.

    This didn't change the fact that *something* changed on that server after SP4 was installed that caused those queries to work so poorly, so I want this solved. Hence, I've paid for a support call to Microsoft. There were some interesting registry entries found, and MS has told me that SP4 does indeed recompile the entire optimizer process of SQL Server. Removing the registry items helped make my query plans look correct, however, interestingly enough, the queries still don't perform well.

    I've done some diagnostic data collection on our servers for MS and they are in the process of analyzing it. I was updated today that they have a theory that the problem might be what "GWired" mentioned requiring the installation of the hotfix. I have a test server I can try the hotfix on first to gauge the results.

    If I get resolution on this problem, I will post it in it's entirety. In the mean time, should anyone have information regarding this problem, I'm all ears.

     

Viewing 13 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply