Slow query performance extracting large tables from off-site linked server

  • I know that this scenario is going to inherently be slower than anything done locally, but my options are fairly limited. First, a bit of background:

    I work in Healthcare IS for Company A, but Company B is hosting one of our EMR programs for us. This was done on purpose, so that whether a patient is seen at one or the other, their medical history is more complete. However, this puts all of the data that I need to get to on a server across town that I can only access via Sql Server Management Studio as a linked server.

    Now, in some ways, the performance has been better than I expected, but sometimes it behaves very erratically. I am using OPENQUERY to handle all of the pulls, and am not joining to any local tables, in order to maximize efficiency.

    Here is some of the code I run, and what happens:

    SELECT *

    FROM

    OPENQUERY([linkedservername],

    '

    SELECT *

    FROM Encounter_ItemChild

    WHERE EncounterID IN (SELECT ID FROM Encounter WHERE DTTM BETWEEN ''2014-09-30 00:00:00.000'' AND ''2014-10-01 00:00:00.000'')

    '

    )

    ***This is the main query in question. Sometimes, it just spins indefinitely until the timeout is reached (30 minutes). Other times, it finishes in < 1 minute.

    SELECT *

    FROM

    OPENQUERY([linkedservername],

    '

    SELECT ID

    FROM Encounter

    WHERE DTTM BETWEEN ''2014-09-30 00:00:00.000'' AND ''2014-10-01 00:00:00.000''

    '

    )

    ***During times where the first query spins, running this returns all IDs in a few seconds (around 60k records)

    SELECT *

    FROM

    OPENQUERY([linkedservername],

    '

    SELECT *

    FROM Encounter_ItemChild

    WHERE EncounterID IN (123456,234567,345678,456789,....)

    '

    )

    ***The above query was programmatically generated by taking the IDs from the second query, and packing as many into the IN condition as possible. Each statement could hold only about 900 IDs, so around 70 queries get built...however, each one returns the records in question in 1-2 seconds.

    My main question is...if the second query pulls all IDs from Encounter in a few seconds, and that query is used in the first query's WHERE clause, why does it spin and spin, while manually throwing the IDs in instead runs almost instantly?

    Also, I am totally open to ideas for other ways to handle this pull.

  • Try to use JOIN instead of IN:

    SELECT *

    FROM

    OPENQUERY([linkedservername],

    '

    SELECT ic.*

    FROM Encounter AS e

    JOIN Encounter_ItemChild AS ic

    ON e.ID =ic.EncounterID

    WHERE e.DTTM BETWEEN ''2014-09-30 00:00:00.000'' AND ''2014-10-01 00:00:00.000'''

    )

    _____________________________________________
    "The only true wisdom is in knowing you know nothing"
    "O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
    (So many miracle inventions provided by MS to us...)

    How to post your question to get the best and quick help[/url]

  • Eugene, thank you for your reply.

    I should have mentioned, I have attempted both using the JOIN and IN, with no difference to speak of. Perhaps you could chime in, but I read somewhere on SE that if the value you're INNER JOINing on is unique, then it has the same execution plan as the IN variant.

    It looks like today is another slow day, as running one load after another is showing most of the larger loads are taking 4-5 times longer than average. I'm stumped on this -- being that this linked server is shared among all of the users here, do we all share the same bandwidth pool? I can't be sure if the slowness is on the remote server side or in the throughput, and I'm not entirely sure how to confirm either.

  • Do you have a technical contact on their end you can get to run your first query locally and see if it lags on their end as well(if so it could be an issue with their system)?

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply