Performance problem with Views defined against Linked Server data sources

  • I have a problem with a data retrieval query that joins three SQL Views, each view based upon a remote table via a Linked Server object. The problem is that the query, a simple 3 table (view) join, that returns the top 200 rows, is taking over 3 minutes to run, a variation with a larger table takes 12 minutes.

    The SQL server on which the Views are defined is linked to a Progress Database server, the Progress database on this server is also being used by a mission critical OLTP system, so there maybe some resource contention.

    Each view is defined in a similar way, like this :

    CREATE VIEW [VIEW_A] AS SELECT * FROM OPENQUERY(REMOTE_SVR,’SELECT * FROM [REMOTE_TABLE]’)

    The REMOTE_SVR referenced above is the name of the Linked Server object, its connection to the actual remote server is via an ODBC driver.

    I have looked at the query plan generated by the SQL Server and it is showing that 3 Remote Scans are taking place and appear to be bringing the full contents of the 3 remote tables across the network to be processed on the SQL Server. Where they are placed in a worktable before the join is performed.

    Is this correct? Is it the case that the Views are nothing more than a direct connection thru to the associated tables on the Progress Server, and so for SQL Server to action the join it must first pull all the rows from each of the 3 tables on the remote (linked) server? If this is the case I’m guessing the performance hit on the remote server and the network are potentially very significant. Not to mention the slow query itself, and the hit on the SQL Server resources.

    I would really appreciate some comments, hopefully to confirm if I’m right or wrong in thinking this is bad news. I am not currently in a position to redesign this setup as it is provided by a 3rd party, but If I can show the design is inappropriate then I will have more influence, and hopefully be able to redesign to a data warehouse. Confirming my thoughts is the first step.

    Thanks in advance

  • yeah that's one of the problems with a linked server; data gets copied over to tempdb, and then joined to your data locally, so you could see a million+ row tables copied over, which crushes performance.

    What I've always tried to do is use dynamic sql to build a command that would limit the data fromt he remote server, and use EXECUTE AT or OPENQUERY with a WHERE clause in it.

    so for example, say i know i'm joining my local NewInvoices to MyLinkedServer.RemoteDatabase.dbo.RemoteTable.

    in my case, i want to use my local table NewInvoices to filter the remote table somehow: maybe a PK greater than a certain value, maybe data that is in a date range, or data that occurs on specific dates...SOMETHING to build a query that returns a lot fewer rows.

    here's one example where i get just the specific Dates from a table:

    DECLARE declare @Dates varchar(max) = '';

    SELECT @Dates = s.Datez

    FROM(SELECT

    Datez = STUFF((SELECT ',' + CONVERT(varchar,CPUDateOfService, 111) --111 = converting to yyyy/mm/dd format

    FROM [#NewInvoices]

    ORDER BY CPUDateOfService

    FOR XML PATH('')

    ),1,1,'')

    ) s;

    Then i use that to build a dynamic query, which i'll execute on the remote server(sticking the results into a local temp table)

    --FOR XML trick created comma delimited list of dates

    --now using EXECUTE AT to do the work on the remote server, build the command

    SET @cmd = 'SELECT * FROM

    FROM RemoteDatabase.dbo.RemoteTable T2

    WHERE T2.PostDate IN (SELECT convert(datetime,fn.value) AS val FROM RemoteDatabase.dbo.DelimitedSplit8K('''

    + @cmd + ''','','') fn )'; --splitting the comma delimited list of dates

    then, instead of millions of rows, i get a lot fewer back via EXECUTE AT (implied resutls table i did not botehr to define)

    --important performance note on Linked servers:

    --without EXECUTE AT, the entire contents of DMMEMBR/DMTRANS get copied over to local [tempdb], and only THEN is the data filtered/joined

    --previously this resulted in 4.5 Million DMMEMBR rows + 41.4 Million DMTRANS rows being copied, when we actually need less than 10K rows

    INSERT INTO #RemoteResults

    EXECUTE(@cmd) AT [CFS-SERVERSQL]; --linked server

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • Thanks for your reply, its very helpful. I take your point on pushing the workload onto the remote server where possible, and thus reducing the returned traffic, that makes sense. But in my case, as the views are defined with a Select * that's not currently possible. Hence my reservation about the whole 3rd party solution.

    Thanks again:-)

Viewing 3 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply