Linked Server Optimization

  • Comments posted to this topic are about the item Linked Server Optimization

  • Nice article, and I agree with you on the performance factors, but I'd also like to add that there are occasions, like when you need real-time data (read: reports) from a database located remotely, where options are limited.

    Recently, for example, I got an assignment to retrieve data from databases residing on many remote SQL Express clients (in fact, nearly hundreds of clients) and to pull the data into a centrally located SQL Server for real-time monitoring.

    Any thoughts?

  • Ouch - sounds like a horrible setup :w00t:

    Are these remote SQL Express clients accessible via static IPs? Do you pull the data from the central SQL Server or push from the clients? Unless the answers to these are "yes" and "pull", you'd probably be better off with merge replication from the clients.

    Regards,

    Jacob

  • Correct, horrible setup......

    And right, the first thought that came to my mind was merge replication (I am not actually a DBA but a .NET dev), but I learned SQL Express cannot be a publisher for merge replication and hence it will not work out.

    Yes, these SQL clients are accessible via static IPs. The current setup is that I have written SPs for the required report on the centralised server and scheduled them to run every 5 minutes (a latency of 10 minutes was approved, as the app that updates the db on the SQL clients does so every 5 minutes), so yes, it's kind of pulling from the clients.

  • Ah yep, sorry SQL Express 😛 I was reading MSDE...

    If you can't change the architecture then minimise the amount of data transferred across the linked server boundary. If at all possible (as the main article mentions) avoid all joins between linked servers. I've found the best performance is when you remotely execute a proc on the remote server and insert the results into a table on the local side with INSERT... EXEC..., as the query is entirely resolved on the remote side and only the results come back to the local server.

    This of course requires the remote proc to be almost self-sufficient. I have no idea of your schema, but if the remote clients use tables that share a common schema with the central server, it may be possible to have a remote proc which takes a single parameter (e.g. MAX(primary key value) from the central table) and returns all rows from the remote (client) table that are higher than that value, or similar. Or even better, if you are inserting into separate tables from each client and have your data partitioned by something simple like date, no parameters need to be passed at all.
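    As a minimal sketch (all server, table, and proc names here are hypothetical, and the linked server needs its RPC Out option enabled for the remote EXEC):

    -- Find the highest key already copied into the central table.
    DECLARE @MaxKey int;
    SELECT @MaxKey = ISNULL(MAX(ReadingId), 0) FROM dbo.CentralReadings;

    -- The proc runs entirely on the remote client; only rows newer
    -- than @MaxKey cross the linked-server boundary.
    INSERT INTO dbo.CentralReadings (ReadingId, ClientId, ReadingTime, Value)
    EXEC [Client01].[ClientDb].[dbo].[usp_GetReadingsAfter] @MaxKey;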

    Regards,

    Jacob

  • Thanks for your suggestions Jacob....

    Yes, you are right, the schema happens to be different for the clients (the application that writes into the db is definitely not a scalable one because they can't figure out a generic structure), mainly because customers select the required fields to be monitored using the app. But still, I will be checking this option of dumping (only the latest/last inserted) records into tables on the central server with a remote proc and later joining the tables for the report.

    You should understand what kind of data (and how much of it) you have on both servers (local and remote) when you're using a linked server.

    Sometimes you'd like to run the join on the remote server and "get" only the result (if the local table is small and the join will eliminate records, for example).

    For this case, use the REMOTE join hint.

    From BOL:

    "REMOTE:

    Specifies that the join operation is performed on the site of the right table. This is useful when the left table is a local table and the right table is a remote table. REMOTE should be used only when the left table has fewer rows than the right table.

    If the right table is local, the join is performed locally. If both tables are remote but from different data sources, REMOTE causes the join to be performed on the site of the right table. If both tables are remote tables from the same data source, REMOTE is not required.

    REMOTE cannot be used when one of the values being compared in the join predicate is cast to a different collation using the COLLATE clause.

    REMOTE can be used only for INNER JOIN operations."
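    For example, a hypothetical sketch (server and table names are made up):

    -- The REMOTE hint pushes the join to the right (remote) table's
    -- site; appropriate when the local table is the smaller one.
    SELECT l.OrderId, r.OrderTotal
    FROM dbo.LocalOrders AS l
    INNER REMOTE JOIN [RemoteSrv].[Sales].[dbo].[OrderTotals] AS r
        ON l.OrderId = r.OrderId;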

    Kindest Regards,

    Roi Assa

  • Good article. I would have liked to have seen an exploration of a linked server in an OPENROWSET subquery, where you can have the WHERE clause logic run on the remote server, thereby returning fewer rows to the local SQL Server. I have always felt this was a good way to improve linked server query response, but I have never tested it against the four-part-named query or the temp table.
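    Something like this, say (a hypothetical sketch; the pass-through query runs remotely, and 'Ad Hoc Distributed Queries' must be enabled on the local server):

    -- Only rows matching the remote WHERE clause come back.
    SELECT r.OrderId, r.OrderTotal
    FROM OPENROWSET('SQLNCLI',
            'Server=RemoteSrv;Trusted_Connection=yes;',
            'SELECT OrderId, OrderTotal
             FROM Sales.dbo.OrderTotals
             WHERE OrderDate >= ''20090101''') AS r;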

    Steve
    http://stevefibich.net
    http://utilitydb.codeplex.com/

  • I have, in the past, created views on the local server that contained the required remote data. Although you don't get quite the performance gain of copying the data down from the remote server, it allows you to maintain a granular security structure at the remote location, and you do acquire some performance gains over direct access. From a security standpoint, you only expose the data that needs to be exposed. One "gotcha" with this arrangement is that you now expose your local architecture to vulnerabilities dependent on the schema consistency of the remote server(s). If someone changes the remote table schemas without notification / propagation of the changes, things on the local server . . . ah . . . break.

  • Check out OPENQUERY() to access data on linked servers. It will return only the results of the query, after processing the query on the remote server.
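    For example (hypothetical server and table names):

    -- The query string executes entirely on the linked server;
    -- only its result set is sent back to the local server.
    SELECT q.CustomerId, q.Balance
    FROM OPENQUERY([RemoteSrv],
        'SELECT CustomerId, Balance
         FROM Billing.dbo.Accounts
         WHERE Balance > 0') AS q;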

  • We had the same problem and I used a table-valued function to limit the data being transferred. In this way, the join was on the function and with a smaller results set being transferred over the network. The speed increase was huge in my case.

    Ex. (from my bad memory)

    -- Cross Apply (SQL 2005+) rather than Inner Join, since a join
    -- cannot take a column from the outer table as a function argument.
    Select ...
    From vwLocalTable l
    Cross Apply fnGetRemoteData(l.filterVal) r
    Where l.[key] = r.[key]
      and l.filtercol = 'something'

    We put this whole thing in a view to help abstract away the whole problem of linked server specifics in SQL.
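    For completeness, the function itself might look something like this (a hypothetical inline TVF; whether the filter is actually pushed to the remote server depends on the provider and the query plan):

    CREATE FUNCTION dbo.fnGetRemoteData (@filterVal int)
    RETURNS TABLE
    AS
    RETURN
        -- The parameter filters on the remote side, so only matching
        -- rows cross the linked-server boundary.
        SELECT [key], cola, colb
        FROM RemoteSrv.RemoteDb.dbo.RemoteTable
        WHERE filterCol = @filterVal;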

    Hope this helps.

  • The situation described is our production architecture. We deal with this issue on an ongoing basis and have had various successes and failures.

    The issue, as we have defined and observed it, is that when you join in a table on a remote server, the remote server returns the entire dataset to the requesting server, and then that dataset is optimized into the local query. So if I join in a table with 50,000 rows, even though in the ON clause I have filtered it down to needing only ONE row, the remote server will return a dataset with all 50,000 rows, and then the local server will use that dataset in an optimized local query.

    So the solutions offered do address the underlying question: how do I get the remote server to return only the rows that I actually want, instead of a complete dataset of the entire table?

    Temp tables, OPENQUERY, etc. are fine IF you have the filter criteria when you create the temp table or the OPENQUERY. But if the filter criteria for the remote query are derived from the values of several other columns in several other tables in the query itself, then temp tables and OPENQUERY crumble away as the filter criteria from columns in other tables increase. My understanding is that the remote server returns a temp dataset anyway, so this can only be more effective if you have, and can use, filter criteria.
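    For the simple case, where the filter is known up front, something like this works (hypothetical names):

    -- Stage only the filtered remote rows, then join locally.
    SELECT *
    INTO #RemoteRows
    FROM OPENQUERY([Remote],
        'SELECT cola, colb, colx FROM SomeDb.dbo.TBLB_Remote
         WHERE coly = 42');

    SELECT a.cola, r.colb
    FROM TBLA AS a
    INNER JOIN #RemoteRows AS r ON r.colx = a.cola;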

    So what's a developer to do!??!!??

    Here is what we do that has proven results. We use a join clause with a sub-select statement. In the example below TBLA is local and TBLB is remote.

    Example:

    Select TBLA.cola, TBLA.colb, TBLA.colc, TBLB.cola, TBLB.colb
    From TBLA
    inner join (Select cola, colb, colx, coly, colz
                From Remote.TBLB_Remote) TBLB
        On TBLB.colx = TBLA.cola
       and TBLB.coly = TBLA.colb
       and TBLB.colz = TBLA.colc

    When you use a join with a sub-select statement, the remote query is filtered by the criteria in the ON clause of your join, so instead of the remote server sending back the entire dataset of the table, it sends back only the rows you actually need; of course, it makes a trip to the remote server for every row in TBLA.

    If you can put a where clause on the sub-select to further limit the dataset, even better!
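    For instance, continuing the hypothetical example above:

    Select TBLA.cola, TBLB.colb
    From TBLA
    inner join (Select cola, colb, colx
                From Remote.TBLB_Remote
                Where colb > 0) TBLB   -- the WHERE is evaluated remotely
        On TBLB.colx = TBLA.cola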

    If your remote server is colocated in the data center with the local server (as in our case) and/or you have sufficient WAN bandwidth, this method is very efficient.

    If the local server is located in Atlanta and the remote is located in Chicago and you have a DSL between locations, this would not be good. But nothing would be good in that case! 😀

    In essence, you must weigh the load of a large amount of data being transmitted across the WAN against multiple trips with a small amount of data. Usually the latter wins.

    Also, if you NEED the majority of the rows in TBLB, go ahead and do a regular join, get them, and let the local server optimize the remote dataset into the local query. In other words, if I am selecting 50,000 rows from TBLA and I have one or more matching rows in the remote TBLB for every row in TBLA, go ahead and get all the rows from TBLB at one time, because you're going to need them. In our experience, this is not usually the case.

    Just one last point. Our development environment is not distributed; we take all the databases from all of our SQL Servers and put them on a single development SQL Server. Using the method I described above when you are not actually in a remote environment makes the query run slower. This makes the efficiency of new development queries that are intended to work cross-server difficult to measure.

    Just another option to put in the toolbelt!

    Well, nuff said.

    Donald

  • I think the linked servers are not the only culprit here. I actually cringed when I saw these queries.

    The first query is JOINing and WHEREing using functions, and those functions will cause scans, right?
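    A hypothetical illustration of why:

    -- Non-sargable: wrapping the column in a function forces a scan.
    SELECT OrderId FROM dbo.Orders WHERE YEAR(OrderDate) = 2009;

    -- Sargable rewrite: the bare column can use an index seek.
    SELECT OrderId FROM dbo.Orders
    WHERE OrderDate >= '20090101' AND OrderDate < '20100101';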

    And the second query uses a SELECT INTO. We have seen TempDB access slowed down and even locked up by using SELECT INTOs.

  • OPENQUERY is a good way to go as well, but it requires that this method be opened up to more than just the sysadmin role, something we may not want to do for all users. Also, you have to make sure that the OLE DB provider being used has OPENQUERY enabled for non-admin roles.

    Steve
    http://stevefibich.net
    http://utilitydb.codeplex.com/

  • The thing that worries me with Linked Servers is the security aspect.

    Our SQL2000 boxes won't delegate, but our SQL2005 boxes work just fine.

    We have trawled through just about every obscure knowledgebase article on delegation to find out why.

    If you can't get delegation to work and you use a fixed account, then you are going to have a security risk with anyone who can log onto your local box: they can do whatever the linking account allows them to do on the remote box!
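    One way to limit the exposure, as a hypothetical sketch, is to drop the default (catch-all) login mapping and map only the specific logins that need remote access:

    -- Remove the default mapping that covers all local logins.
    EXEC sp_droplinkedsrvlogin @rmtsrvname = N'RemoteSrv', @locallogin = NULL;

    -- Map one local login to a low-privilege remote account.
    EXEC sp_addlinkedsrvlogin
         @rmtsrvname  = N'RemoteSrv',
         @useself     = N'FALSE',
         @locallogin  = N'DOMAIN\ReportUser',
         @rmtuser     = N'report_reader',
         @rmtpassword = N'...';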
