
How to join a local table to a remote table by way of a linked server
Posted Monday, February 27, 2012 8:57 AM
Forum Newbie

@Grasshopper - I don't usually use OPENQUERY, and a linked server works better if you have multiple developers who know standard join syntax but are not as knowledgeable about OPENQUERY.

@all - I found it very useful to create SYNONYMs for the four-part access path. It cuts down on the typing and can be clearer.
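
For example, a rough sketch (the server, database, and synonym names here are invented):

CREATE SYNONYM dbo.RemoteStateProvince
FOR [SERVER02].[AdventureWorks].[Person].[StateProvince];

-- the four-part path now hides behind a one-part name
SELECT StateProvinceID, StateProvinceCode
FROM dbo.RemoteStateProvince;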
Post #1258322
Posted Monday, February 27, 2012 9:57 AM
Old Hand


As a general rule, don't set up linked server security the way the article shows: with that mapping, all connections execute remotely as the single remote user, regardless of the local user. It's better to map each allowed local login to a remote login, and best to use Windows security locally if possible (something like the sketch below).
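
A rough sketch of what I mean (the server and login names are invented):

-- drop the catch-all mapping that applies to all local logins
EXEC sp_droplinkedsrvlogin @rmtsrvname = N'SERVER02', @locallogin = NULL;

-- a Windows login connects as itself
EXEC sp_addlinkedsrvlogin @rmtsrvname = N'SERVER02',
    @useself = 'TRUE', @locallogin = N'DOMAIN\ReportWriters';

-- one specific local login maps to one specific remote login
EXEC sp_addlinkedsrvlogin @rmtsrvname = N'SERVER02',
    @useself = 'FALSE', @locallogin = N'LocalAppLogin',
    @rmtuser = N'RemoteAppLogin', @rmtpassword = N'...';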


Not to mention you can run into the double hop security issue with Kerberos.

Ken
Post #1258375
Posted Monday, February 27, 2012 10:36 AM
Old Hand

Nakul Vachhrajani (2/27/2012)
I believe the reason would be because OPEN queries require that Adhoc queries be enabled on the SQL Server, which, depending upon the environment and the situation, might be considered a security loop-hole.


Sorry, I don't think this is correct. Ad Hoc Distributed Queries needs to be enabled for OPENROWSET and OPENDATASOURCE, but not for OPENQUERY.

When you use OPENROWSET or OPENDATASOURCE, you're basically creating a linked server on the fly, AFAICS, since the connection string is included in the arguments you pass to the function. With OPENQUERY, the linked server already exists. That's what makes the first two "ad hoc" and OPENQUERY not "ad hoc", and it should also be a clear indication of why the first two represent a different security risk than the third. See?
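
To illustrate the difference (server and table names invented):

-- OPENROWSET carries its own connection string, so it needs the
-- instance-level Ad Hoc Distributed Queries setting turned on:
EXEC sp_configure 'show advanced options', 1; RECONFIGURE;
EXEC sp_configure 'Ad Hoc Distributed Queries', 1; RECONFIGURE;

SELECT *
FROM OPENROWSET('SQLNCLI', 'Server=SERVER02;Trusted_Connection=yes;',
    'SELECT StateProvinceID, StateProvinceCode FROM AdventureWorks.Person.StateProvince');

-- OPENQUERY just names a linked server that already exists, so the
-- setting above is not required:
SELECT *
FROM OPENQUERY([SERVER02],
    'SELECT StateProvinceID, StateProvinceCode FROM AdventureWorks.Person.StateProvince');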

Yes, the Kerberos double-hop thing can be a bear, and yes, the security context has to be locked down. But with a non-ad-hoc syntax, which OPENQUERY is, you have control over that.

For the other people in this thread who feel that the fully-qualified syntax is better for developers than OPENQUERY, there are at least two reasons why this is not a great idea:

1 - Even when the linked server is MS SQL, you may decide you want to put the data someplace else, or you may just want to hit a test database rather than production. If you have set up your linked server using the MS OLE DB Provider for SQL Server or another provider (versus "Server type = SQL Server"), you can *name* the linked server to represent the data you're connecting to, rather than the server name itself, and the move is transparent to your developers because that name doesn't need to change (sketched after this list). I realize that synonyms/aliases are another way to do it, but see point #2.

2 - You've limited the developers' thinking about heterogeneous queries to "queries between two MS SQL Servers". What happens when the HR system is Oracle, or the CMS system is MySQL? Get them used to the right syntax now. As a bonus, they might not have to change much of anything in their code when you migrate a system to a different back end.
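
To sketch point #1 (all names invented):

EXEC sp_addlinkedserver
    @server     = N'HR_DATA',     -- logical name the developers code against
    @srvproduct = N'',
    @provider   = N'SQLNCLI',     -- via the OLE DB provider, not "Server type = SQL Server"
    @datasrc    = N'PRODSRV01';   -- the actual server; repoint to a test box without touching code

SELECT * FROM [HR_DATA].[AdventureWorks].[Person].[Contact];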

>L<


Post #1258397
Posted Monday, February 27, 2012 10:53 AM


SSC Rookie

Is it still the case that any join via a linked server (heterogeneous, to another SQL Server or to some other DB, say Oracle) between a remote db table and a local db table will basically ignore any relevant WHERE-clause constraints on the remote table and bring the whole remote table across locally before then applying the constraints?

This is not easy to track or assess, but temp space starts to fill up, and it hasn't always been released readily once the query finishes.

The author's advice to create a local #temp table with the remote data essentially does this latter remote copy explicitly and then does the join locally.
I have avoided heterogeneous joins between linked servers and find SSIS vastly faster for such tasks.

It might be worth having the links for development purposes, though, to hit up against remote tables for the sake of schema and data assessment.

The descriptions people give here of the security issues make it seem somewhat tedious to manage robustly with a fine granularity of privilege control.

It was a good article in terms of explaining the ways you can set up linked servers, though, and I want to thank the author for his contribution.
Post #1258406
Posted Monday, February 27, 2012 11:11 AM


UDP Broadcaster

What I typically do is execute a pass-through style query on the remote server using the EXEC ('...') AT REMOTE_SERVER syntax and output the result into a temp table, which I then join with my local query. It's not necessary to bring across the entire remote table; one only needs specific columns, of course, and the remote resultset can be filtered by date, category, client, etc. (rough sketch below).
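
Roughly this shape (table, column, and server names invented; note the linked server needs RPC Out enabled for EXEC ... AT):

CREATE TABLE #RemoteOrders (OrderID int, ClientID int, OrderDate datetime);

INSERT INTO #RemoteOrders (OrderID, ClientID, OrderDate)
EXEC ('SELECT OrderID, ClientID, OrderDate
       FROM Sales.Orders
       WHERE OrderDate >= ''20120101''') AT [REMOTE_SERVER];  -- filter runs remotely

SELECT c.ClientName, o.OrderID, o.OrderDate
FROM dbo.Clients c
INNER JOIN #RemoteOrders o ON o.ClientID = c.ClientID;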


"Winter Is Coming" - April 6, 2014
Post #1258415
Posted Monday, February 27, 2012 11:56 AM
Old Hand

knausk (2/27/2012)
Is it still the case that any join via a linked server (heterogeneous, to another SQL Server or to some other DB, say Oracle) between a remote db table and a local db table will basically ignore any relevant WHERE-clause constraints on the remote table and bring the whole remote table across locally before then applying the constraints?

This is not easy to track or assess, but temp space starts to fill up, and it hasn't always been released readily once the query finishes.


It depends on how your query is structured. Yes, it *can* bring more data back than you might think it needs to, but not always. You're right, it's not easy to assess, except that if you are working with really large tables and the perf is very good, you know you've set it up the right way to avoid this problem.

What's the right way? Unfortunately, it does seem to be different for different providers, so basically I just spend a lot of time trying things out until I nail something that works. This usually includes pre-filtering the data that comes across by dynamically building the SQL statement that goes into the OPENQUERY, if I can, instead of doing all the conditions on "this side".

That means you end up with exec sp_executesql @mysql ..., and it can be a pain to construct (although, before anybody has a knee-jerk reaction to the term "dynamic", it does not add any security risk that wasn't already there for some other reason). But it's worth it; a sketch follows.
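
The kind of thing I mean, as a sketch (names invented). OPENQUERY won't accept a variable for its query text, so the whole statement gets built and executed dynamically:

DECLARE @cutoff char(8) = '20120101';

-- the query that will run on the remote side, with the filter baked in
DECLARE @remote nvarchar(max) =
    N'SELECT OrderID, ClientID FROM Sales.Orders WHERE OrderDate >= ''' + @cutoff + N'''';

-- wrap it in OPENQUERY, doubling the quotes so it survives as a literal
DECLARE @mysql nvarchar(max) =
    N'SELECT * FROM OPENQUERY([SERVER02], ''' + REPLACE(@remote, '''', '''''') + N''')';

EXEC sp_executesql @mysql;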


The author's advice to create a local #temp table with the remote data essentially does this latter remote copy explicitly and then does the join locally.
I have avoided heterogeneous joins between linked servers and find SSIS is vastly faster to accomplish such tasks.


If you can limit the way the data is brought across, this isn't going to be much faster, and it might be slower if you're dragging a lot of data. Which is not to say I don't do it <s>. But often, if the data doesn't need to be "live", I'll instead bring it into a *non*-temp table, IOW keep it on "this side". And that goes for some heterogeneous systems but also for some MS SQL-to-MS SQL scenarios.

This brings up one interesting small bonus of having used the linked server in the first place: suppose you start with the linked server, and then you realize you want to cache to avoid the perf issues. Suppose there are developers going against the linked server, and you now want them to go against the cache. If you set up the linked server using the OLE DB Provider for MS SQL, as I indicated earlier, you can actually set up a linked server to itself, pointing at your new cache of the data. This is one way to transition developers with a lot of integration code already in play.
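
A sketch of that trick (the HR_DATA name is invented): drop and re-create the linked server so the same four-part name resolves back to the local instance, where the cache now lives.

DECLARE @me sysname = @@SERVERNAME;

EXEC sp_dropserver @server = N'HR_DATA', @droplogins = 'droplogins';
EXEC sp_addlinkedserver
    @server     = N'HR_DATA',
    @srvproduct = N'',
    @provider   = N'SQLNCLI',
    @datasrc    = @me;   -- loopback: [HR_DATA].[...] now points at the local cache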

Later, of course, you can write more elegant interfaces by massaging the cached data according to the integration(s)'s needs and you can move the developers to use these even-more-efficient schemas at their own pace. But the "raw" cached data -- matching the original source -- is still there in your staging tables, and they can access it directly for as long as they need to.

In my mind, if I am caching or replicating data, whether I use SSIS to do it is not material to the real perf gain I'm going to get; sometimes I use SSIS, sometimes I don't. But in cases where I *must* have the live data, usually SSIS processing is just going to get in the way.

>L<
Post #1258446
Posted Monday, February 27, 2012 12:10 PM
Forum Newbie

Regarding using SSIS to acquire data: the ease is only proportional to the cooperation of the upstream provider, and the immediacy of the data is only as valid as the process that provides it. It has taken 3+ years to create a working relationship with our upstream provider, and the politics were much more important (and Machiavellian). Also, theirs is a 'data warehouse' created each night, such that the data is actually 24+ hours old when we get it. It is good to remember that politics sometimes trumps technical priorities.
Post #1258460
Posted Monday, February 27, 2012 7:45 PM
Ten Centuries

Excellent article on setting up a linked SQL Server. It could be interesting to cover links to some other popular DB sources, just to round out the topic.

Only one item stuck out at me as very misleading:

"The result of executing this query is the same as if all the tables were on the same server, except that the query consumes more server and network resources."


Yes, more resources are in fact used, but the "result of the executing query" is far from the same. The local and foreign servers do not share the same memory space, nor tempdb space. The local platform has little to no knowledge of the keys and indexes that could be applied on the foreign source and has to leave this up to the foreign server to sort out (unless the dev intentionally builds to use them). This usually results in all rows of the foreign table (at least for the columns requested) being ported over into local memory/tempdb space before joins and criteria are applied, usually in an unindexed fashion (read: full table scan).

When the tables are on the same server, indexes may be utilized without even directly reading the table in question.

When it comes to directly queried foreign data, the bulk of the latency is nearly always "over the wire", involving multiple processors and I/O paths (a minimum of two sets), packing/unpacking, and then data to memory/tempdb.

Use of *pre-filtered* TEMP tables is highly recommended. Filter what you want as much as possible and port that into a temp table to be joined upon.

I've made a living converting old multi-foreign-server join PROCs into a fully temp-table-driven paradigm, usually dropping execution time on very large data sets substantially: in some cases from hours to a couple of minutes, in others from minutes to a few seconds. The front of the PROC gets set up with all of the necessary filtered temp tables, and then the joins are made (see the sketch below). Rarely is it ever advisable to join local data to any type of foreign server directly in the FROM clause. A few minutes' expediency saved (per join) is rarely an excuse for leaving a known time bomb such as direct foreign-server joins of any type, including linked servers.
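
The shape of that refactor, as a sketch (server, table, and column names invented):

-- front of the PROC: pre-filter each foreign source into a temp table
SELECT OrderID, ClientID, OrderDate
INTO #FilteredOrders
FROM OPENQUERY([ERP_SERVER],
    'SELECT OrderID, ClientID, OrderDate
     FROM Sales.Orders
     WHERE OrderDate >= ''20120101''');   -- the filter runs on the remote side

-- then the joins are made locally, against the reduced set
SELECT c.ClientName, o.OrderID, o.OrderDate
FROM dbo.Clients c
INNER JOIN #FilteredOrders o ON o.ClientID = c.ClientID;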

The same goes for creating a linked server to a DB that is actually local. I don't believe the parser in most, if not all, of the current major DB systems will circumvent the link to query the indexes directly before deciding what to request, and the only time saved in this limited case may be that the data doesn't have to packetize over the network... although I'm certain that can vary depending on the type of foreign link / linked server used.

The way I usually describe linked servers to end developers is by comparing them to MS Access linking to ODBC connections and then querying by joining across those connections. SQL Server may have larger memory and file sizes allocated to it, but in principle the overall process is basically the same.

The use of SSIS has some caveats and some significant overhead, both in development and in maintenance/changes. A steady staging environment is definitely a step up in the architectural design of a data system, and greatly desired (I recommend one for all first-base reporting solutions) to keep large reporting queries off of the live OLTP server(s). Being in the ETL space, SSIS (and all other staging technologies) also does not get the proverbial "world" joined directly to it, and the user-security risks are far fewer than with linked servers. But these do not come close to outperforming pre-filtered temp tables, especially when immediate live data is required.
Post #1258589
Posted Monday, February 27, 2012 8:15 PM


SSC Rookie

The key point being to build the temp table, i.e., the reduced-size result set, on the remote box and then reference that table. It's still problematic if the results are driven by local elements, meaning you don't really know what to pull remotely until you join to the local table. You are right about reduced column sets, but then you are limiting things too, and at that point you really are talking about the actual functionality required.

I think what it really means is that there is not a really good heterogeneous join system out there, or at least not in SQL Server, though it is probably good enough for most things. You could design a minimal set of data transfers that is pretty close to the least amount of time, space, and data transfer possible. But it gets hard to balance lots of divergent requests. It is as if a human has to be the cost optimizer, because the built-in optimizer thinks fairly simply. Yet the fix could be rule-based, and so dynamic SQL could build itself.

Is there a contest out there to build the fastest, least packet-hungry heterogeneous (throw in most secure) join across some well-defined set of hardware/software? And if so, what is the prize? I propose that the solution expands bandwidth on the internet as well as moving data across it, which, after all, is sort of the same thing...
Post #1258597
Posted Tuesday, February 28, 2012 9:08 PM
Forum Newbie

I don't have any experience with linked servers and wonder if anyone could clarify a question about the article's script for me. The article states:

--POPULATE A TEMPORARY TABLE WITH DATA FROM THE LINKED SERVER

SELECT StateProvinceID,StateProvinceCode
INTO #temp
FROM [SERVER02].[AdventureWorks].[Person].[StateProvince]
GO

--JOIN HOST SERVER TABLES WITH THE TEMPORARY TABLE

SELECT a.FirstName,a.LastName,b.AddressLine1,b.City,c.StateProvinceCode
FROM [SERVER01].[AdventureWorks].[Person].[Contact] a
INNER JOIN [SERVER01].[AdventureWorks].[Person].[Address] b ON a.ContactID = b.AddressID
INNER JOIN #temp c ON b.StateProvinceID = c.StateProvinceID
GO



In the above code, the temp table #temp is being created on the local server, SERVER01, right? Not on SERVER02, so the data is being moved locally first?

Also, in the second piece of code, where the local/host server tables are joined to the temp table, presuming the temp table is on the local host SERVER01, would the query really have to use the fully qualified four-part server names (where SERVER01 is given)? Couldn't the syntax just start with [AdventureWorks].Person.Contact, or even Person.Contact?

Thanks for your help!

Post #1259241