Excellent article on setting up a linked SQL server. Could be interesting to cover some links to other popular DB sources just to round out the topic.
Only one item stuck out at me as very misleading:
"The result of executing this query is the same as if all the tables were on the same server, except that the query consumes more server and network resources."
Yes, more resources are in fact used, but the "result of the executing query" is far from the same. The local and foreign server do not share the same memory space, nor temp DB space. The local platform has little to no reference of keys and indexes that can be applied on the foreign source, and has to leave this up to the foreign server to sort out (unless the dev intentionally builds to use them). This usually results in all rows of the foreign table being ported over (at least for the columns requested) into local memory/tempdb space before joins and criteria are applied, usually in an unindexed fashion (read: Full Table Scan).
When the tables are on the same server, indexes may be utilized without even directly reading the table in question.
When it comes to directly queried foreign data, the bulk of latency is nearly always "over the wire", involving multiple processors and I/O paths (minimum two sets), packing/unpacking, and then data to memory/TEMPDB.
Use of *pre-filtered* TEMP tables is highly recommended. Filter what you want as much as possible and port that into a temp table to be joined upon.
I've made a living converting old multi-foreign server join PROCs into a fully temp-table-driven paradigm, usually dropping execution time of very large data sets substantially, in some cases from hours to a couple of minutes, and in others from minutes to a few seconds. The front of the PROC gets set up with all of the necessary filtered temp tables, and then the joins are made. Rarely is it ever advisable to use local data joins to any type of foreign server directly in the FROM clause. A few minutes' saved expediency (per join) is rarely excuse for leaving a known time-bomb such as direct foreign-server joins of any type, including linked servers.
The same goes for creating a linked server to a DB that is actually local. I don't believe the parser in most if not all of the current major DB systems will circumvent a link to query the indexes directly before deciding what to request, and the only time saved in this limited case may be that the data doesn't have to packetize over the network... although I'm certain that can vary depending on the type of foreign link / linked server used at that point.
The way I usually describe Linked Servers to end developers is by comparing it to MS Access and linking to ODBC connections and then querying by joining across those connections. SQL Server may have larger memory and file size allocated to it, but in principle the overall process is basically the same.
The use of SSIS has some caveats and some significant overhead, both in development and maintenance/changes. A steady staging environment is definitely stepping up the architectural design of a data system, and greatly desired (I recommend them for all first-base reporting solutions) to keep large reporting queries off of the live OLTP server(s). Being in the ETL space, SSIS (and all other staging technologies) also do not get the proverbial "world" joined directly to them either, and the user-security risks are far fewer than Linked Servers. But these do not come close to outperforming pre-filtered temp tables, especially when immediate-live data is required.