NOLOCK across linked server with join

  • I'm with everyone else: using NOLOCK is actually quite dangerous, mostly because of the possibility of getting duplicate or missing rows, which, as far as I know, most businesses absolutely don't want, nor would they accept if they knew it was happening. Using a snapshot isolation level is infinitely preferable, and on the majority of systems performance should be as good or better.
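
    As a minimal sketch of the snapshot route (the database name YourDb is a placeholder, not from this thread):

    -- One-time setup per database (YourDb is hypothetical)
    alter database YourDb set allow_snapshot_isolation on;

    -- Readers then opt in per session and get a consistent view
    -- without blocking writers or reading dirty rows
    set transaction isolation level snapshot;
    select id, SomeValue from dbo.Symbols;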

    Anyway, the best solution is to use OPENQUERY and do as much of the filtering work on the remote server as possible. Since you can use OPENQUERY as if it were a table, you should be able to use it to load a temporary table on your local server.
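
    A minimal sketch of that pattern, using the linked server and table names that appear later in this thread (the Market filter column is hypothetical):

    -- The predicate runs on sB inside the pass-through text,
    -- so only matching rows cross the wire
    select q.id, q.SomeValue
    into #Symbols
    from openquery(lnk_sB,
        'select id, SomeValue from xref.dbo.Symbols where Market = ''NYSE''') as q;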

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Grant Fritchey (3/10/2015)


    Anyway, the best solution is to use OPENQUERY and do as much of the filtering work on the remote server as possible. Since you can use OPENQUERY as if it were a table, you should be able to use it to load a temporary table on your local server.

    Given this scenario, I'm interested in how you would go about filtering the remote data using OPENQUERY.

    The OP states that the remote table has 500M rows, and they have a local set that needs to look up against the remote data.

    A carefully crafted distributed query using 4PN (four-part naming) can quite happily perform seeks on the remote table, but I cannot see how OPENQUERY will allow that, nor what benefit it will bring.
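
    For reference, the kind of 4PN lookup being described looks something like this (object names taken from the posts below; whether the optimizer actually performs remote seeks depends on the provider and statistics):

    update t
    set t.someValue = s.SomeValue
    from #myTab t
    inner join lnk_sB.xref.dbo.Symbols s
        on s.id = t.id;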

    If you have the time, perhaps you wouldn't mind expanding on the methods you would use to perform this type of lookup across servers?

    JeeTee (3/9/2015)


    Hi All,

    I have two servers (let's call them sA and sB) connected from sA -> sB via a linked server (i.e. sA pulls data across from sB).

    I'm building a temp table full of stock symbols on sA, and then I need to update some values on sA using content on sB. The tables on sB are very large (about 500M rows), so I don't want to pull even close to everything across the linked server. Ordinarily I'd do this by joining to the linked server table, but the target table needs to have nolock on it due to its heavy use.

    MM




  • Forum Etiquette: How to post Reporting Services problems
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • How to Post Performance Problems - by Gail Shaw

  • Thanks all, I got some good ideas and new information from the various posts here.

    Anyway, what I ended up doing was to first run a script that pulls the smaller reference data across to the server with the very large tables, do my work locally there, and save the result down to a table. Then another script comes through and pulls the pre-processed data back across the linked server the other way.

    There are a few more steps involved, but it eliminates the need to worry about nolock across linked servers altogether.
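
    Roughly, the shape of that round trip (all object names here are hypothetical, since the post doesn't give them):

    -- On sA: push the small symbol list across to sB
    exec ('create table xref.dbo.SymbolWork (id int not null primary key);') at lnk_sB;

    insert into lnk_sB.xref.dbo.SymbolWork (id)
    select id from #myTab;

    -- On sB: join the small work table to the 500M-row table locally
    -- and save the result, e.g.
    --   select w.id, s.SomeValue
    --   into xref.dbo.SymbolResult
    --   from xref.dbo.SymbolWork w
    --   inner join xref.dbo.Symbols s on s.id = w.id;

    -- Back on sA: pull only the pre-processed rows
    insert into #s (id, SomeValue)
    exec ('select id, SomeValue from xref.dbo.SymbolResult;') at lnk_sB;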

    On the issue of using nolock at all: I probably shouldn't have said it's used UNIVERSALLY, but the nature of this data is such that it's read upwards of 2,000 times a second in some cases, so for certain operations it is far more desirable to limit any potential for blocking than to avoid the problems associated with nolock. There are also lots of system details, immaterial to this conversation, which I'm omitting. I am fully aware nolock is a sensitive issue among database developers (I've even been on the side warning against using it), so while I appreciate the concern, this is one of those situations where, time and time again, it has proved to be the right call for the specifics of certain applications.

    Executive Junior Cowboy Developer, Esq.

  • JeeTee (3/9/2015)


    Hi All,

    I have two servers (let's call them sA and sB) connected from sA -> sB via a linked server (i.e. sA pulls data across from sB).

    I'm building a temp table full of stock symbols on sA, and then I need to update some values on sA using content on sB. The tables on sB are very large (about 500M rows), so I don't want to pull even close to everything across the linked server. Ordinarily I'd do this by joining to the linked server table, but the target table needs to have nolock on it due to its heavy use.

    ...

    ...

    Is there some way I can limit the content on sB by my temp table on sA but still use nolock?

    I'd steer clear of joining a local table to a remote table using the four-part naming convention. This type of cross-instance join is inefficient for a number of reasons, and even with a predicate to filter the result, there is still the potential to pull every record across the network because of the limitations of how remote tables are joined.

    What I'd recommend for a first attempt is creating a local temp table #s, and then using a pass-through style remote query to select into #s only the two columns you really need for the update. Adding a search predicate to the remote query to pare down the number of rows returned would also help a lot.

    create table #s
    (
        id int not null primary key with (ignore_dup_key = on),
        SomeValue varchar(30) not null
    );

    insert into #s ( id, SomeValue )
    exec('select id, SomeValue from xref.dbo.Symbols') at lnk_sB;

    update t
    set someValue = #s.SomeValue
    from #myTab t
    inner join #s
        on t.id = #s.id;

    You can still add a NOLOCK hint to the Symbols query, but I doubt it's needed. The only reason for it would be if the table is constantly getting inserts and it's very important not to temporarily block whatever process or application inserts into it. If this is simply a table containing one row for each industry stock symbol, then I don't think either blocking or dirty reads is a big concern here. The primary key on #s with (ignore_dup_key = on) will prevent insertion of any duplicates even if some were returned.
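
    If the hint were wanted, it would go inside the pass-through text, since locking hints can't be applied to a remote data source from the local side; a sketch:

    -- The hint is evaluated on sB, inside the remote query
    insert into #s ( id, SomeValue )
    exec('select id, SomeValue from xref.dbo.Symbols with (nolock)') at lnk_sB;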

    By default I'd just not use NOLOCK.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • I would generally recommend at least a glance at the following page:

    Isolation Levels in the Database Engine

    due to the very simple table it contains, which boils down to:

    Level              Possible data issues
    READ UNCOMMITTED   Dirty reads, Nonrepeatable reads, Phantom reads
    READ COMMITTED     Nonrepeatable reads, Phantom reads
    REPEATABLE READ    Phantom reads
    SNAPSHOT           (not listed there, but tempdb becomes critical, and "If a snapshot transaction attempts to commit an update to a row that was changed after the transaction began, the transaction is rolled back, and an error is raised." - Snapshot Isolation in SQL Server)
    SERIALIZABLE       (none)

    It's not just NOLOCK or not :).
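
    For instance, a per-table NOLOCK hint and the session-level READ UNCOMMITTED setting have the same effect (dbo.Symbols stands in for any heavily read table):

    -- These two reads have the same dirty-read semantics:
    select id, SomeValue from dbo.Symbols with (nolock);

    set transaction isolation level read uncommitted;
    select id, SomeValue from dbo.Symbols;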

  • I'm guessing that Symbols is essentially a reference table containing one row for each company or index fund traded on a stock exchange. If that's the case, then this was probably bulk loaded in the past, gets heavily queried by multiple systems throughout the day, but only occasionally gets write transactions.

    Another option is to extract a copy of this remote table to your local server and then leverage an SSIS package to periodically perform incremental slowly changing dimension inserts and updates. Perhaps that's actually what you're currently doing, but with T-SQL.
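
    In T-SQL terms, the incremental upsert step of that approach might look like this (the local copy and staging table names are hypothetical):

    -- #SymbolsStage holds rows just pulled from sB; dbo.SymbolsLocal is the local copy
    merge dbo.SymbolsLocal as tgt
    using #SymbolsStage as src
        on tgt.id = src.id
    when matched and tgt.SomeValue <> src.SomeValue then
        update set SomeValue = src.SomeValue
    when not matched by target then
        insert (id, SomeValue) values (src.id, src.SomeValue);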

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

    I'd steer clear of joining a local table to a remote table using the four-part naming convention. This type of cross-instance join is inefficient for a number of reasons, and even with a predicate to filter the result, there is still the potential to pull every record across the network because of the limitations of how remote tables are joined.

    From the video Matt Miller posted, that would depend. A small remote table would not cause issues here. In this case, since most of the data is on the remote end, the user did a smart thing by making what was the remote server the local one for the heavy work... especially if the original local table is small.

    ----------------------------------------------------

  • How about options like this?

    update t
    set someValue = s.SomeValue
    from openquery(lnk_sB, 'select id, SomeValue from xref.dbo.Symbols') s
    inner join #myTab t
        on t.id = s.id;

    -- NOLOCK can't be hinted on the target of an UPDATE or on a remote
    -- table referenced by four-part name, so this variant goes without it
    update t
    set someValue = (select s.SomeValue
                     from lnk_sB.xref.dbo.Symbols s
                     where s.id = t.id)
    from #myTab t
    where exists ( select 1
                   from lnk_sB.xref.dbo.Symbols s
                   where t.id = s.id );

    The pain of Discipline is far better than the pain of Regret!

  • Hey JeeTee,

    So which of these ponies runs fastest?

    I'm placing my money on the INSERT #S EXEC() AT method I described earlier.

    :satisfied:

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho
