Importing 8m from linked server

  • Yes, normally, I would import large recordsets with SSIS.

    For now, lets just assume that I have good reason in this case, for using dynamic sql & linked servers.

    Most of my procedures work perfectly well using something along the lines if below:

    Create table #temp(....)

    set @sql = 'insert into #temp(blah) ' +

    'select blah from Openquery(' +

    @linkedServer + ',''' +

    'select ..... From ..... Where ....' +

    ''')'

    exec(@sql)

    The only time I hit a problem, is with 1 table, which is 8 millon rows; (which I assume is the problem)

    Executing this one for top(4000000) runs in around 4 minutes.

    Running for the entire 8m just sees my screen whirring away for 20+ minutes when I kill it.

    I could really do with some insight/advice as to what causes this to happen ? I'm thinking some limitation of processing power and/or disk space?

    But even then, is it on my 'local' server or the remote ?

    Obviously id be happy to hear of alternative approaches, but....

    > I need to reference the dataset in at least 2 separate subsequent queries.

    > I was hoping to utilize indexing on the #temp.

    > Ssis is a ball-ache in this case for several reasons.

    Last related q.

    I read that an operation like above example might lock up the entire tempdb while it is running. Is this true, and if so, anyway to avoid that ?

    Many thanks

  • A part of the problem is that you're querying 8M rows on the remote server. This is probably going to take a while. We don't know how wide the rows are that you're reading, but I'd imagine that they're wider than a 4-byte integer. Take your row size and multiply it by 8M to get the number of bytes you're reading. Divide that number by 8096 (8K - 96 bytes of overhead) and that's the number of pages you're reading from disk on the remote server.

    The next part is that you're then transferring all those bytes over the wire to your local server, which is going to take time.

    Lastly, you're inserting this data into a temp table on your server. Again, looking at the size of the data set you're reading, you're creating a table of that size in your tempdb. Depending on the size of the tempdb when you run it and how many disks it spans, tempdb may need to grow, which takes time. The number of pages you read from the remote server is the number of pages you need to write to tempdb on your server, and, you guessed it - it takes time.

    I honestly don't know if this will technically "lock up" tempdb, but I do know that's an awful lot of IO on your local server. It will at least be resource-intensive. It's going to depend on the size of tempdb, the write speed of your disks, how many spindles are involved and a host of other factors. You should only have a lock on the table you're creating, which should only be visible to your session anyway. As for the rest of tempdb, I'll let someone who knows more about it than I do speak to that.

  • Thanks Ed, that's put some lovely 'exact' detail on the inner workings ... something I unfortunately only have a fuzzy knowledge of.

    You have inspired me to go do much more reading !

    Now I am curious about one of the most obvious alternative, SSIS, and how it compares on a technical level.

    i.e. how does it physically do a more efficient job?

    I looked up a little about the architecture, and found references to the "Data Flow Engine" & it's buffers.

    So I am assuming that is the key, that it has a large dedicated memory available for handling the data ?

    Or does the Engine have a more efficient logic for handling data ?

    Let's say I had a large tempdb on my SQL Server, many times the size of my table to be imported.

    Would SSIS still be a more efficient approach ?

    Sorry to add on to the original question ... but got my teeth into this now !

  • If using SSIS, the only real difference is that you'd be effectively using a BULK INSERT to write the rows to your local table. The reads on the remote server would be the same and the transfer over the wire would be the same. Using either a procedure or SSIS, you have the option of using BULK INSERT, which is very fast. The growth requirements of the local target database would also be the same.

    Under the hood, both approaches still have to do the same work. They have to read rows, transfer data and write rows. I'd rather do the work in an environment where I have the control that T-SQL provides.

  • Thanks for the extra clarification Ed, appreciated !

    🙂

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply