Fastest way to read a 600MB table (in memory)

  • The problem is that we don't know how you consume the data. TDS is not necessarily the limiting factor - TDS just passes the bytes. To get the data onto the pipe, SQL Server has to format it according to the protocol, but presumably this is a highly optimised process. On the other hand, the data also has to be consumed through the client API, and that can be done in very different ways.

    In an OLE DB application written in C++ you could more or less use the API buffers directly (never mind that the code would be ugly). In a .NET application, the data has to be transformed into .NET types and .NET structures, etc. That costs cycles.
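    To make the contrast concrete, here is a minimal sketch of the buffer-bound style - in ODBC rather than OLE DB, but the binding idea is the same. The DSN, table name and column layout are invented for illustration, and all error handling is left out:

      // Windows; a unixODBC build needs only sql.h/sqlext.h
      #include <windows.h>
      #include <sql.h>
      #include <sqlext.h>
      #include <cstdio>

      int main() {
          SQLHENV env; SQLHDBC dbc; SQLHSTMT stmt;
          SQLAllocHandle(SQL_HANDLE_ENV, SQL_NULL_HANDLE, &env);
          SQLSetEnvAttr(env, SQL_ATTR_ODBC_VERSION, (SQLPOINTER)SQL_OV_ODBC3, 0);
          SQLAllocHandle(SQL_HANDLE_DBC, env, &dbc);

          SQLCHAR conn[] = "DSN=MySqlServer;Trusted_Connection=yes;";  // hypothetical DSN
          SQLDriverConnect(dbc, NULL, conn, SQL_NTS, NULL, 0, NULL, SQL_DRIVER_NOPROMPT);
          SQLAllocHandle(SQL_HANDLE_STMT, dbc, &stmt);

          // Bind plain C buffers once, up front.  Every SQLFetch afterwards
          // copies the next row's bytes straight into them - no per-row
          // object creation, which is the cost a .NET client pays.
          SQLINTEGER id;      SQLLEN idInd;
          SQLDOUBLE  value;   SQLLEN valueInd;
          SQLExecDirect(stmt, (SQLCHAR*)"SELECT id, value FROM dbo.BigTable", SQL_NTS);
          SQLBindCol(stmt, 1, SQL_C_SLONG,  &id,    0, &idInd);
          SQLBindCol(stmt, 2, SQL_C_DOUBLE, &value, 0, &valueInd);

          long long rows = 0;
          while (SQL_SUCCEEDED(SQLFetch(stmt)))
              ++rows;                          // consume and discard

          printf("%lld rows\n", rows);
          SQLFreeHandle(SQL_HANDLE_STMT, stmt);
          SQLDisconnect(dbc);
          SQLFreeHandle(SQL_HANDLE_DBC, dbc);
          SQLFreeHandle(SQL_HANDLE_ENV, env);
      }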

    [font="Times New Roman"]Erland Sommarskog, SQL Server MVP, www.sommarskog.se[/font]

  • The last post in the TechNet forums thread for the exact same question has an example where reading 600MB from the table via SSMS takes 45 seconds, but reading the same table with .NET code takes 5 seconds (I am not sure of the validity of the code).

    I have also skimmed the MS-TDS protocol doc and did not see anything about a throughput limit.

    Bob
    -----------------------------------------------------------------------------
    How to post to get the best help

  • Robert klimes (9/3/2013)


    The last post in the TechNet forums thread for the exact same question has an example where reading 600MB from the table via SSMS takes 45 seconds, but reading the same table with .NET code takes 5 seconds (I am not sure of the validity of the code).

    David Baxter Browne's posts are rarely off the mark by much.

    I have also skimmed the MS-TDS protocol doc and did not see anything about a throughput limit.

    Of course, TDS is only a protocol for how to transfer data; it says nothing about speed.
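    One thing the protocol does negotiate at login, though, is the packet size. In ODBC that is the standard SQL_ATTR_PACKET_SIZE connection attribute (SqlClient exposes it as the Packet Size connection-string keyword). A one-line fragment, slotting in just before the SQLDriverConnect call in the sketch earlier in this thread - whether a larger packet actually moves the numbers is something to measure, not assume:

      // Request the SQL Server maximum of 32767 bytes (the default is 4096);
      // must be set before connecting, and the server may negotiate it down.
      SQLSetConnectAttr(dbc, SQL_ATTR_PACKET_SIZE,
                        (SQLPOINTER)(SQLULEN)32767, SQL_IS_UINTEGER);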

    [font="Times New Roman"]Erland Sommarskog, SQL Server MVP, www.sommarskog.se[/font]

  • noeld (9/3/2013)


    Well?

    Does anybody have the inside information on TDS that I am looking for?

    I apologize for not being able to help here. I've not had the need to transfer so much data to a client computer before. If you would, and just for my understanding, why do you need to transfer this much data between SQL Server and a front end?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (9/4/2013)


    noeld (9/3/2013)


    Well?

    Does anybody have the inside information on TDS that I am looking for?

    I apologize for not being able to help here. I've not had the need to transfer so much data to a client computer before. If you would, and just for my understanding, why do you need to transfer this much data between SQL Server and a front end?

    Jeff,

    First of all, thanks for the reply. This is really not a lot of data (the table is a lot larger, well into the billions of rows); it just so happens that, because most of the columns are floats, it adds up.

    I understand that this is uncommon for most applications, and I cannot give too much insight into what we are doing, but it should be enough to say that this data will be cached in a client, and we were looking to improve the cache-miss refresh. I was surprised that I could not go faster than 19MB/s memory-to-memory over a single connection, regardless of whether I was local or remote.

    Most articles focus on "loading" data into SQL Server; so far I have not seen any on high-speed "export" of the data out.

    I can lower the time linearly by going parallel - something like the sketch below - and that got me curious.
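    Roughly what I mean, sketched with ODBC and std::thread (the table, key range, thread count and DSN are all made up, and error handling is omitted):

      #include <thread>
      #include <vector>
      #include <cstdio>
      #include <windows.h>
      #include <sql.h>
      #include <sqlext.h>

      // Each worker opens its own connection and streams a disjoint id range.
      static void readRange(long long lo, long long hi) {
          SQLHENV env; SQLHDBC dbc; SQLHSTMT stmt;
          SQLAllocHandle(SQL_HANDLE_ENV, SQL_NULL_HANDLE, &env);
          SQLSetEnvAttr(env, SQL_ATTR_ODBC_VERSION, (SQLPOINTER)SQL_OV_ODBC3, 0);
          SQLAllocHandle(SQL_HANDLE_DBC, env, &dbc);
          SQLCHAR conn[] = "DSN=MySqlServer;Trusted_Connection=yes;";
          SQLDriverConnect(dbc, NULL, conn, SQL_NTS, NULL, 0, NULL, SQL_DRIVER_NOPROMPT);
          SQLAllocHandle(SQL_HANDLE_STMT, dbc, &stmt);

          char sql[256];
          snprintf(sql, sizeof sql,
              "SELECT id, value FROM dbo.BigTable WHERE id BETWEEN %lld AND %lld",
              lo, hi);
          SQLExecDirect(stmt, (SQLCHAR*)sql, SQL_NTS);

          SQLINTEGER id; SQLDOUBLE value; SQLLEN ind1, ind2;
          SQLBindCol(stmt, 1, SQL_C_SLONG,  &id,    0, &ind1);
          SQLBindCol(stmt, 2, SQL_C_DOUBLE, &value, 0, &ind2);
          while (SQL_SUCCEEDED(SQLFetch(stmt))) { /* discard */ }

          SQLFreeHandle(SQL_HANDLE_STMT, stmt);
          SQLDisconnect(dbc);
          SQLFreeHandle(SQL_HANDLE_DBC, dbc);
          SQLFreeHandle(SQL_HANDLE_ENV, env);
      }

      int main() {
          const int       kThreads = 4;            // assumed degree of parallelism
          const long long kMaxId   = 40000000LL;   // assumed key range of the table
          std::vector<std::thread> workers;
          long long step = kMaxId / kThreads;
          for (int i = 0; i < kThreads; ++i)
              workers.emplace_back(readRange, i * step + 1, (i + 1) * step);
          for (auto& w : workers) w.join();        // each connection streams its slice
          printf("done\n");
      }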

    Although transferring files is not the same thing, I can go orders of magnitude faster that way, so this is not a hardware limitation, and it is discouraging that there is so little information about it.

    I was looking forward to any insight from Microsoft people who could shed some light, or hoping that an MVP (erm ... 🙂) could dig into it for us.

    Cheers


    * Noel

  • Again, one reason you get so little help is that we know so little about what you are doing. It is still not clear to me how you measure the speed.

    You mention something about a cache refresh. That reminds me of a case in our own system. When monitoring, I found a statement that had been running for 40 minutes, and it was a plain SELECT from a table variable/temp table. The call came from some home-brewed cache mechanism, and to me it was perfectly clear what was going on: there was a lot of processing for every row the client received.

    I don't find it puzzling that you get higher speed with parallel threads - you are using more CPUs on the client.

    I maintain my position that a decisive factor here is the processing speed in the client. If you change your client to just read through the rows and simply discard the data, what happens?
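    A rough harness for that test, as a fragment wrapped around the fetch loop of the first sketch in this thread (same assumed connection, bindings and two-column schema; <chrono> is needed for the timer):

      auto t0 = std::chrono::steady_clock::now();
      long long rows = 0;
      while (SQL_SUCCEEDED(SQLFetch(stmt)))
          ++rows;                          // fetch every row, process nothing
      double secs = std::chrono::duration<double>(
                        std::chrono::steady_clock::now() - t0).count();
      // ~12 bytes/row for the assumed (int, float) columns
      printf("%lld rows, %.1f s, ~%.1f MB/s\n", rows, secs,
             rows * 12.0 / (1024 * 1024) / secs);

    If that loop runs dramatically faster than the real client, the bottleneck is the per-row processing, not TDS.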

    [font="Times New Roman"]Erland Sommarskog, SQL Server MVP, www.sommarskog.se[/font]
