Fastest way to read a 600MB table (in memory)

Question

Fastest way to read a 600MB table (in memory)

noeld

SSC Guru

Points: 96590
More actions
August 27, 2013 at 11:32 am

#276611

A few days ago I came across this interesting problem:
I cannot get more than 15-19mb/s is there any limitation that I need to know about?
The query is a simple select (not doing sorting) and the database is read-only : "select * from tablename"
Tried local and remote and still get the same throughput therefore is not a network problem.
Launching multiple queries in parallel bring the total time down almost linearly as I increase the number of parallel connections (which bring different sections of the data).
Finally the table fits entirely in memory therefore no DiskIO is involved either.
If someone has any insight, please chime in.

* Noel

Viewing 15 posts - 1 through 15 (of 20 total)

You must be logged in to reply to this topic. Login to reply

Robert Klimes SSCoach Points: 17865 More actions · Answer 1

Might be the clients ability to consume the data. SSMS uses RBAR to retrieve rows. IF using SSMS turn on client statistics from the query menu. this will show how long the client took to process the results.

Bob
-----------------------------------------------------------------------------
How to post to get the best help[/url]

noeld SSC Guru Points: 96590 More actions · Answer 2

there is no RBAR here. The server shows ASYNC_NETWORK_IO in both cases (shared memory and network)

We get in our software similar speed to SSMS

* Noel

Robert Klimes SSCoach Points: 17865 More actions · Answer 3

ASYNC_NETWORK_IO could very much mean that the client is the bottleneck.

http://blogs.msdn.com/b/joesack/archive/2009/01/09/troubleshooting-async-network-io-networkio.aspx

Bob
-----------------------------------------------------------------------------
How to post to get the best help[/url]

noeld SSC Guru Points: 96590 More actions · Answer 4

I disagree. SSMS takes similar time using shared memory on the machine.

The fact that I get linear increase in throughput as I parallelize the access indicates that there is a limit somewhere in TDS

and it not network/app related.

BTW it takes us 45s to read 600MB worth of data

* Noel

Jason Wolfkill SSCrazy Eights Points: 9832 More actions · Answer 5

noeld (8/27/2013)
I disagree. SSMS takes similar time using shared memory on the machine.
The fact that I get linear increase in throughput as I parallelize the access indicates that there is a limit somewhere in TDS
and it not network/app related.
BTW it takes us 45s to read 600MB worth of data

ASYNC_NETWORK_IO waits indicates that the "pipe" that carries the data to the client is stopped up and SQL Server has to wait to push more data through the "pipe". This can be because of network issues or issues with the client application that receives the data. Since you say experience the same waits when you run the query from SSMS on the server itself, it appears that network throughput is not your problem.

However, SSMS is just another client application when it comes to querying the database and displaying results, and formatting and displaying 600MB of data in SSMS's grid view would take a while in most cases. Try this: in SSMS, with your query window active, click Query > Query Options. In the Query Options dialog, click Results, then check the box next to "Discard results after execution". This will tell SQL Server to run the query but SSMS will just dump the data rather than trying to format and display it. If you run your query once to ensure that all or most of the table is in the buffer cache, then run it again and get execution times consistent with few physical reads (i.e., SQL Server found most or all of the data pages in the buffer cache) and no undue waits, then you should be able to conclude that SSMS (or other calling application) is the bottleneck.

Throwing 600MB of data at an application seems excessive - does your application *really* need to select all columns of all rows from the table?

Jason Wolfkill

x SSC-Insane Points: 23660 More actions · Answer 6

Robert klimes (8/27/2013)
Might be the clients ability to consume the data. SSMS uses RBAR to retrieve rows. IF using SSMS turn on client statistics from the query menu. this will show how long the client took to process the results.

Whats the slow part with RBAR in SSMS? Its not using T-SQL, so without that slowdown I'm wondering why RBAR would be involved here.

Obviously its closed source, so I'm sure without actual developer input we can only make educated guesses, but I'm curious what your theory is here.

noeld SSC Guru Points: 96590 More actions · Answer 7

I suspect that TDS has limitted size buffers and that the protocol itself imposes this speed limit.

600mb by today's standards is really not a lot of memory.

For reference I can transfer - from disk to disk - 3gb files in under 30s on a 10Gb network.

I was hoping someone from MS with inside knowledge of the implementation would chime in

* Noel

Robert Klimes SSCoach Points: 17865 More actions · Answer 8

Whats the slow part with RBAR in SSMS? Its not using T-SQL, so without that slowdown I'm wondering why RBAR would be involved here.
Obviously its closed source, so I'm sure without actual developer input we can only make educated guesses, but I'm curious what your theory is here.

This is just something I have read and heard numerous times.(ex. http://connectsql.blogspot.ca/2013/01/sql-server-query-slow-in-ssms-fast-in.html)

I believe it is doing RBAR locally consuming the data, thus allowing SSMS to start displaying the first rows from the result before the entire result set is consumed.

Bob
-----------------------------------------------------------------------------
How to post to get the best help[/url]

DeWayne_McCallie Hall of Fame Points: 3617 More actions · Answer 9

Be sure to install the latest service pack on your ssms client.

Erland Sommarskog SSC-Insane Points: 24038 More actions · Answer 10

I have been involved in another thread where someone was trying to use BCP to export data, and the througput he got was about the same as yours. I did some tests myself, and I was able to improve performance for BCP by redirecting the output from BCP to nul. (Not the data file, but the progress message.)

SSMS is of course useless for benchmarking, because of the time it takes to render that grid. Possibly if you redirect output to file. But it would be better to test with SQLCMD. Or BCP. SQLCMD has to waste time on cycles on formatting, but has no progress message.

[font="Times New Roman"]Erland Sommarskog, SQL Server MVP, www.sommarskog.se[/font]

x SSC-Insane Points: 23660 More actions · Answer 11

Robert klimes (8/29/2013)
Whats the slow part with RBAR in SSMS? Its not using T-SQL, so without that slowdown I'm wondering why RBAR would be involved here.
Obviously its closed source, so I'm sure without actual developer input we can only make educated guesses, but I'm curious what your theory is here.
This is just something I have read and heard numerous times.(ex. http://connectsql.blogspot.ca/2013/01/sql-server-query-slow-in-ssms-fast-in.html)
I believe it is doing RBAR locally consuming the data, thus allowing SSMS to start displaying the first rows from the result before the entire result set is consumed.

Its just the first time I've seen RBAR used in a non T-SQL context.

Erland Sommarskog SSC-Insane Points: 24038 More actions · Answer 12

patrickmcginnis59 10839 (8/30/2013)Its just the first time I've seen RBAR used in a non T-SQL context.

For that matter, I had not seen the term RBAR before I started to be active on SQL Server Central. I guess it refers to row-by-row processing with an extra A to make it more fun. Personally, I have never been fond of jargon.

Sometimes row-by-row processing is the right thing in T-SQL, although most often it isn't. On client level it is a different matter. Well, a client can of course do something silly like querying a table for one ID at time, rather than all IDs at once. But that will not result in ASYNCH_NETWORK_IO.

Or more to the point: that blogpost is completely confused. When running a query in .Net, there four different ways to do this: ExecuteNonQuery which discards all rows. ExecuteScalar which returns the first value and then discard the rest. ExecuteReader that permits you to receive the rows one by one. And finally there is DataAdapter.Fill which puts all data in a DataSet.

Of the latter two which is the best? Guess what? IT DEPENDS! DataAdapter.Fill is kind of convenient, but it only really makes sense if you going do something with that DataSet like bind it to a grid. Say that your aim is to read the data from large result set and write it to a file. DataAdapter.Fill is a *big* mistake here. Use ExecuteReader and keep down memory usage. What can cause a problem when you use a reader is if you do a lot of processing for every row, then you can keep locks on the data on the source tables for an extended period of time. But if they are temp tables, that is not really a problem. And it's better to have a 1GB in SQL Server that knows to spill to disk than having a 1GB DataSet in client memory.

[font="Times New Roman"]Erland Sommarskog, SQL Server MVP, www.sommarskog.se[/font]

Jeff Moden SSC Guru Points: 1003873 More actions · Answer 13

Heh... you're in IT where there's nothing but jargon. 😉

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

noeld SSC Guru Points: 96590 More actions · Answer 14

Well?

Does anybody have the inside on TDS that I am looking for?

* Noel