Best way to select a large amount of data?

  • Hello All,

    I have this situation:

    For several reports we need to extract about 5 million records from a table of 124 million (173 GB).

    I am able to use the clustered index (with a "like" statement), and the only other parameter is a column with a state, so it is not helping much. The problem is that the resulting table is really big (5 GB); it takes forever to load and also consumes a lot of resources on the server (we usually get the error "there is not enough system memory to run this query..."). That said, I need to find a way to obtain the same data faster, or at least without consuming that many resources, maybe in chunks, but I am not sure how to do it...

    We have a server with 16 processors and 42 GB of RAM, running SQL Server 2005 SP3.

    Thanks a lot

  • ricardo_chicas (8/19/2010)


    For several reports we need to extract about 5 million records from a table of 124 million (173 GB).

    I am able to use the clustered index (with a "like" statement), and the only other parameter is a column with a state, so it is not helping much. The problem is that the resulting table is really big (5 GB); it takes forever to load and also consumes a lot of resources on the server (we usually get the error "there is not enough system memory to run this query..."). That said, I need to find a way to obtain the same data faster, or at least without consuming that many resources, maybe in chunks, but I am not sure how to do it...

    We have a server with 16 processors and 42 GB of RAM, running SQL Server 2005 SP3.

    Do you have any indexes to actually serve this query?

    Retrieving 5M out of 124M rows is a case where a very selective index would certainly help.

    On the other hand, a "like" predicate pretty much forces optimizer to do a full table scan - have you traced the query to see what is doing?

    _____________________________________
    Pablo (Paul) Berzukov

    Author of Understanding Database Administration available at Amazon and other bookstores.

    Disclaimer: Advice is provided to the best of my knowledge, but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions in a test, non-production environment, the advisor should not be held liable or responsible for any actions taken based on the given advice.
  • Hi,

    I forgot to say that the "like" is this way: column like '123456%'

    so it is using the index and doing a clustered index seek
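    In other words, a LIKE with a literal prefix like this is sargable: the optimizer converts it into a range seek on the index, roughly equivalent to the following (table and column names are placeholders):

        -- Roughly what the optimizer does with: column LIKE '123456%'
        -- The half-open range keeps the predicate an index seek.
        SELECT Col1, Col2
        FROM dbo.BigTable
        WHERE AccountNumber >= '123456'
          AND AccountNumber <  '123457';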

  • ricardo_chicas (8/19/2010)


    I forgot to say that the "like" is this way: column like '123456%'

    so it is using the index and doing a clustered index seek

    Are you positive this is what the query is doing? Did you check the execution plan?

    _____________________________________
    Pablo (Paul) Berzukov

    Author of Understanding Database Administration available at Amazon and other bookstores.

    Disclaimer: Advice is provided to the best of my knowledge, but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions in a test, non-production environment, the advisor should not be held liable or responsible for any actions taken based on the given advice.
  • Yes, I am absolutely sure

  • Are you perhaps selecting columns you don't need? Or pulling back BLOB type data?

    5mm rows doesn't seem like a crazy number. Are you sure the server is properly sized?

    I am only getting 5 columns, all of them small. What I am looking for is a way to select chunks of that data in an optimal way...

  • ricardo_chicas (8/19/2010)


    Hello All,

    I have this situation:

    For several reports we need to extract about 5 million records from a table of 124 million (173 GB).

    I am able to use the clustered index (with a "like" statement), and the only other parameter is a column with a state, so it is not helping much. The problem is that the resulting table is really big (5 GB); it takes forever to load and also consumes a lot of resources on the server (we usually get the error "there is not enough system memory to run this query...").

    Exactly how are you generating the report?

    SSRS? A custom program? T-SQL?

    Is the report generator running on the same machine as the SQL Server?

    Do you really need to return all 5 million rows to the report generator?

    I suppose the report generator does not print out a report with 5 million rows, but instead does some aggregation of the data to print a more reasonable number of rows. If this is the case, could you perhaps do this aggregation in the source query instead of doing it in the report?

    Another possible thing to look for is whether you have an ORDER BY in the source query - that might force SQL Server to sort the result set, which is fairly expensive for 5 GB of data.
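    A minimal sketch of what pushing the aggregation into the source query could look like, using the prefix predicate mentioned earlier in the thread (column names are hypothetical):

        -- Hypothetical: aggregate in the database so only summary rows
        -- travel to the report, instead of 5 million detail rows.
        SELECT StateCode, COUNT(*) AS RowCnt, SUM(Amount) AS TotalAmount
        FROM dbo.BigTable
        WHERE AccountNumber LIKE '123456%'
        GROUP BY StateCode;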

    /SG

  • 5mm rows for a report is a little silly. There is no way anyone goes through this much data, even for an audit. And I can't see it being run very often.

    I would look to clean this up from a business point of view.

  • Hello,

    It is a T-SQL program. The problem is that we really need that much info, and I can't change the logic. The info changes really often, so the report must be updated every day. Again, is there a way to retrieve that much info in chunks, but in an optimal way?

    Can you explain in a little more detail exactly what you are doing?

    You say the report is a T-SQL program.

    Exactly what does that mean?

    Do you have a stored procedure that writes 5 million rows to a temporary table?

    Do you run a stored procedure from SSMS, return 5 million rows, and then try to write them to a file from SSMS?

    Or what are you really doing?

    OK, the T-SQL program loads all the needed data into a table; then another process uses that data for a webpage (believe me, all of that data is needed). That said:

    Again

    What is the best way to retrieve that much data without using a single query that blocks my main table for more than an hour (I can't use NOLOCK since I can't afford to have dirty reads) and still remain optimal?

  • There isn't a good way here. You need to move that much data, and it will block things.

    I still question this is really needed. No one can use 5mm rows at once, and if you are summarizing, why aren't you using SQL to do some summaries? Display smaller chunks of data, or pull back smaller chunks at a time.

  • ricardo_chicas (8/20/2010)


    OK, the T-SQL program loads all the needed data into a table; then another process uses that data for a webpage (believe me, all of that data is needed). That said:

    Again

    What is the best way to retrieve that much data without using a single query that blocks my main table for more than an hour (I can't use NOLOCK since I can't afford to have dirty reads) and still remain optimal?

    I'm trying to help you here, but you are really providing too little information. Please be more detailed.

    My understanding so far is that you have one large table with 124 million rows.

    You then use a T-SQL script to move 5 million of those rows to a secondary table.

    An external program then reads that secondary table and sends it over a network link to an end user.

    You then say that something blocks the main table for over an hour. Which table is being blocked?

    Is the problem in the T-SQL script that moves data between the two tables, or is the problem in the external program that tries to send the data over a slow network?

    Right now it sounds like the external program is the problem.

    If that is the case, a suggestion could be to let the external program read all the data from the database as quickly as possible and just save the data to a temporary file on disk.

    Then the external program can read the data from the temporary file and send it to the client.

    Have I understood your problem?

    OK, let's do this: forget about the original problem.

    What I really want is a suggestion on the best way to retrieve a large amount of data, without blocking tables and without using NOLOCK, using a select. When I used Oracle I was able to do something like that with a bulk select (I set the select to retrieve chunks of 10,000 rows). I am not sure if that kind of thing works in SQL Server, but I want something like that.

    Thank you
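    The closest SQL Server equivalent to Oracle's BULK COLLECT ... LIMIT pattern is usually a keyset-driven loop over the clustered index. Below is a minimal sketch, assuming the clustered key is a unique column called AccountNumber and the rows land in a staging table dbo.ReportStage (both names are hypothetical). Each iteration is its own short statement, so locks are taken and released per chunk instead of being held for the whole hour:

        DECLARE @BatchSize int;
        DECLARE @LastKey   varchar(20);

        SET @BatchSize = 10000;   -- chunk size, as in the Oracle bulk select
        SET @LastKey   = '';      -- sorts before any real key value

        WHILE 1 = 1
        BEGIN
            -- Seek past the last key already copied; because AccountNumber
            -- is the (assumed unique) clustered key, each chunk is a short
            -- range seek rather than a rescan of the whole table.
            INSERT INTO dbo.ReportStage (AccountNumber, Col2, Col3, Col4, Col5)
            SELECT TOP (@BatchSize)
                   AccountNumber, Col2, Col3, Col4, Col5
            FROM dbo.BigTable
            WHERE AccountNumber LIKE '123456%'
              AND AccountNumber > @LastKey
            ORDER BY AccountNumber;

            IF @@ROWCOUNT = 0 BREAK;   -- no rows left in the range

            -- Keys increase monotonically across chunks, so the staging
            -- table's max key is the last key copied (cheap if ReportStage
            -- is keyed on AccountNumber).
            SELECT @LastKey = MAX(AccountNumber) FROM dbo.ReportStage;
        END;

    This doesn't reduce the total work, but it keeps each individual query small, so memory grants stay modest and other sessions can get at the table between chunks.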
