Decrease query execution time to retrieve records from huge table
Posted Friday, August 15, 2014 9:06 AM
SSC Rookie

Hi there,

I have a table that contains financial information in text format for each company, keyed by company ID. The table is updated every two weeks; it has around 25 million records, and that number changes with each update. I'm trying to build a Lucene index from this table using a C# program. Instead of pulling all the records at once, I thought I would do this:

1.
DECLARE @firstpart int;
SELECT @firstpart = COUNT(*) / 4 FROM dbo.CompanyFinance;

2.
WITH Records AS (SELECT ROW_NUMBER() OVER (ORDER BY id) AS row_num, id, FinancialInfo
FROM dbo.CompanyFinance)
SELECT id, FinancialInfo FROM Records
WHERE row_num < @firstpart;

To get the next quarter of the records, I thought of doing this:
3. Select COUNT(*) / 2 into a temporary variable.
4. Get the id of the 12.5 millionth record (just as in step 2).
5. SELECT id, FinancialInfo FROM dbo.CompanyFinance WHERE id BETWEEN @firstpart + 1 AND the id from step 4.

My problem is the query execution time: steps 1, 2, 3, and 4 are taking a very long time to run. Is there a better query you can suggest?

CREATE TABLE dbo.CompanyFinance (id int NOT NULL, FinancialInfo nvarchar(max), CONSTRAINT [pk_id] PRIMARY KEY CLUSTERED
([id] ASC));
INSERT INTO dbo.CompanyFinance VALUES (1, 'This is a test');
INSERT INTO dbo.CompanyFinance VALUES (2, 'I have very large financial info in this field');

Thanks
Rash
Post #1603737
Posted Friday, August 15, 2014 9:20 AM


SSChampion

You're attempting to query 6.25 million records out of 25 million, so SQL Server is going to scan the table to do it, and indexes won't help. You can get faster disks, or more of them, to speed this up, or you can move smaller amounts of data at a time so that an index can be of assistance.

----------------------------------------------------
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood..." Theodore Roosevelt
The Scary DBA
Author of: SQL Server Query Performance Tuning
SQL Server 2012 Query Performance Tuning
SQL Server 2008 Query Performance Tuning Distilled
and
SQL Server Execution Plans

Product Evangelist for Red Gate Software
Post #1603742
Posted Friday, August 15, 2014 10:28 AM
SSC Rookie

When you say "you can attempt to move smaller amounts of data so that an index will be of assistance," are you saying I should select the top 1,000 rows at a time instead of several million?
Post #1603776
Posted Friday, August 15, 2014 10:44 AM


SSChampion

The issue is that there's no way, within code or structure, to speed up moving 1/4 of a 25 million row table. You are completely dependent on hardware, because a read of that size will lead to a scan of the table. To get the access-speed benefits that indexes offer, you have to deal with substantially smaller amounts of data. But there's a balancing act. Yes, 1,000 rows will most likely result in a seek. But repeat 1,000-row seeks often enough to move 6+ million rows, and that may no longer be desirable. So you have to balance the benefit of being able to use the index against the amount of data you need to move.
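
To make the balancing act concrete, here is a minimal Python sketch (not from the thread) of reading a table in small batches, where each batch asks for the next rows after the last id already seen. `fetch_batch` is a hypothetical callable standing in for a query such as `SELECT TOP (@n) id, FinancialInfo FROM dbo.CompanyFinance WHERE id > @last_id ORDER BY id`; the in-memory list only simulates the table.

```python
def batches_needed(total_rows, batch_size):
    """Number of round trips needed to move total_rows in batches of batch_size."""
    return -(-total_rows // batch_size)  # ceiling division on integers


def read_in_batches(fetch_batch, batch_size, start_after=0):
    """Yield batches of rows, seeking past the last id seen each time.

    fetch_batch(last_id, batch_size) is a hypothetical callable that returns
    up to batch_size (id, text) rows with id > last_id, ordered by id.
    """
    last_id = start_after
    while True:
        rows = fetch_batch(last_id, batch_size)
        if not rows:
            return
        yield rows
        last_id = rows[-1][0]  # remember the highest id read so far


# Simulated table: ten rows with ids 1..10 (a stand-in for dbo.CompanyFinance).
FAKE_TABLE = [(i, f"financial info {i}") for i in range(1, 11)]


def fake_fetch(last_id, batch_size):
    """Stand-in for the database query: next batch_size rows after last_id."""
    return [row for row in FAKE_TABLE if row[0] > last_id][:batch_size]


if __name__ == "__main__":
    for batch in read_in_batches(fake_fetch, batch_size=4):
        print([row_id for row_id, _ in batch])
    # A quarter of 25 million rows at 1,000 rows per query:
    print(batches_needed(6_250_000, 1_000))
```

At 1,000 rows per query, a quarter of the table (6.25 million rows) takes 6,250 round trips, which is the cost that has to be weighed against the cheaper individual seeks.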

I'd suggest questioning the need to move the whole table. Is there a way to incrementally update the thing you're building, rather than doing a complete replace each time? Incremental data moves can then take advantage of indexes to assist your queries.
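
One way to picture an incremental update, sketched in Python under a loud assumption: the table posted in this thread has no change-tracking column, so the `version` value used here is hypothetical (in SQL Server, a `rowversion` or last-modified column could play that role). The idea is to remember the highest version already indexed and pick up only the rows above it.

```python
def rows_to_reindex(rows, last_indexed_version):
    """Return (changed_rows, new_high_water_mark).

    rows is an iterable of (id, version, text) tuples. `version` is a
    hypothetical, ever-increasing change marker that the posted table lacks.
    """
    changed = [row for row in rows if row[1] > last_indexed_version]
    # If nothing changed, the high-water mark stays where it was.
    high_water = max((row[1] for row in changed), default=last_indexed_version)
    return changed, high_water


if __name__ == "__main__":
    sample = [(1, 10, "unchanged"), (2, 15, "edited"), (3, 17, "new company")]
    changed, mark = rows_to_reindex(sample, last_indexed_version=12)
    print([row[0] for row in changed], mark)
```

In the database, the same filter would be a WHERE clause on that column, which an index on the column could support.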


Post #1603784
Posted Friday, August 15, 2014 12:23 PM
SSCrazy

What is (are) the clustering column(s) on the table? You need to read each 1/4 of the table at a time using the appropriate clustering key range.

What nonclustered index(es) are there? (They may be needed to determine the clustering key ranges.)


SQL DBA,SQL Server MVP('07, '08, '09)

Carl Sagan said: "There is no such thing as a dumb question." Sagan obviously never watched a congressional hearing!
Post #1603826
Posted Saturday, August 16, 2014 4:19 AM


SSChampion

ScottPletcher (8/15/2014)
What is (are) the clustering column(s) on the table? You need to read each 1/4 of the table at a time using the appropriate clustering key range.

What nonclustered index(es) are there? (They may be needed to determine the clustering key ranges.)


Based on the structure posted above, it's an ID/Value table and the cluster is on the ID. With that kind of structure and the data volumes we're talking about, I'm still back to questioning the need for moving the entire table around, let alone tuning that.


Post #1604025
Posted Saturday, August 16, 2014 9:27 PM (answer marked as solution; this worked for the OP)
SSCrazy

Grant Fritchey (8/16/2014)
Based on the structure posted above, it's an ID/Value table and the cluster is on the ID. With that kind of structure and the data volumes we're talking about, I'm still back to questioning the need for moving the entire table around, let alone tuning that.


Sorry, didn't see that.

Assuming you can accept "close enough" to 1/4, rather than needing an exact 1/4, you could get the min and max id, take the difference, divide by 4, and read each 1/4 range of ids. The ranges will hold roughly equal row counts only if the ids are fairly evenly distributed. Be sure to explicitly specify id BETWEEN @id_start_of_fourth AND @id_end_of_fourth.
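
The min/max arithmetic can be sketched in a few lines of Python. This is an illustration, not code from the thread: `id_ranges` is a hypothetical helper, and `QUERY` only shows the shape of the per-range query, with `?` placeholders assumed for whatever client library runs it.

```python
# Shape of the per-range query; the "?" parameter style is an assumption.
QUERY = (
    "SELECT id, FinancialInfo FROM dbo.CompanyFinance "
    "WHERE id BETWEEN ? AND ? ORDER BY id"
)


def id_ranges(min_id, max_id, parts=4):
    """Split the inclusive span [min_id, max_id] into contiguous id ranges.

    Returns at most `parts` (start, end) pairs, both ends inclusive, so each
    pair can be passed straight to a BETWEEN predicate without gaps or overlap.
    """
    if parts < 1 or max_id < min_id:
        raise ValueError("need parts >= 1 and max_id >= min_id")
    span = max_id - min_id + 1
    step = -(-span // parts)  # ceiling division: ids per range
    ranges = []
    start = min_id
    while start <= max_id:
        end = min(start + step - 1, max_id)
        ranges.append((start, end))
        start = end + 1
    return ranges


if __name__ == "__main__":
    # min_id and max_id would come from SELECT MIN(id), MAX(id) on the table.
    for start, end in id_ranges(1, 25_000_000):
        print(start, end)
```

For ids running from 1 to 25,000,000, each of the four ranges covers 6,250,000 ids. How many rows each one actually returns depends on gaps in the id sequence.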


Post #1604142