Query seems to be constrained by single core CPU in multiple core server - any suggestions?

Question

Post reply

Query seems to be constrained by single core CPU in multiple core server - any suggestions?

caspersql

SSCrazy

Points: 2276
More actions
May 8, 2013 at 3:19 am

#275116

I receive data from an external data provider which comes in as a single large table. The data represents balance sheet information for several different companies.
The table structure boils down to 3 columns:
1. PK Company identifier (each different company has a different id number)
2. PK Field id number (each item on the balance sheet has a different id number, so total assets might be '15', total liabilities '23' etc.)
3. The data itself
For my OLAP system, I want to populate a table that contains a single row for each company. This table will be viewed regularly by users. The columns in the table would hold all the different balance sheet items for each company.
The SELECT statement I used for inserting data from the large source table to my destination table joins up on itself using the different field numbers as follows:
SELECT
MySelectList
FROM SourceTable a
LEFT JOIN SourceTable a1
ON
a.FieldIDNumber = '1'
ANDa.CompanyIdentifier = a1.CompanyIdentifier
AND a1.FieldIDNumber = '2'
LEFT JOIN SourceTable a2
ON
a.CompanyIdentifier = a2.CompanyIdentifier
AND a2.FieldIDNumber = '3'
LEFT JOIN SourceTable a3
ON
a.CompanyIdentifier = a3.CompanyIdentifier
AND a3.FieldIDNumber = '4'
LEFT JOIN SourceTable a4
ON
a.CompanyIdentifier = a4.CompanyIdentifier
AND a4.FieldIDNumber = '5'
This select statement takes a very long time to run. I tried to figure out what the bottleneck is and I think I'm CPU constrained for the following reasons:
1. Pattern of system resource use while query is running:
At the very beginning of the query execution, I see heavy reads on the database hard disks as the source table is loaded into memory, which I expected. Then for the vast majority of the time the query is running, hard disk read/write goes to zero and a single core on the server is 100% utilized. There is an occasional brief read of the database hard disk, maybe once every 20 minutes or so for about 2 or 3 minutes. The utilization of the remaining 5 cores is close to 0. Note that I have max degree of parallelism set to zero so all the 6 cores should be available to SQL server.
Previously on the same machine, I had hyper-threading enabled. Back then when I ran the same query, only 1 of the 12 threads was fully utilized. So CPU utilization went up when I disabled hyper-threading, but only until the point where 1 core was 100% utilized.
2. If I'm reading diagnostic checks from Glenn Berry correctly (and my correct reading isn't a given because I'm self-taught in all things programming), I don't think I'm memory constrained:
When I run this code:
-- SQL Server Process Address space info
--(shows whether locked pages is enabled, among other things)
SELECT physical_memory_in_use_kb,locked_page_allocations_kb,
page_fault_count, memory_utilization_percentage,
available_commit_limit_kb, process_physical_memory_low,
process_virtual_memory_low
FROM sys.dm_os_process_memory WITH (NOLOCK) OPTION (RECOMPILE);
The result is 0 for process_physical_memory_low and 0 for process_virtual_memory_low.
When I run this code:
SELECT total_physical_memory_kb, available_physical_memory_kb,
total_page_file_kb, available_page_file_kb,
system_memory_state_desc
FROM sys.dm_os_sys_memory WITH (NOLOCK) OPTION (RECOMPILE);
system_memory_state_desc says 'Available physical memory is high'
When I look at the top wait types for the server, I see that the top wait type for the server is CX_PACKET, which is what the server is waiting on over 99% of the time.
I would be grateful if anyone had suggestions on how to improve the speed of execution, or could answer the following related questions:
1. My primary key on the source table includes both the company Id and the field ID number. Each of the joined tables has a different field ID number. Would the query run faster if I placed a non-clustered index on the field ID number, despite the fact that it is part of the primary key?
2. Would the following result in higher CPU utilization?
-- Use one statement to join tables a1 and a2 to table a into a temporary table.
-- Use a second statement running concurrently to join tables a3 and a4 to table a into a second temporary table. Perhaps this would utilize a different core to the first statement?
-- Use a third statement once the first two have finished to join both temporary tables together and insert into my destination table.

Viewing 15 posts - 1 through 15 (of 19 total)

You must be logged in to reply to this topic. Login to reply

Eugene Elutin SSC Guru Points: 59322 More actions · Answer 1

First of all your check for a.FieldIDNumber = '1' should better be moved into WHERE clause:

SELECT

MySelectList

FROM SourceTable a

LEFT JOIN SourceTable a1

ON a.CompanyIdentifier = a1.CompanyIdentifier

AND a1.FieldIDNumber = '2'

LEFT JOIN SourceTable a2

ON

a.CompanyIdentifier = a2.CompanyIdentifier

AND a2.FieldIDNumber = '3'

LEFT JOIN SourceTable a3

ON

a.CompanyIdentifier = a3.CompanyIdentifier

AND a3.FieldIDNumber = '4'

LEFT JOIN SourceTable a4

ON

a.CompanyIdentifier = a4.CompanyIdentifier

AND a4.FieldIDNumber = '5'

WHERE a.FieldIDNumber = '1'

Looks like it is banking client data you are working with. Is your FieldIDNumber defined as varchar or int? If it's a numeric, then don't use quotes to avoid unnecessary data type conversion.

And the last one about your CXPACKET waits. That shows that you have problem with parallelism. Try to add OPTION (MAXDOP 1) after WHERE clause and see if it makes it run faster.

_____________________________________________
"The only true wisdom is in knowing you know nothing"
"O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
(So many miracle inventions provided by MS to us...)

How to post your question to get the best and quick help[/url]

caspersql SSCrazy Points: 2276 More actions · Answer 2

Thanks for the suggestions Eugene.

First of all your check for a.FieldIDNumber = '1' should better be moved into WHERE clause

Will try, many thanks

Is your FieldIDNumber defined as varchar or int? If it's a numeric, then don't use quotes to avoid unnecessary data type conversion.

It is not defined as numeric but it should be! Looking back, I see that I had originally set it as varchar because the documentation for the data source said it was alphanumeric. However after a few years of operation, there is only numeric data in that column, so I think that was a typo in the documentation. I'll change the column to numeric and remove the quotes to speed things up. It'll take a while to do so it will be a while before I've tested your other suggestions.

And the last one about your CXPACKET waits. That shows that you have problem with parallelism. Try to add OPTION (MAXDOP 1) after WHERE clause and see if it makes it run faster.

I'll try that, many thanks. I have a feeling though that the 1 core used will still be at 100% utilization, soon to find out.

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 3

Can you post the actual execution plan as a .sqlplan attachment please? Makes this kind of exercise much easier.

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

caspersql SSCrazy Points: 2276 More actions · Answer 4

ChrisM@Work (5/8/2013)
Can you post the actual execution plan as a .sqlplan attachment please? Makes this kind of exercise much easier.

This query was so long-running (over 2 days) that I haven't actually seen it complete yet so I just have the estimated, not actual execution plan as of now. After implementing Eugene's suggestions I'll post up the actual if the query actually ever finishes for me...

In the meantime, here is the estimated plan. The table names are proprietary so I'm posting the execution plan as a jpeg with the names blocked out, does that suffice? All that's blocked out is the table name which I've callled 'SourceTable' above. That is the table that joins on itself several times.

--EDIT 2013 05 09 - removed screendump of plan I had posted here because I posted the full query plan as an attachment here: http://www.sqlservercentral.com/Forums/Attachment13640.aspx

Eugene Elutin SSC Guru Points: 59322 More actions · Answer 5

caspersql (5/8/2013)
ChrisM@Work (5/8/2013)
Can you post the actual execution plan as a .sqlplan attachment please? Makes this kind of exercise much easier.
This query was so long-running (over 2 days) that I haven't actually seen it complete yet so I just have the estimated, not actual execution plan as of now. After implementing Eugene's suggestions I'll post up the actual if the query actually ever finishes for me...
In the meantime, here is the estimated plan. The table names are proprietary so I'm posting the execution plan as a jpeg with the names blocked out, does that suffice? All that's blocked out is the table name which I've callled 'SourceTable' above. That is the table that joins on itself several times.
Note that my example code above was simplified somewhat. There is a coalesce, group by and order by in the actual statement.

Unfortunately, I cannot see anything attached. Anyway, JPG is not very good idea for posting query plan. You better save it as XML. You need to post complete query if you want relevant help. Also, posting DDL of involved database objects would be helpful.

_____________________________________________
"The only true wisdom is in knowing you know nothing"
"O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
(So many miracle inventions provided by MS to us...)

How to post your question to get the best and quick help[/url]

Grant Fritchey SSC Guru Points: 398679 More actions · Answer 6

Without the plan itself (saved & posted as XML as the previous poster said) it's hard to make serious suggestions to you. But, the one thing I can tell you, when you see a pattern with really fat data pipes on the right and really skinny ones on the left, you have a problem. It's probably missing or incorrect indexes or, possibly, out of date statistics. But without the details of information within the plan, that's just guesses.

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

caspersql SSCrazy Points: 2276 More actions · Answer 7

Unfortunately, I cannot see anything attached. Anyway, JPG is not very good idea for posting query plan. You better save it as XML. You need to post complete query if you want relevant help. Also, posting DDL of involved database objects would be helpful.

Okay will do thanks, the database is currently changing the datatype of the Field ID Number to numeric so I'll post it up when that's finished, but it might take a while to complete.

Edit: full query plan uploaded here: http://www.sqlservercentral.com/Forums/Attachment13640.aspx

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 8

Can you post the query with obfuscated table names?

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

caspersql SSCrazy Points: 2276 More actions · Answer 9

Can you post the query with obfuscated table names?

Yes will do after I've rewritten it slightly to factor in the datatype change for the field ID number. When that is done I will post it with obfuscated table names. It could take a while because 'SourceTable' is very large so the datatype change will take a while.

Sergiy SSC Guru Points: 110206 More actions · Answer 10

First of all you need to make sure the Sourcetable (a1, a2,a3,a4,a5) have clustered index on (CompanyIdentifier,FieldIDNumber)

Second, if I understand correctly those joined tables are actually the same one joined multiple times to itself. Then you better join it once and put all FieldID's into a single IN condition.

Then you may put the returned values into different columns in SELECT part using cross-tab (pivot) kind of query.

Third - the table "a" must be a derived table with grouping by CustomerIdentifier inside, so it returns a single record per Customer.

And the last.

It's just my speculation because I did not the the actual query, but I recon you do not need those joins at all.

One simple cross-tab query (considering you've got that clustered index in place) will do the job easily and quickly.

You may find good examples for such queries in BOL or on this site.

_____________
Code for TallyGenerator

caspersql SSCrazy Points: 2276 More actions · Answer 11

Eugene Elutin (5/8/2013)JPG is not very good idea for posting query plan. You better save it as XML. You need to post complete query if you want relevant help. Also, posting DDL of involved database objects would be helpful.

Apologies for not doing this in my first post, thought it would simplify matters to just ask about a table joining on itself but I should have known better. The full query plan with obfuscated names is attached as a .sqlplan file (xml wasn't a permitted type but the xml is in the file).

Here is the complete query. I've moved the a-related filters from the JOIN clause to the WHERE clause per your suggestion:

SELECT

a.CompanyIdentifier

,COALESCE(a.Data,a1.Data)

,COALESCE(a2.Data,a3.Data)

,a4.Data

FROM SourceTable a

LEFT JOIN SourceTable a1

ON

a.CompanyIdentifier = a1.CompanyIdentifier

AND a1.FieldIDNumber = 2

AND LEFT(a1.CompanyIdentifier, 5) != 'EXCLU'

AND a1.IndicatorRecordContainsAlphanumericNoteRatherThanData = ''

LEFT JOIN SourceTable a2

ON

a.CompanyIdentifier = a2.CompanyIdentifier

AND a2.FieldIDNumber = 3

AND LEFT(a2.CompanyIdentifier, 5) != 'EXCLU'

AND a2.IndicatorRecordContainsAlphanumericNoteRatherThanData = ''

LEFT JOIN SourceTable a3

ON

a.CompanyIdentifier = a3.CompanyIdentifier

AND a3.FieldIDNumber = 4

AND LEFT(a3.CompanyIdentifier, 5) != 'EXCLU'

AND a3.IndicatorRecordContainsAlphanumericNoteRatherThanData = ''

LEFT JOIN SourceTable a4

ON

a.CompanyIdentifier = a4.CompanyIdentifier

AND a4.FieldIDNumber = 5

AND LEFT(a4.CompanyIdentifier, 5) != 'EXCLU'

AND a4.IndicatorRecordContainsAlphanumericNoteRatherThanData = ''

WHERE

a.FieldIDNumber = 1

AND LEFT(a.CompanyIdentifier, 5) != 'EXCLU'

AND a.IndicatorRecordContainsAlphanumericNoteRatherThanData = ''

GROUP BY

a.CompanyIdentifier

,a.Data

,a1.Data

,a2.Data

,a3.Data

,a4.Data

ORDER BY

a.CompanyIdentifier

,a.Data

,a1.Data

,a2.Data

,a3.Data

,a4.Data

ASC

Here is the DDL for the table and its primary key:

CREATE TABLE SourceTable

(

CompanyIdentifier [varchar](9) NOT NULL,

PKColumn1NotPartOfQuery [int] NOT NULL,

PKColumn2NotPartOfQuery [varchar](1) NOT NULL,

PKColumn3NotPartOfQuery [smallint] NOT NULL,

FieldIDNumber [decimal](5, 0) NOT NULL,

IndicatorRecordContainsAlphanumericNoteRatherThanData [varchar](4) NOT NULL,

PKColumn4NotPartOfQuery [smallint] NOT NULL,

Data[varchar](8000) NULL,

PRIMARY KEY CLUSTERED

(

CompanyIdentifier ASC,

PKColumn1NotPartOfQuery ASC,

PKColumn2NotPartOfQuery ASC,

PKColumn3NotPartOfQuery ASC,

FieldIDNumber ASC,

IndicatorRecordContainsAlphanumericNoteRatherThanData ASC,

PKColumn4NotPartOfQuery ASC

)

WITH

(

PAD_INDEX = OFF,

STATISTICS_NORECOMPUTE = OFF,

IGNORE_DUP_KEY = OFF,

ALLOW_ROW_LOCKS = ON,

ALLOW_PAGE_LOCKS = ON

)

ON [PRIMARY]

) ON [PRIMARY]

caspersql SSCrazy Points: 2276 More actions · Answer 12

Sergiy (5/8/2013)
First of all you need to make sure the Sourcetable (a1, a2,a3,a4,a5) have clustered index on (CompanyIdentifier,FieldIDNumber)

Many thanks for your suggestions Sergiy.

Yes there is a clustered index that includes both of those columns, however it also includes a few additional columns that are not used in this particular query but which are used in other queries. Would it speed things up if I added a non-clustered index on just the columns that are used in join criteria and filters of this query, despite the fact that they are all contained in the clustered index/primary key?

Second, if I understand correctly those joined tables are actually the same one joined multiple times to itself. Then you better join it once and put all FieldID's into a single IN condition. Then you may put the returned values into different columns in SELECT part using cross-tab (pivot) kind of query.
And the last.
It's just my speculation because I did not the the actual query, but I recon you do not need those joins at all.
One simple cross-tab query (considering you've got that clustered index in place) will do the job easily and quickly.
You may find good examples for such queries in BOL or on this site.

Yes correct - it is the same table joined to itself multiple times. Interesting suggestion, I will look into this and test it.

Many thanks for your help, much appreciated.

Eugene Elutin SSC Guru Points: 59322 More actions · Answer 13

Your query doesn't look very logical, especially in its select list part as it doesn't correspond to order by.

Would be beneficial if you could post some test data (in a form of INSERT statements).

For now you can try the following query, which does the same job without any joins.

SELECT

a.CompanyIdentifier

,COALESCE(MAX(CASE WHEN a.FieldIDNumber = 1 THEN a.Data ELSE NULL END)

,MAX(CASE WHEN a.FieldIDNumber = 2 THEN a.Data ELSE NULL END))

,COALESCE(MAX(CASE WHEN a.FieldIDNumber = 3 THEN a.Data ELSE NULL END)

,MAX(CASE WHEN a.FieldIDNumber = 4 THEN a.Data ELSE NULL END))

,MAX(CASE WHEN a.FieldIDNumber = 5 THEN a.Data ELSE NULL END) AS Data

FROM SourceTable a

WHERE LEFT(a.CompanyIdentifier, 5) != 'EXCLU'

AND a.IndicatorRecordContainsAlphanumericNoteRatherThanData = ''

GROUP BY a.CompanyIdentifier

ORDER BY a.CompanyIdentifier

If the record with FieldIDNumber = 1 and not "EXCLU" exists for every CompanyIdentifier, the above query should produce the same results (given that only one record per CompanyIdentifier, FieldIDNumber may exist).

Also, you don't use PKColumn1NotPartOfQuery columns. What are they about?

_____________________________________________
"The only true wisdom is in knowing you know nothing"
"O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
(So many miracle inventions provided by MS to us...)

How to post your question to get the best and quick help[/url]

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 14

Here's a basic crosstab query to get you started. It might just work 😉

SELECT

CompanyIdentifier,

,COALESCE(Field1,Field2)

,COALESCE(Field3,Field4)

,Field5

FROM (

SELECT

CompanyIdentifier,

Field1 = MAX(CASE WHEN FieldIDNumber = 1 THEN Data END),

Field2 = MAX(CASE WHEN FieldIDNumber = 2 THEN Data END),

Field3 = MAX(CASE WHEN FieldIDNumber = 3 THEN Data END),

Field4 = MAX(CASE WHEN FieldIDNumber = 4 THEN Data END),

Field5 = MAX(CASE WHEN FieldIDNumber = 4 THEN Data END)

FROM SourceTable

--WHERE LEFT(CompanyIdentifier, 5) != 'EXCLU' -- NOT SARGABLE

WHERE CompanyIdentifier NOT LIKE 'EXCLU%' -- SARGABLE

AND IndicatorRecordContainsAlphanumericNoteRatherThanData = ''

GROUP BY CompanyIdentifier

) d

ORDER BY CompanyIdentifier

EDIT: Quick today, Eugene 😀

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden