Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase 12»»

Group by performance FK / PK Expand / Collapse
Author
Message
Posted Thursday, July 10, 2014 9:03 AM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Monday, December 1, 2014 5:19 AM
Points: 7, Visits: 20
Hello all,

I have something I can't explain, hope some of you can?

We have a medium sized database with the next tables:

- PA: 525000 records
- PR: 780000 records
- R: 1000 records
- B: 45 records

PA: PK = PAARDCODE

PR: PK = PAARDREGISTERCODE
PR: FK = PAARDCODE
PR: FK = REGISTERCODE

R: PK = REGISTERCODE
R: FK = BOEKCODE

B: PK = BOEKCODE

When I group by B.BOEKCODE the query lasts: 10 (or more when 'where' option is added) seconds
When I group by R.BOEKCODE the query lasts less than 2 seconds.

SELECT B.BOEKOMSCHRIJVING, B.BOEKCODE  -- or R.BOEKCODE
FROM PA
INNER JOIN PR ON PA.PAARDCODE = PR.PAARDCODE
INNER JOIN R ON R.REGISTERCODE = PR.REGISTERCODE
INNER JOIN B ON R.BOEKCODE = B.BOEKCODE
GROUP BY BOEKOMSCHRIJVING, B.BOEKCODE -- or R.BOEKCODE
ORDER BY BOEKOMSCHRIJVING

Why is the option of B.BOEKCODE a lot slower compared to R.BOEKCODE?

Thanks in advance,

Peter
Post #1591207
Posted Thursday, July 10, 2014 9:18 AM
SSCommitted

SSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommitted

Group: General Forum Members
Last Login: Yesterday @ 10:05 PM
Points: 1,688, Visits: 2,272
Have you looked at the query's execution plan? This would likely reveal the reason. Also, is there more than one record in the table with alias B for a given BOEKCODE value ? In that case you may be traversing a lot of records that aren't needed, as including the B table in the result set requires retreiving all the records from that table for that BOEKCODE value. As you are using GROUP BY, you don't need the B table data if you aren't aggregating anything from it, so using the R table's version is fine.


Steve
(aka sgmunson)

Internet ATM Machine
Post #1591219
Posted Thursday, July 10, 2014 9:22 AM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Monday, December 1, 2014 5:19 AM
Points: 7, Visits: 20
sgmunson (7/10/2014)
Have you looked at the query's execution plan? This would likely reveal the reason. Also, is there more than one record in the table with alias B for a given BOEKCODE value ? In that case you may be traversing a lot of records that aren't needed, as including the B table in the result set requires retreiving all the records from that table for that BOEKCODE value. As you are using GROUP BY, you don't need the B table data if you aren't aggregating anything from it, so using the R table's version is fine.


Hi Sgmunson,

Thanks for looking into this, I will look at the execution plan to see what is happening.

From the 'B' table, I need the 'BOEKOMSCHRIJVING', that's why I've added the 'B' table.

For the extra records, Every PA has (at least) one PR, each PR contains 1 R, and each R has one B
Post #1591220
Posted Thursday, July 10, 2014 9:34 AM
SSCommitted

SSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommitted

Group: General Forum Members
Last Login: Yesterday @ 10:05 PM
Points: 1,688, Visits: 2,272
Given your additional info, this suggests that once you have the query plan, take a close look at the indexes on the B table. See if there are any that have both the BOEKCODE as well as the other selected field. I'm guessing not, and by selecting that field from the B table, you have to traverse all the records rather than hit an index that just contains the other selected field, but I wouldn't want to offer any guarantee on that. The query plan will likely make things clear as to what's happening when you select it from the B table instead of the R table. You may be able to anticipate the plan's answer just by running the following query:

SELECT BOEKCODE, BOEKOMSCHRIJVING, COUNT(*)
FROM B
GROUP BY BOEKCODE, BOEKOMSCHRIJVING
HAVING COUNT(*) > 1

If that query returns any rows, you might want to substitute the following into your query in place of the B table:

SELECT DISTINCT BOEKCODE, BOEKOMSCHRIJVING
FROM B

peter 67432 (7/10/2014)
sgmunson (7/10/2014)
Have you looked at the query's execution plan? This would likely reveal the reason. Also, is there more than one record in the table with alias B for a given BOEKCODE value ? In that case you may be traversing a lot of records that aren't needed, as including the B table in the result set requires retreiving all the records from that table for that BOEKCODE value. As you are using GROUP BY, you don't need the B table data if you aren't aggregating anything from it, so using the R table's version is fine.


Hi Sgmunson,

Thanks for looking into this, I will look at the execution plan to see what is happening.

From the 'B' table, I need the 'BOEKOMSCHRIJVING', that's why I've added the 'B' table.

For the extra records, Every PA has (at least) one PR, each PR contains 1 R, and each R has one B


Steve
(aka sgmunson)

Internet ATM Machine
Post #1591222
Posted Friday, July 11, 2014 4:56 AM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Monday, December 1, 2014 5:19 AM
Points: 7, Visits: 20
Hi Steve,

I checked the query execution plan, however it is the first time I am looking in such thing, and I have no idea how to see what is wrong. I tried to follow this website: link, however I still don't understand it, can you help me with reading this plan?

Thanks in advance!

Peter

The slow one:


The fast one, also divided in 2 screens for readability.




Post #1591552
Posted Friday, July 11, 2014 5:50 AM
SSCommitted

SSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommitted

Group: General Forum Members
Last Login: Yesterday @ 10:05 PM
Points: 1,688, Visits: 2,272
One thing I noticed right away was that the "slow" one has two "Nested Loop" icons, while the "fast" one does not, and includes parallelism. Taking advantage of parallel operations usually speeds things up. Having no nested loops is also an advantage. I may not be an expert at reading these plans, but the conclusion I can draw from seeing the nested loop plan going a lot slower than one that takes good advantage of the indexes and operates in parallel is that my original thinking was quite likely correct. Having to retrieve the BOEKCODE from the B table is costing a lot, as it probably comes from the index in the other table, so not having to traverse the individual records from the B table gets you out of a costly pair of nested loops. Next time you want to post a SQL execution plan, be sure to save the plan with the .sqlplan extension, then zip the file and attach it to your post. It is much easier for others to look at the plan that way, and because I can't see the properties of the individual icons, I can't be certain of my conclusion, but I don't think I'm likely to learn differently by examining those properties in this specific case. Someone with more experience reading them might be in a better position to do so, and to better explain it than I can.


Steve
(aka sgmunson)

Internet ATM Machine
Post #1591578
Posted Friday, July 11, 2014 11:41 AM
SSCrazy

SSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazy

Group: General Forum Members
Last Login: 2 days ago @ 3:26 PM
Points: 2,330, Visits: 3,509
If possible, please attach the actual query plan xml as an xml file, rather than just a picture of the plan. There are row counts and other stats available in the query plan that are critical to analyzing it but that can't be seen from a static picture alone.

SQL DBA,SQL Server MVP('07, '08, '09)

Carl Sagan said: "There is no such thing as a dumb question." Sagan obviously never watched a congressional hearing!
Post #1591751
Posted Tuesday, July 15, 2014 1:29 AM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Monday, December 1, 2014 5:19 AM
Points: 7, Visits: 20
Hi,

Sorry for the late response, was a weekend out of town. The zip file should be attached.

Peter


  Post Attachments 
query.zip (1 view, 14.08 KB)
Post #1592433
Posted Tuesday, July 15, 2014 10:42 AM
SSCrazy

SSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazy

Group: General Forum Members
Last Login: 2 days ago @ 3:26 PM
Points: 2,330, Visits: 3,509
Interesting, but not definitive.

Would you please run these commands on that database and post the results? That will show what indexes SQL "thinks" are missing, and how existing indexes are being used:


USE [DH_KWPN]

SET DEADLOCK_PRIORITY LOW --probably irrelevant, but just in case

DECLARE @list_missing_indexes bit
DECLARE @table_name_pattern sysname

--NOTE: showing missing indexes can take some time; set to 0 if you don't want to wait.
SET @list_missing_indexes = 1 --1=list missing index(es); 0=don't.

PRINT 'Started @ ' + CONVERT(varchar(30), GETDATE(), 120)

IF @list_missing_indexes = 1
BEGIN
SELECT
GETDATE() AS capture_date,
DB_NAME(mid.database_id) AS Db_Name,
OBJECT_NAME(mid.object_id /*, mid.database_id*/) AS Table_Name,
mid.equality_columns, mid.inequality_columns, mid.included_columns,
ca1.max_days_active,
migs.*,
mid.statement, mid.object_id, mid.index_handle
FROM sys.dm_db_missing_index_details mid WITH (NOLOCK)
CROSS APPLY (
SELECT DATEDIFF(DAY, create_date, GETDATE()) AS max_days_active FROM sys.databases WHERE name = 'tempdb'
) AS ca1
LEFT OUTER JOIN sys.dm_db_missing_index_groups mig WITH (NOLOCK) ON
mig.index_handle = mid.index_handle
LEFT OUTER JOIN sys.dm_db_missing_index_group_stats migs WITH (NOLOCK) ON
migs.group_handle = mig.index_group_handle
--order by
--DB_NAME, Table_Name, equality_columns
WHERE
1 = 1
AND mid.database_id = DB_ID()
AND OBJECT_NAME(mid.object_id) IN (
'SHA_PAARDACT',
'SHA_PAARDREGISTER',
'SHA_REGISTER',
'SHA_BOEK'
)
ORDER BY
--avg_total_user_cost * (user_seeks + user_scans) DESC,
Db_Name, Table_Name, equality_columns, inequality_columns
END --IF

PRINT 'Midpoint @ ' + CONVERT(varchar(30), GETDATE(), 120)
-- list index usage stats (seeks, scans, etc.)
SELECT
ius2.row_num, DB_NAME() AS db_name,
i.name AS index_name,
OBJECT_NAME(i.object_id/*, DB_ID()*/) AS table_name,
i.index_id, --ius.user_seeks + ius.user_scans AS total_reads,
dps.row_count,
SUBSTRING(key_cols, 3, 8000) AS key_cols, SUBSTRING(nonkey_cols, 3, 8000) AS nonkey_cols,
ius.user_seeks, ius.user_scans, ius.user_lookups, ius.user_updates,
ius.last_user_seek, ius.last_user_scan, ius.last_user_lookup, ius.last_user_update,
fk.Reference_Count AS fk_ref_count,
FILEGROUP_NAME(i.data_space_id) AS filegroup_name,
ca1.max_days_active,
ius.system_seeks, ius.system_scans, ius.system_lookups, ius.system_updates,
ius.last_system_seek, ius.last_system_scan, ius.last_system_lookup, ius.last_system_update
FROM sys.indexes i WITH (NOLOCK)
INNER JOIN sys.objects o WITH (NOLOCK) ON
o.object_id = i.object_id
CROSS APPLY (
SELECT DATEDIFF(DAY, create_date, GETDATE()) AS max_days_active FROM sys.databases WHERE name = 'tempdb'
) AS ca1
OUTER APPLY (
SELECT
', ' + COL_NAME(object_id, ic.column_id)
FROM sys.index_columns ic
WHERE
ic.key_ordinal > 0 AND
ic.object_id = i.object_id AND
ic.index_id = i.index_id
ORDER BY
ic.key_ordinal
FOR XML PATH('')
) AS key_cols (key_cols)
OUTER APPLY (
SELECT
', ' + COL_NAME(object_id, ic.column_id)
FROM sys.index_columns ic
WHERE
ic.key_ordinal = 0 AND
ic.object_id = i.object_id AND
ic.index_id = i.index_id
ORDER BY
COL_NAME(object_id, ic.column_id)
FOR XML PATH('')
) AS nonkey_cols (nonkey_cols)
LEFT OUTER JOIN sys.dm_db_partition_stats dps WITH (NOLOCK) ON
dps.object_id = i.object_id AND
dps.index_id = i.index_id
LEFT OUTER JOIN sys.dm_db_index_usage_stats ius WITH (NOLOCK) ON
ius.database_id = DB_ID() AND
ius.object_id = i.object_id AND
ius.index_id = i.index_id
LEFT OUTER JOIN (
SELECT
database_id, object_id, MAX(user_scans) AS user_scans,
ROW_NUMBER() OVER (ORDER BY MAX(user_scans) DESC) AS row_num --user_scans|user_seeks+user_scans
FROM sys.dm_db_index_usage_stats WITH (NOLOCK)
WHERE
database_id = DB_ID()
--AND index_id > 0
GROUP BY
database_id, object_id
) AS ius2 ON
ius2.database_id = DB_ID() AND
ius2.object_id = i.object_id
LEFT OUTER JOIN (
SELECT
referenced_object_id, COUNT(*) AS Reference_Count
FROM sys.foreign_keys WITH (NOLOCK)
WHERE
is_disabled = 0
GROUP BY
referenced_object_id
) AS fk ON
fk.referenced_object_id = i.object_id
WHERE
i.object_id > 100 AND
i.is_hypothetical = 0 AND
i.type IN (0, 1, 2) AND
o.type NOT IN ( 'IF', 'IT', 'TF', 'TT' ) AND
(
o.name IN (
'SHA_PAARDACT',
'SHA_PAARDREGISTER',
'SHA_REGISTER',
'SHA_BOEK'
)
)
ORDER BY
--row_count DESC,
--ius.user_scans DESC,
--ius2.row_num, --user_scans+user_seeks
-- list clustered index first, if any, then other index(es)
db_name, table_name, CASE WHEN i.index_id IN (0, 1) THEN 1 ELSE 2 END, index_name

PRINT 'Ended @ ' + CONVERT(varchar(30), GETDATE(), 120)


Edit: Corrected CROSS APPLY column alias.


SQL DBA,SQL Server MVP('07, '08, '09)

Carl Sagan said: "There is no such thing as a dumb question." Sagan obviously never watched a congressional hearing!
Post #1592680
Posted Wednesday, July 16, 2014 8:20 AM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Monday, December 1, 2014 5:19 AM
Points: 7, Visits: 20
Hi Scott,

In attachment the results (hopefully good format)

Peter


  Post Attachments 
query Scott.zip (2 views, 2.21 KB)
Post #1593088
« Prev Topic | Next Topic »

Add to briefcase 12»»

Permissions Expand / Collapse