June 25, 2020 at 1:12 pm
Hi all,
Have been tuning a query a bit and noticing some odd behavior. I've got it to the best runtime as of now; however, it's with a RID lookup. If I add an index to remove the RID (converts to a good index seek) it performs worse as it adds a scan to a clustered columnstore index on a different table.
Any insight to why this is happening?
Attached are the execution plans with the RID and with the CCI.
Here's the index that removes the RID:
CREATE NONCLUSTERED INDEX [IX_Unicorns_Id_WITH_Includes] ON [dbo].[Unicorns]
(
[Id] ASC
)
INCLUDE([UnicornHash],[Name],[SoldToCode],[CustomerAddress],[CustomerCity],[CustomerState],[CustomerZip],[SourceName],[DistributorName],[Created],[LastModified],[IsDeleted],[IsIncomplete],[LastModifiedById],[AccountExclusions],[AccountAssociationProperties],[SkuCode],[TimeDimension],[Brand],[Agreement],[BusinessSegment],[MatchedAccountIds],[MatchedAccountTypes],[VerifiedMatches],[FriendlyId],[RulesAppliedToAutomatch]) WITH (SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF) ON [PRIMARY]
Here's the query coming from Entity Framework:
SELECT
[x].[Id]
,[x].[Created]
,[x].[DistributorSkuSourceId]
,[x].[IsDeleted]
,[x].[LastModified]
,[x].[LastModifiedById]
,[x].[Qty]
,[x].[TransactionDate], [x].[UnicornHash], [x.DistributorSkuSource].[Id], [x.DistributorSkuSource].[Created], [x.DistributorSkuSource].[IsDeleted], [x.DistributorSkuSource].[LastModified], [x.DistributorSkuSource].[LastModifiedById], [x.DistributorSkuSource].[Sku], [x.DistributorSkuSource].[UnicornHash], [x.DistributorSkuSource].[UnicornId], [x.DistributorSkuSource.Unicorn].[Id], [x.DistributorSkuSource.Unicorn].[AccountAssociationProperties], [x.DistributorSkuSource.Unicorn].[AccountExclusions], [x.DistributorSkuSource.Unicorn].[Agreement], [x.DistributorSkuSource.Unicorn].[Brand], [x.DistributorSkuSource.Unicorn].[BusinessSegment], [x.DistributorSkuSource.Unicorn].[Created], [x.DistributorSkuSource.Unicorn].[CustomerAddress], [x.DistributorSkuSource.Unicorn].[CustomerCity], [x.DistributorSkuSource.Unicorn].[CustomerState], [x.DistributorSkuSource.Unicorn].[CustomerZip], [x.DistributorSkuSource.Unicorn].[DistributorName], [x.DistributorSkuSource.Unicorn].[FriendlyId], [x.DistributorSkuSource.Unicorn].[IsDeleted], [x.DistributorSkuSource.Unicorn].[IsIncomplete], [x.DistributorSkuSource.Unicorn].[LastModified], [x.DistributorSkuSource.Unicorn].[LastModifiedById], [x.DistributorSkuSource.Unicorn].[MatchedAccountIds], [x.DistributorSkuSource.Unicorn].[MatchedAccountTypes], [x.DistributorSkuSource.Unicorn].[Name], [x.DistributorSkuSource.Unicorn].[RulesAppliedToAutomatch], [x.DistributorSkuSource.Unicorn].[SkuCode], [x.DistributorSkuSource.Unicorn].[SoldToCode], [x.DistributorSkuSource.Unicorn].[SourceName], [x.DistributorSkuSource.Unicorn].[TimeDimension], [x.DistributorSkuSource.Unicorn].[UnicornHash], [x.DistributorSkuSource.Unicorn].[VerifiedMatches]
FROM [DistributorSkuSourceTransactions] AS [x]
INNER JOIN [DistributorSkuSources] AS [x.DistributorSkuSource]
ON [x].[DistributorSkuSourceId] = [x.DistributorSkuSource].[Id]
INNER JOIN [Unicorns] AS [x.DistributorSkuSource.Unicorn]
ON [x.DistributorSkuSource].[UnicornId] = [x.DistributorSkuSource.Unicorn].[Id]
WHERE EXISTS (
SELECT 1
FROM [AffiliateMatches] AS [am]
WHERE (([am].[AccountId] = '03412069-F3F3-48CB-9E0F-0796BE64A252')
AND ([am].[IsDeleted] <> 1))
AND ([x.DistributorSkuSource.Unicorn].[Id] = [am].[UnicornId]))
AND ([x].[IsDeleted] <> 1)
ORDER BY [x.DistributorSkuSource].[Id]
June 25, 2020 at 3:16 pm
First thing I notice is that your statistics look to be out of date with the cci plan.
Second, could you provide an actual execution plan for the rid method? that one looks to be an estimated plan.
Without seeing that CCI index code, it is hard to say what is wrong with it, but my guess is that it is on more columns than just ID (the only column used in your query). On top of this, it is not ordered in a way that can be used by the query (again, thinking the ID column is not the ordering column) so you get the scan. Your best bet is to either adjust the columnstore index OR make a new nonclustered index on the table.
And the last thing I notice is your statistics appear to be stale. On the Unicorn table (where it is doing a scan) it estimated 7,651 rows and got back 7,650,890 rows. This is going to result in a lot of issues with the optimizer. I'd try updating your statistics as a first bet (likely best bang for your buck) and maybe make an index on the Unicorn table, ID column. I say maybe as more indexes usually means slower INSERT, UPDATE, and DELETE operations and sometimes SELECT too.
If updating statistics helps (which I am fairly confident it will), you might want to look into regular statistics maintenance.
The above is all just my opinion on what you should do. 
As with all advice you find on a random internet forum - you shouldn't blindly follow it.  Always test on a test server to see if there is negative side effects before making changes to live!
I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.
June 25, 2020 at 5:59 pm
Thanks Brian!
Odd, thought I had the actual, attached here.
Yeah, the CCI has more than it needs; however, why would simply adding this index to Unicorns to remove the RID lookup, cause a new scan to the CCI in a different table?
As per your recommendation, check out the original post. That's the index I was using to get rid of the RID lookup, that sent me down this path.
I actually updated the stats this morning before I made this post. I just updated them again prior to getting this new execution plan.
Thanks
June 25, 2020 at 6:33 pm
Ohhhh, I see what you're saying about the estimated row size on the CCI ... yeah, we don't manage those. I don't have a ton of experience with columnstore, how does one manage them from a maintenance perspective? ... I type this and I haven't tested yet, trying now!
June 25, 2020 at 8:18 pm
From looking at it, it looks like the columnstore index is giving a bad estimate and that is causing it to use the clustered columnstore index as the estimates make it look like it should be faster when it is actually a lot slower.
Now, I don't use any columnstore indexes at my workplace, but it may not hurt to rebuild that index. My understanding is that in older SQL Server versions, you needed to drop and recreate the index with each insert,updat,e or delete operation, but with newer versions, the SQL Engine can handle this. As you are on Azure, I expect it should be automatically managing this, but since the estimates are so far off, I would think that it is not updating it automatically properly.
And sorry about the bad suggestion; I had meant to remove that part of my reply.
From looking at the execution plans, it looks like it is using the clustered columnstore index because it thinks it is small and it should be faster to get the data than to look at the index. The CCI is in memory, so reading from it should be faster than reading from an index. The query optimizer estimates it will pull 7000-ish rows from an in-memory object OR 7000-ish rows from an on-disk object and decides that in memory will be faster... When it actually is pulling 7 million rows from in-memory.
Now, the RID lookup in the original version is because it is using the previous existing index to find the ID for the join and then needs to look up in the table (heap) to find the remaining values.
Now, why it picks to use the index and heap with the first one with the RID lookup and use a different index and CCI on the other, that is back end magic to me. My best guess is the CCI doesn't have all of the columns in it that it needs for the query, so it needs to do a lookup on disk with the RID version. But with the CCI version, I would expect it to use your new index for both lookups. My best guess is that it is thinking that your index is close to the same size as the CCI so it thinks looking through the CCI will be faster.
The above is all just my opinion on what you should do. 
As with all advice you find on a random internet forum - you shouldn't blindly follow it.  Always test on a test server to see if there is negative side effects before making changes to live!
I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.
Viewing 5 posts - 1 through 5 (of 5 total)
You must be logged in to reply to this topic. Login to reply