April 11, 2023 at 10:58 am
I'm attempting to use INNER JOIN to join numerous tables.
Here is the code:
IF OBJECT_ID('tempdb..#tmpRecData') IS NOT NULL
DROP TABLE #tmpRecData
--STEP 1
SELECT DISTINCT
pr.ChainID, pr.StoreID, pr.SupplierID, pr.ProductID,
MAX(CAST(pr.ActiveLastDate AS date)) AS 'Active Date'
--ChainID, SupplierID, StoreID, InvoiceDate, InvoiceNumber, SupplierInvoiceDate, SupplierInvoiceNumber
INTO
#tmpRecData
FROM
dbo.[ProductPrices_Retailer] AS pr
LEFT JOIN
ProductIdentifiers iden ON pr.ProductID = iden.ProductID
AND iden.ProductIdentifierTypeID = 2
WHERE
pr.ChainID = '129121'
AND pr.ActiveLastDate > '12/01/2016'
GROUP BY
pr.ProductID, pr.ProductName, iden.IdentifierValue,
pr.ChainID, pr.StoreID, pr.SupplierID
--STEP 2
SELECT
rec.ChainID, rec.StoreID, rec.SupplierInvoiceNumber,
rec.TransactionTypeID, rec.SupplierID, rec.SaleDateTime,
rec.ProductID, rec.UPC, rec.ProductDescriptionReported,
rec.RawProductIdentifier
FROM
#tmpRecData t
INNER JOIN
dbo.StoreTransactions AS rec WITH (NOLOCK) ON rec.ChainID = T.ChainID
WHERE
rec.ChainID = '129121'
DROP TABLE #tmpRecDataI receive 4096 rows (Step 1) * 14979 rows (Step 2) = 725007693 rows (725 million)
This is a large amount of records, yet I used INNER JOIN; therefore, why did it work as CROSS JOIN?
April 11, 2023 at 11:06 am
The joining clause in step 2 needs to be more selective.
rec.ChainID = T.ChainID will there be a CROSS JOIN as each row in step 1 relates to each and every row in step 2
April 11, 2023 at 11:50 am
Why are you using DISTINCT and GROUP BY in your first query out of interest? GROUP BY already puts your data into distinct groups, so the DISTINCT in the SELECT would do little more than cause unnecessary overhead.
I also suggest against putting your aliases in single quotes; they can come with "gotchas" that result in unexpected behaviour and some formats with them are actually deprecated. I'd advise as well a less ambiguous date format; '12/01/2016' could be 12 January 2016 or 01 December 2016. yyyyMMdd would be a better format (such as '20161201' for 12 January 2016).
Thom~
Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
Larnu.uk
April 11, 2023 at 1:48 pm
Both tables contain a chain id, store id, supplier id, and product id. I would expect that you would want to include EACH of those columns in your join criteria.
I also notice that you haven't used ANY information from your temp table in your second query. You aren't including any fields from that table and you aren't filtering on any fields either. Why is it even in your second query?
Finally, using the (NOLOCK) hint can produce dirty data. You should only use this if you don't care if there are inconsistencies in your data.
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
April 11, 2023 at 1:52 pm
Why are you using
DISTINCTandGROUP BYin your first query out of interest?GROUP BYalready puts your data into distinct groups, so theDISTINCTin theSELECTwould do little more than cause unnecessary overhead.I also suggest against putting your aliases in single quotes; they can come with "gotchas" that result in unexpected behaviour and some formats with them are actually deprecated. I'd advise as well a less ambiguous date format;
'12/01/2016'could be 12 January 2016 or 01 December 2016.yyyyMMddwould be a better format (such as'20161201'for 12 January 2016).
The fields listed in the SELECT clause are a subset of the fields in the GROUP BY clause. What you said about DISTINCT is only necessarily true if the two field lists are the same. That being said, I suspect that the two SHOULD be the same.
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
April 11, 2023 at 2:00 pm
The fields listed in the
SELECTclause are a subset of the fields in theGROUP BYclause. What you said aboutDISTINCTis only necessarily true if the two field lists are the same. That being said, I suspect that the two SHOULD be the same.
Yes, that's sort of the point. If a query has both a GROUP BY and DISTINCT in the same scope this is a good indicator that either the DISTINCT is pointless or the GROUP BY is wrong; it appears the latter is the case here.
The OP is grouping in the column ProductName but that isn't in their SELECT. THough I would expect that ProductName would be unique for a single value of ProductID so I suspect that the inclusion of the column in the GROUP BY isn't creating "duplicate" rows. I don't know what IdentifierValue is, and I suspect that it should not be in the GROUP BY at all.
TL;DR: Remove ProductName, IdentifierValue from the GROUP BY and remove the DISTINCT.
Thom~
Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
Larnu.uk
April 11, 2023 at 2:08 pm
drew.allen wrote:The fields listed in the
SELECTclause are a subset of the fields in theGROUP BYclause. What you said aboutDISTINCTis only necessarily true if the two field lists are the same. That being said, I suspect that the two SHOULD be the same.Yes, that's sort of the point. If a query has both a
GROUP BYandDISTINCTin the same scope this is a good indicator that either theDISTINCTis pointless or theGROUP BYis wrong; it appears the latter is the case here.The OP is grouping in the column
ProductNamebut that isn't in theirSELECT. THough I would expect thatProductNamewould be unique for a single value ofProductIDso I suspect that the inclusion of the column in theGROUP BYisn't creating "duplicate" rows. I don't know whatIdentifierValueis, and I suspect that it should not be in theGROUP BYat all.TL;DR: Remove
ProductName,IdentifierValuefrom theGROUP BYand remove theDISTINCT.
I suspect that IdentifierValue is a generated ID field that is essentially an alternate to the unique combination of SupplierID-ProductID.
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
April 13, 2023 at 8:41 pm
Well thank your help and time everyone
Viewing 8 posts - 1 through 8 (of 8 total)
You must be logged in to reply to this topic. Login to reply