cross join sql

Question

Post reply

cross join sql

Piyushbhatt

Old Hand

Points: 374

April 11, 2023 at 10:58 am

#4172212

I'm attempting to use INNER JOIN to join numerous tables.

Here is the code:

IF OBJECT_ID('tempdb..#tmpRecData') IS NOT NULL 
     DROP TABLE #tmpRecData

--STEP 1
SELECT DISTINCT 
    pr.ChainID, pr.StoreID, pr.SupplierID, pr.ProductID, 
    MAX(CAST(pr.ActiveLastDate AS date)) AS 'Active Date'
    --ChainID, SupplierID, StoreID, InvoiceDate, InvoiceNumber, SupplierInvoiceDate, SupplierInvoiceNumber
INTO 
    #tmpRecData 
FROM 
    dbo.[ProductPrices_Retailer] AS pr
LEFT JOIN 
    ProductIdentifiers iden ON pr.ProductID = iden.ProductID
                            AND iden.ProductIdentifierTypeID = 2
WHERE 
    pr.ChainID = '129121'
    AND pr.ActiveLastDate > '12/01/2016'                  
GROUP BY 
    pr.ProductID, pr.ProductName, iden.IdentifierValue, 
    pr.ChainID, pr.StoreID, pr.SupplierID

--STEP 2
SELECT 
    rec.ChainID, rec.StoreID, rec.SupplierInvoiceNumber,
    rec.TransactionTypeID, rec.SupplierID, rec.SaleDateTime,
    rec.ProductID, rec.UPC, rec.ProductDescriptionReported, 
    rec.RawProductIdentifier 
FROM 
    #tmpRecData t
INNER JOIN 
    dbo.StoreTransactions AS rec WITH (NOLOCK) ON rec.ChainID = T.ChainID 
WHERE 
    rec.ChainID = '129121'

DROP TABLE #tmpRecData

I receive 4096 rows (Step 1) * 14979 rows (Step 2) = 725007693 rows (725 million)

This is a large amount of records, yet I used INNER JOIN; therefore, why did it work as CROSS JOIN?

Viewing 8 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply

Ant-Green SSC Guru Points: 113559 More actions · Answer 1

The joining clause in step 2 needs to be more selective.

rec.ChainID = T.ChainID will there be a CROSS JOIN as each row in step 1 relates to each and every row in step 2

Thom A SSC Guru Points: 99141 More actions · Answer 2

Why are you using DISTINCT and GROUP BY in your first query out of interest? GROUP BY already puts your data into distinct groups, so the DISTINCT in the SELECT would do little more than cause unnecessary overhead.

I also suggest against putting your aliases in single quotes; they can come with "gotchas" that result in unexpected behaviour and some formats with them are actually deprecated. I'd advise as well a less ambiguous date format; '12/01/2016' could be 12 January 2016 or 01 December 2016. yyyyMMdd would be a better format (such as '20161201' for 12 January 2016).

Thom~

Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
Larnu.uk

drew.allen SSC Guru Points: 77012 More actions · Answer 3

Both tables contain a chain id, store id, supplier id, and product id. I would expect that you would want to include EACH of those columns in your join criteria.

I also notice that you haven't used ANY information from your temp table in your second query. You aren't including any fields from that table and you aren't filtering on any fields either. Why is it even in your second query?

Finally, using the (NOLOCK) hint can produce dirty data. You should only use this if you don't care if there are inconsistencies in your data.

Drew

J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA

drew.allen SSC Guru Points: 77012 More actions · Answer 4

Thom A wrote:

Why are you using DISTINCT and GROUP BY in your first query out of interest? GROUP BY already puts your data into distinct groups, so the DISTINCT in the SELECT would do little more than cause unnecessary overhead.
I also suggest against putting your aliases in single quotes; they can come with "gotchas" that result in unexpected behaviour and some formats with them are actually deprecated. I'd advise as well a less ambiguous date format; '12/01/2016' could be 12 January 2016 or 01 December 2016. yyyyMMdd would be a better format (such as '20161201' for 12 January 2016).

The fields listed in the SELECT clause are a subset of the fields in the GROUP BY clause. What you said about DISTINCT is only necessarily true if the two field lists are the same. That being said, I suspect that the two SHOULD be the same.

Drew

J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA

Thom A SSC Guru Points: 99141 More actions · Answer 5

drew.allen wrote:

The fields listed in the SELECT clause are a subset of the fields in the GROUP BY clause. What you said about DISTINCT is only necessarily true if the two field lists are the same. That being said, I suspect that the two SHOULD be the same.

Yes, that's sort of the point. If a query has both a GROUP BY and DISTINCT in the same scope this is a good indicator that either the DISTINCT is pointless or the GROUP BY is wrong; it appears the latter is the case here.

The OP is grouping in the column ProductName but that isn't in their SELECT. THough I would expect that ProductName would be unique for a single value of ProductID so I suspect that the inclusion of the column in the GROUP BY isn't creating "duplicate" rows. I don't know what IdentifierValue is, and I suspect that it should not be in the GROUP BY at all.

TL;DR: Remove ProductName, IdentifierValue from the GROUP BY and remove the DISTINCT.

Thom~

Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
Larnu.uk

drew.allen SSC Guru Points: 77012 More actions · Answer 6

Thom A wrote:

drew.allen wrote:
The fields listed in the SELECT clause are a subset of the fields in the GROUP BY clause. What you said about DISTINCT is only necessarily true if the two field lists are the same. That being said, I suspect that the two SHOULD be the same.
Yes, that's sort of the point. If a query has both a GROUP BY and DISTINCT in the same scope this is a good indicator that either the DISTINCT is pointless or the GROUP BY is wrong; it appears the latter is the case here.
The OP is grouping in the column ProductName but that isn't in their SELECT. THough I would expect that ProductName would be unique for a single value of ProductID so I suspect that the inclusion of the column in the GROUP BY isn't creating "duplicate" rows. I don't know what IdentifierValue is, and I suspect that it should not be in the GROUP BY at all.
TL;DR: Remove ProductName, IdentifierValue from the GROUP BY and remove the DISTINCT.

I suspect that IdentifierValue is a generated ID field that is essentially an alternate to the unique combination of SupplierID-ProductID.

Drew

J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA

Piyushbhatt Old Hand Points: 374 More actions · Answer 7

Piyushbhatt

Old Hand

Points: 374

April 13, 2023 at 8:41 pm

#4173691

Well thank your help and time everyone