Be Aware of CARTESIAN PRODUCT When Using Join Keys in SQL

,

I got a massive satisfaction last week when I was able to bring down the execution time of a sql stored procedure from > 3 hours to mere 2 minutes. It all boiled down to a missing joining key that led to a cartesian product within the tables.

Here is how the basic structure of the query (there were plenty of joining columns, however for our convenience lets consider 2 columns only) looked like

Select * from
                     table1 t1 left join table2 t2 on t1.k1 =t2,k1 and t1.k2 = t2.k2
                     left join table3 t3 on t1.k1 = t3.k1 and t1.k2 = t2.k2
                     left join table4 t4 on t1.k1 = t4.k1

And this query ran forever when given a particular set of parameters. I started with the approach of joining the parent table with each subsequent table until I met the road block. For example:-

Step1 Select * from
                     table1 t1 left join table2 t2 on t1.k1 =t2,k1 and t1.k2 = t2.k2
was returning rows

Step 2 
Select * from
                     table1 t1 left join table2 t2 on t1.k1 =t2,k1 and t1.k2 = t2.k2
                     left join table3 t3 on t1.k1 = t3.k1 and t1.k2 = t2.k2
                             
was returning rows

Step 3
Select * from
                     table1 t1 left join table2 t2 on t1.k1 =t2,k1 and t1.k2 = t2.k2
                     left join table3 t3 on t1.k1 = t3.k1 and t1.k2 = t2.k2
                     left join table4 t4 on t1.k1 = t4.k1
ran forever

As you can clearly notice, this query was missing a joining condition with column k2. On including the joining condition for column k2, query executed 1000 times quickly.

In the end, it turned out to be a bug or a blunder. But the takeaway is that whenever you have a query involving joins of multiple tables, always make sure that it does not lead to a cartesian product.

Happy coding!

Rate

Share

Share

Rate