Slow Query when perform INNER JOIN Between a very large partioned table and a non partitioned table

  • I have a very large - partitioned - table with about 50 millions records. Partition column is PeriodDate (format: YYYYMM)
    I need to perform an Inner Join between that partitioned table and a - non partitioned table - (about 3 millions rows)
    Partitioned Data is not sorted (Data could be sorted, if correct index is created, but within every single partition of the partitioned table. Not globally!)
    It seems that, in that case, the only join type available is the HASH JOIN.....
    My problem is that the query is very slow when using the HASH JOIN.
    Do you have a better option? Please help

  • karim.diouf - Friday, February 9, 2018 3:25 PM

    I have a very large - partitioned - table with about 50 millions records. Partition column is PeriodDate (format: YYYYMM)
    I need to perform an Inner Join between that partitioned table and a - non partitioned table - (about 3 millions rows)
    Partitioned Data is not sorted (Data could be sorted, if correct index is created, but within every single partition of the partitioned table. Not globally!)
    It seems that, in that case, the only join type available is the HASH JOIN.....
    My problem is that the query is very slow when using the HASH JOIN.
    Do you have a better option? Please help

    Too little information, please post the actual execution plan, the query and the full DDL for the tables in question.
    😎

  • karim.diouf - Friday, February 9, 2018 3:25 PM

    I have a very large - partitioned - table with about 50 millions records. Partition column is PeriodDate (format: YYYYMM)
    I need to perform an Inner Join between that partitioned table and a - non partitioned table - (about 3 millions rows)
    Partitioned Data is not sorted (Data could be sorted, if correct index is created, but within every single partition of the partitioned table. Not globally!)
    It seems that, in that case, the only join type available is the HASH JOIN.....
    My problem is that the query is very slow when using the HASH JOIN.
    Do you have a better option? Please help

    1) Are you doing an INNER JOIN with direct equality of matching datatypes? 

    2) 50M records is SMALL these days. I have a several client databases on my laptop with numerous tables with up to 10X that number of rows.

    3) Partitioning does NOT exist to make your queries run faster!! It exists for large-scale data handling. 

    4) What is the FULL create table statements for both tables (with indexing)?

    5) What predicate are you using in your query? If none, then it is almost certain that you will be hitting ALL data. And without appropriate SORT orders to allow a FAR more efficient MERGE JOIN you will be stuck with the HASH as the fastest way to join the data. Try forcing a loop join and see how bad that is!

    6) What is the data size of both tables? And now what is the total RAM on your server? If 1 is way bigger than 2, and you are not maxed out on RAM on your edition of SQL Server then that becomes your HIGHEST PRIORITY TASK!!! It is so important that I HAVE TURNED DOWN POTENTIAL CLIENT WORK BECAUSE OF INSUFFICIENT RAM ON SERVERS (and also for other totally unacceptable configurations). Memory runs at microsecond timescale and it shoves data into CPUs that do 3-5 BILLION operations per second on your data.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

Viewing 3 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply