• Thanks for your suggestions. Unfortunately i can't provide/share any execution plans as we operate from an secure environment.

    Answers to the questions posed

    ->There are no blocking, CPU and OS levels are at normal.

    ->Cost threshold for parallelism is set at 5 and Max degree of parallelism is 0 across our SQL estate

    ->Data differs by some rows from QA(1899729) to Prod(1882943) only in 1 table

    ->I observed that in QA, in execution plan the clustered index is scanning the entire table,i.e., actual no.of rows is showing as 1899729. But in PROD, it is just 192 rows, which is the final output of SP. I wonder how would optimizer knows the records to output at the first stage.

    ->In PROD execution plan, Clustered Index scan for all the 3 tables are less than 10% but HASH MATCH (FLOW DISTINCT) is at 74%, where as this doesn't show up in QA execution plan.

    What i did

    -> Refreshed PROD data onto a Test server

    ->Rebuild indexes on those 3 tables, updated statistics

    ->Captured the plan_handle and ran cleaned it up using DBCC FREEPROCCACHE(plan_handle)

    ->I executed DBCC SHOW_STATISTICS and see that rows sampled in TEST is 86462 and in QA it is 88397

    -> Still the SP is taking longer times

    Hope this would give some picture and your advise can help me in fixing this.

    Thanks,

    Dan