• Thanks for all the comments and suggestions.

    My approach was as such:

    1. Run Server Side Tracing for 15 mins on 5 almost identical dedicated sql servers (Subscribers in Transactional replication). This resulted in ~35 executions of the stored procedure in question

    2. Change sp to its new (optimized) version

    3. Run Server Side Tracing for 15 mins again, similarly

    On 4 out of 5, there was a drastic improvement in both - average duration and average reads. I think 5th one had some IO issues at the time, so we ignored it.

    The change was basically an INNER JOIN order. I had 3 tables in a query, and there were 2 ways to JOIN between them. When the query was originally written , I did not pay attention to it. I just chose one way randomly. However, as I found out now, joining another would would allow to use an INDEX on one of the biggest tables of the 3, thereby totally changing the execution plan, bringing down the sub-tree cost from 3.5 to 0.1 and the logical reads from 110,000 to less than 10,000.

    Thank you!