• Do you have the same amount of data in test as in production? If not, your testing isn't going to find volume based performance issues, nor will you be able to do accurate tuning in test.

    For tuning, first update your statistics and do basic tuning (use nothing you don't need, join to the least number of rows possible using sargable expressions, etc.), then rewrite your query several ways - temp tables, temp tables with indexes, table variables, table variables with UNIQUE constraints and primary keys, derived tables, etc., and watch the CPU, reads, writes, and duration in SQL Profiler. Try other ways of writing it entirely. Watch the execution plan - sometimes SQL Server, even modern versions, chooses completely the wrong index.

    Know enough about your system (or ask someone who does, or find out) to know if you're IO constrained, CPU constrained, or ?? constrained on your production system; weigh that factor more heavily. For instance, on many of my IO constrained systems, I'm delighted to see a 30% CPU increase in exchange for a 5x read decrease, and on those with very light CPU, I'm happy to see a 30% CPU increase in exchange for halving the number of reads.

    Every environment is different.