• Finally, it looks like your posts are back, Jason, and good job on the sample and test code. The addition of the index does remove the primary sort, but the optimizer still incorporates a sort operator (as usual) for gathering the parallel streams, as it considers the operation costly enough to go for a parallel plan. Rounding up the IO and CPU, the ratio between the two solutions is close to 85/15. The downside is that the results of my solution aren't correct, although I feel it wouldn't take much to correct them; I just don't have the time at the moment.

    😎