Thanks guys. Luiz's code doesn't complete for me when I run it as is, but once I make my loop join change (which addresses what seemed to be the larger problem, the one that led to the query just not completing), it completes within 375ms, which is quicker than the 388ms my originally fixed code took. So reducing the rows further up the tally table has had some benefit. I will now spend some time getting my head around what was done 🙂
Edit: Just noticed the CPU time has gone from 501ms to 328ms, a significant win there 🙂