Except for two foreign key errors (you can't put FKs on temp tables), your attached code, as you posted it, ran up to the final SELECT in 6 seconds on my little i5 laptop. The final SELECT to the display took an additional 15 seconds.
Other than the fact that all rows in your large test table are identical (and, therefore, clustered index scans, which are really table scans, are the rule of the day), I'm not seeing a performance issue with the code. If it's taking substantially longer for the code to run in situ on your boxes, I'd start looking into the hardware and issues like network latency.
--Jeff Moden
Change is inevitable... Change for the better is not.