I found the exact opposite in my dev environment. The CTE procs outperformed the loop procedures. I used the scripts, ran through all the test scenarios, and found the CTE ran 50% faster than the loop. I'm running Windows Server 2003 Enterprise Edition, SQL Server 2005 Enterprise Edition SP2, 4 dual-core (3.0 GHz) processors and 2 GB RAM.
I also use recursive CTEs in a real production environment (3.2 TB database) on tables of 200 million and almost a billion rows, with a hierarchical data structure and many nodes. I've never had an issue with CTE performance as long as the indexes are tuned properly. I would not shy away from using CTEs just because of this article.
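
For anyone who wants to see the general shape of the pattern, here is a minimal sketch of a recursive CTE walking a parent/child hierarchy, with an index on the parent column so each recursive step can seek its children instead of scanning. This is only an illustration, not the production code or the article's test scripts: it uses Python's built-in sqlite3 module so it runs anywhere, and the table `node`, the index `ix_node_parent`, and the helpers `build_demo_db` and `descendants` are all made-up names. SQLite spells it `WITH RECURSIVE`; in SQL Server's T-SQL the same query is written with plain `WITH`.

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory hierarchy (hypothetical schema, for illustration only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE node (id INTEGER PRIMARY KEY, parent_id INTEGER, name TEXT)"
    )
    # Index the parent column: the recursive step joins on parent_id,
    # so this is the index that matters for the CTE.
    conn.execute("CREATE INDEX ix_node_parent ON node (parent_id)")
    rows = [
        (1, None, "root"),
        (2, 1, "a"),
        (3, 1, "b"),
        (4, 2, "a1"),
        (5, 2, "a2"),
        (6, 4, "a1x"),
    ]
    conn.executemany("INSERT INTO node VALUES (?, ?, ?)", rows)
    return conn


def descendants(conn, root_id):
    """Return (id, name, depth) for root_id and everything beneath it."""
    sql = """
    WITH RECURSIVE tree(id, name, depth) AS (
        -- anchor member: the starting node
        SELECT id, name, 0 FROM node WHERE id = ?
        UNION ALL
        -- recursive member: children of rows found so far
        SELECT n.id, n.name, t.depth + 1
        FROM node AS n
        JOIN tree AS t ON n.parent_id = t.id
    )
    SELECT id, name, depth FROM tree ORDER BY depth, id
    """
    return conn.execute(sql, (root_id,)).fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    for row in descendants(conn, 2):
        print(row)
```

Calling `descendants(conn, 2)` returns node 2 followed by its children and grandchild, each tagged with its depth below the starting node. How much the index helps on a real system will depend on the data and the engine, so it is worth checking the actual execution plan rather than assuming.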