3 and 4 should be identical, both in cost and results. While the optimiser doesn't optimise the distincts out, it appears to use them to make the final union more efficient.
I did some really, really rough tests and the two had the same costs and the same CPU time in the limited examples that I checked.
As for your colleague, take his code and look at the execution stats (duration and CPU time) with and without the distincts. If there's a significant difference then you have grounds to request him remove them. If there aren't, well...
Cheers Gila. Curiously, I did some really rough and ready tests prior to posting and got the opposite (SQL 2008R2 - maybe this makes a difference?):
CREATE TABLE #1 (a int)
CREATE TABLE #2 (b int)
INSERT #1 (a) VALUES (1),(2),(2),(3),(3),(3)
INSERT #2 (b) VALUES (1),(4),(4),(4),(4)
SELECT a FROM #1
SELECT b FROM #2
SELECT distincta FROM #1
SELECT distinctb FROM #2
https://www.dropbox.com/s/sofjsytdabtdzce/Exec1.png (I can't get dropboxs pics to render here so here's a link instead :() What this shows is a cost split of 66:33 in favour of removing the distincts
Similarly, In my colleague's query, the same is true - in terms of stats (IO, CPU), simpler execution plan, and simple eyeballing, the version with the distinct statements removed is faster and more efficient, so I have already convinced myself that there is a performance gain to be had
The question here is, whether anyone can think of any edge cases in which the final result set might be different with the distinct keyword included vs excluded, considering far more complex queries but basically remaining in the form <query> UNION <query> (... UNION <query> etc.)?