1) 5 is universally a bad number for cost threshold for parallelism. You get WAY too many parallel queries, especially in OLTP scenarios.
2) MAXDOP on the server should almost NEVER be zero and should NEVER be greater than the number of physical cores in each NUMA node (most modern hardware is NUMA).
3) Outside of those two, the vast majority of CXPACKET waits I see is from poor indexing and wrong datatypes and functions around columns. Each lead in their own way to bad query plans and/or reading/hashing a LOT more data than is actually required.
4) A few hours or few days with a good professional tuning consultant will likely provide you with many man-months worth of work to address a wide variety of issues.
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service