I think the key thing to know about any data mining is the difference between causation, correlation, and coincidence. It doesn't matter how good your technical skills are if you can't tell those apart.
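To make the coincidence trap concrete, here is a minimal sketch (not from the original post) showing that two series generated completely independently, with no causal link at all, can still look strongly correlated when each one drifts over time. The helper names, the 0.5 threshold, the 100-point length, the 500 trials and the seed are all illustrative assumptions; the comparison with plain white noise shows how much of the apparent "relationship" comes from trending data alone.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length >= 2")
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def random_walk(rng, n):
    """Hypothetical trending series: running sum of independent +/-1 steps."""
    level, walk = 0, []
    for _ in range(n):
        level += rng.choice((-1, 1))
        walk.append(level)
    return walk


def white_noise(rng, n):
    """Hypothetical non-trending series: independent Gaussian draws."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def share_strongly_correlated(make_series, trials=500, n=100, threshold=0.5, seed=42):
    """Fraction of independently generated pairs whose |r| exceeds threshold."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(trials):
        a, b = make_series(rng, n), make_series(rng, n)
        if abs(pearson(a, b)) > threshold:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    # Every pair below is unrelated by construction; any strong r is coincidence.
    print("random walks:", share_strongly_correlated(random_walk))
    print("white noise: ", share_strongly_correlated(white_noise))
```

Run it and compare the two fractions: the unrelated random walks cross the threshold far more often than the unrelated white noise, which is exactly the kind of "correlation" that a good tool will happily report and a careful analyst has to reject.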
After that, learn how to judge data quality. There are ten or twelve major data quality issues that, regardless of the tools you use or the techniques you apply with them, will cause data mining to fail or produce false results if you don't know them thoroughly. You have to be able to spot the classic patterns, like dropped-out time periods, contradictory facts, and so on, without hesitation. The same goes for the positive data quality metrics: you need to know those just as well.
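The post doesn't enumerate its ten or twelve issues, so the following is only a minimal sketch of checks for two of the patterns it names, under my own interpretation: "dropped-out time" read as calendar days missing from a daily series, and "contrary facts" read as one key recorded with conflicting values (for example, a customer with two different birth dates). The function names, the daily granularity and the `(key, value)` record shape are hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta


def find_missing_days(days):
    """Return calendar days absent between the earliest and latest dates seen.

    Assumes a daily series; gaps may mean a load failed or a feed went quiet.
    """
    present = set(days)
    if not present:
        return []
    start, end = min(present), max(present)
    missing = []
    current = start
    while current <= end:
        if current not in present:
            missing.append(current)
        current += timedelta(days=1)
    return missing


def find_contradictions(records):
    """Return keys that were recorded with more than one distinct value.

    records: iterable of hypothetical (key, value) pairs where the key should
    determine a single value, e.g. (customer_id, birth_date).
    """
    seen = defaultdict(set)
    for key, value in records:
        seen[key].add(value)
    return {key: values for key, values in seen.items() if len(values) > 1}


if __name__ == "__main__":
    sales_days = [date(2024, 1, 1), date(2024, 1, 2), date(2024, 1, 5)]
    print(find_missing_days(sales_days))
    # prints [datetime.date(2024, 1, 3), datetime.date(2024, 1, 4)]

    facts = [("C1", "1980-02-01"), ("C2", "1975-07-09"), ("C1", "1981-02-01")]
    print(find_contradictions(facts))  # C1 is flagged with both birth dates
```

Checks like these only flag suspects; deciding whether a gap is a real outage or a legitimate holiday, or which of two conflicting values is correct, still takes the judgment the post is talking about.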
After that, it's all about the tools. Those will vary, and in a shop that's just moving into the field, you'll probably be able to define what you want instead of having to learn legacy tools. That puts you in the driver's seat on that point.
But no tool can make up for mistaking coincidence for cause or, for example, missing that a datum comes from the wrong time period to be applicable.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon