• It depends on how much concurrency you want to support. Are you looking to load a lot of dimension tables simultaneously, or just to run one massive process at a time?

    Personally, I would follow the recommendations, especially the NUMA-related ones.

    Do you really want to allow a single process to take over all of the CPU power? Do you want to spend a large chunk of time in CXPACKET waits for little gain?

    Parallelism should be the answer of last resort. Only after you have done everything else optimally should you be looking to parallelism to improve performance.
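
    Before changing anything, it may help to see where the server stands today. The following is only a rough sketch, assuming you have permission to read the server-level catalog views and DMVs (typically VIEW SERVER STATE). It shows the parallelism settings currently in effect and how much time has accumulated in CXPACKET waits since the counters were last reset:

    ```sql
    -- Parallelism-related settings currently in use on this instance.
    SELECT name, value_in_use
    FROM sys.configurations
    WHERE name IN ('max degree of parallelism',
                   'cost threshold for parallelism');

    -- Cumulative CXPACKET waits since the last restart or stats clear.
    -- These are running totals, so compare two snapshots taken around
    -- the load window rather than reading a single value in isolation.
    SELECT wait_type,
           waiting_tasks_count,
           wait_time_ms,
           signal_wait_time_ms
    FROM sys.dm_os_wait_stats
    WHERE wait_type = 'CXPACKET';
    ```

    If CXPACKET accounts for a large share of total wait time during your loads, that is a hint (not proof) that the current degree of parallelism is costing more than it gains.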