Just to note: the SQL Server 2008 scheduler does not take NUMA into account when assigning threads for a parallel query. A given thread may run on the same node, or it may run on a separate node. Linchi Shea had a pretty good article about this.
That said, Linchi also has a follow-up article
showing that this doesn't appear to make a big difference to performance compared with other performance issues.
I find it interesting that the last article mentions setting up the SQL Server listener to listen on multiple ports. I've been tinkering with soft NUMA, and there do seem to be some benefits to this setup: multiple NICs listening on separate subnets, each NIC affinitized to one core, the SQL Server listener port for that NIC mapped to the same core (via soft NUMA), and tables loaded in parallel for ETL. Microsoft did a case study along these lines
in which they loaded 1 TB of data into SQL Server in about 30 minutes using commodity-class hardware.
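As a rough illustration of the port-to-node mapping: SQL Server Configuration Manager accepts a TCP Port value in which each port is followed by a bracketed NUMA node affinity mask, for example 1500[0x1]. The Python sketch below builds such a string for one port per soft NUMA node. The port numbers and the helper name are hypothetical, and the format should be checked against the documentation for your SQL Server version before relying on it.

```python
def numa_port_string(port_to_node):
    """Build a TCP Port value such as '1500[0x1],1501[0x2]'.

    port_to_node: dict mapping a TCP port to a 0-based soft NUMA node number.
    The bracketed value is a node affinity bitmask, so node n becomes 1 << n.
    (Hypothetical helper for illustration only.)
    """
    parts = []
    for port, node in sorted(port_to_node.items()):
        mask = 1 << node  # one bit per NUMA node
        parts.append(f"{port}[0x{mask:X}]")
    return ",".join(parts)


if __name__ == "__main__":
    # Hypothetical ports: one listener port per soft NUMA node 0..3.
    print(numa_port_string({1500: 0, 1501: 1, 1502: 2, 1503: 3}))
```

Each NIC's clients would then connect to "its" port, so that connection lands on the matching soft NUMA node.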
However, I would think such a configuration would ultimately be difficult to support in a production environment. To make it work correctly, you'd have to set up soft NUMA (which isn't painless), then have the listener listen on separate ports, one for each soft NUMA node, with each node tied to its own physical NIC and its own CPU core. Then, in your ETL package, you'd have to use something like the Balanced Data Distributor, plus multiple connections to SQL Server set up on separate NIC subnets on your SSIS machine, to make sure the traffic gets split and evenly balanced across each parallel stream.
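The Balanced Data Distributor's role here is to spread incoming data across parallel outputs, each feeding its own connection and NIC subnet. The sketch below is only a simplified stand-in: it deals batches out round-robin to N streams, which is an assumed policy, not the component's actual algorithm (the real transform works on pipeline buffers inside SSIS). The function name is hypothetical.

```python
from itertools import cycle


def split_round_robin(batches, streams):
    """Assign each batch to one of `streams` outputs in turn.

    A simplified, hypothetical model of balanced distribution: output i would
    correspond to a connection bound to NIC subnet i and listener port i.
    """
    outputs = [[] for _ in range(streams)]
    # zip stops when `batches` is exhausted, even though cycle() is infinite.
    for target, batch in zip(cycle(range(streams)), batches):
        outputs[target].append(batch)
    return outputs


if __name__ == "__main__":
    print(split_round_robin(list(range(8)), 4))
```

With evenly sized batches, each stream ends up with the same share of the load, which is the property the parallel-load setup depends on.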
Then you'd need another set of NICs on a separate subnet, listening on a single port mapped to all the NUMA nodes, to handle end-user requests; otherwise you risk overloading one of the soft NUMA nodes. I don't think you'd want to mess with a load balancer to spread end-user requests across the nodes. At the end of the day, that's a lot of NICs going into SQL Server, and a lot of listener configuration
to map TCP ports to soft NUMA nodes, with each soft NUMA node mapped to a CPU core... You see where I'm going with this. Configuration management becomes quite a nightmare.
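For the soft NUMA side, SQL Server 2008 reads its soft NUMA layout from registry keys under HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\100\NodeConfiguration, with one NodeN key per node holding a CPUMask DWORD. A minimal sketch, assuming one core per soft NUMA node as described above, that generates the .reg text; "100" is the SQL Server 2008 version key, the helper name is hypothetical, and the changes are assumed to take effect only after a service restart.

```python
def soft_numa_reg(cores, version_key="100"):
    """Return .reg text mapping soft NUMA node n to CPU core n (one core each).

    Hypothetical helper. Assumes at most 32 cores so each mask fits a DWORD.
    """
    if not 1 <= cores <= 32:
        raise ValueError("cores must be between 1 and 32 for a DWORD CPUMask")
    base = (
        "HKEY_LOCAL_MACHINE\\SOFTWARE\\Microsoft\\Microsoft SQL Server\\"
        + version_key
        + "\\NodeConfiguration"
    )
    lines = ["Windows Registry Editor Version 5.00", ""]
    for node in range(cores):
        lines.append(f"[{base}\\Node{node}]")
        # Single-bit mask: node n is affinitized to core n only.
        lines.append(f'"CPUMask"=dword:{1 << node:08x}')
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    print(soft_numa_reg(4))
```

Even this small piece has to stay in sync with the port mapping and the NIC affinity, which is exactly where the configuration-management burden comes from.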
Or you could just buy PDW 2012 (Parallel Data Warehouse) and get all of that out of the box... Choices, choices.