People fail to realize how intimately human speech and language are tied to the human brain and physiology. Speech is not just information transfer; far more is communicated than is actually said. Complex structures of thought are built in the listener's mind to approximately match structures in the speaker's mind, and this matching is facilitated by the organization of language itself.
Not just any mathematically possible language makes a usable human language. We are born with complex structures that expect certain characteristics of language (parts of speech, word order, modifiers, etc.), and these are not random. As different as human languages appear to be, they are far more similar underneath than they are different.
The specialization goes beyond that. Only a limited inventory of sounds is used across all the world's languages. These are the sounds our brains are programmed to recognize, with minimal confusion from background noise, variation in speakers' voices, and so on. Most languages use most of them, but none uses all of them. Young children become adept at hearing (as well as pronouncing) the sounds in their language environment; the ones they are not exposed to often atrophy. There are tones in Vietnamese, for example, that adult English speakers often cannot even hear, much less pronounce, and there are sounds in English that confuse many foreign speakers.
Voice recognition (VR) is not just pattern matching. A VR system needs to identify these phonemes consistently across different speakers and voices, against background noise, head colds, conflicting conversations, and so on. It also requires at least a basic understanding of meaning before it can become reliable.
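A toy sketch of why consistent phoneme identification is hard: vowels are often characterized by their first two formant frequencies (F1, F2), and a naive recognizer can classify a sound by nearest neighbour in formant space. The formant values below are rough textbook averages and the noise model is entirely made up for illustration; this is not how any production VR system works, only a minimal demonstration that speaker variation and noise can push the same intended vowel across a category boundary.

```python
import math
import random

# Rough average formant frequencies (F1, F2, in Hz) for three English
# vowels -- approximate textbook values, used only for this toy example.
VOWEL_FORMANTS = {
    "i (beet)": (270, 2290),
    "a (father)": (730, 1090),
    "u (boot)": (300, 870),
}

def classify(f1, f2):
    """Nearest-neighbour classification in (F1, F2) formant space."""
    return min(VOWEL_FORMANTS,
               key=lambda v: math.dist((f1, f2), VOWEL_FORMANTS[v]))

random.seed(42)

# A clean, average token of /u/ lands exactly on its category.
print(classify(300, 870))  # -> u (boot)

# Speaker variation plus background noise (modelled here as arbitrary
# Gaussian jitter) shifts the measured formants; the same intended
# vowel may now fall closer to a different category.
noisy_f1 = 300 + random.gauss(0, 150)
noisy_f2 = 870 + random.gauss(0, 400)
print(classify(noisy_f1, noisy_f2))
```

Real systems replace the two-dimensional formant pair with richer acoustic features and statistical models trained on many speakers, but the underlying problem is the same: the acoustic regions for different phonemes overlap once variation and noise enter.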
We are a long way from that.