The driver spends a lot of time decoding UTF-8 strings, which is expensive.
We currently use a Trie in BsonClassMapSerializer to avoid decoding the element names (it also avoids the dictionary lookup for the member map information).
Investigate using a Trie to speed up UTF-8 decoding of element names in general. The idea is to have some form of Trie-based LRU cache of recently seen element names, keyed on the raw UTF-8 bytes so that a hit returns the already decoded string without decoding it again. The number of strings to be cached should probably be configurable. As long as the hit ratio is high, there should be a significant speedup. Even just a few megabytes dedicated to the cache should yield very high hit ratios.
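To make the idea concrete, here is a minimal sketch of a Trie-based LRU cache for element names. It is written in Python purely for illustration, not in the driver's C#, and every name in it (ElementNameCache, decode, capacity, hits, misses) is hypothetical rather than an existing driver API. It assumes the cache limit is expressed as a number of strings, as suggested above.

```python
from collections import OrderedDict


class _Node:
    """One trie node; `value` holds the decoded name that ends at this node."""
    __slots__ = ("children", "value")

    def __init__(self):
        self.children = {}   # next byte (int) -> _Node
        self.value = None    # decoded str, or None if no cached name ends here


class ElementNameCache:
    """Hypothetical trie of raw UTF-8 bytes with LRU eviction of whole names."""

    def __init__(self, capacity=1024):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._capacity = capacity
        self._root = _Node()
        self._lru = OrderedDict()  # decoded name -> None, least recent first
        self.hits = 0
        self.misses = 0

    def decode(self, buffer, start, length):
        """Return the element name stored in buffer[start:start + length]."""
        node = self._root
        for i in range(start, start + length):
            node = node.children.get(buffer[i])
            if node is None:
                break
        if node is not None and node.value is not None:
            # Hit: reuse the cached string, no UTF-8 decoding needed.
            self.hits += 1
            self._lru.move_to_end(node.value)
            return node.value
        # Miss: decode once, then remember the result.
        self.misses += 1
        key = bytes(buffer[start:start + length])
        name = key.decode("utf-8")
        self._insert(key, name)
        return name

    def _insert(self, key, name):
        node = self._root
        for b in key:
            child = node.children.get(b)
            if child is None:
                child = node.children[b] = _Node()
            node = child
        node.value = name
        self._lru[name] = None
        if len(self._lru) > self._capacity:
            oldest, _ = self._lru.popitem(last=False)
            self._remove(oldest.encode("utf-8"))

    def _remove(self, key):
        # Walk down to the entry, clear it, then prune now-empty nodes upward.
        path = [self._root]
        for b in key:
            path.append(path[-1].children[b])
        path[-1].value = None
        for depth in range(len(key), 0, -1):
            node = path[depth]
            if node.children or node.value is not None:
                break
            del path[depth - 1].children[key[depth - 1]]


# Usage: names are looked up directly in the byte buffer being read.
cache = ElementNameCache(capacity=2)
data = b"_idname"
first = cache.decode(data, 0, 3)    # miss: decoded and cached
second = cache.decode(data, 0, 3)   # hit: returns the cached string object
```

The hit path walks the buffer byte by byte and never allocates a new string, which is where the saving would come from. A real implementation would also need to decide on thread safety and on whether to bound memory by string count or by bytes; this sketch does neither.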
Most likely there would be a single global Trie holding decoded UTF-8 strings, but we may want to be able to configure at the collection level whether it is used (for example, you might want to exclude a collection known to have a very large number of distinct element names, since it would lower the hit ratio and evict names used by other collections).