I would like to propose a couple of simple code changes that reduce the number of allocations, lowering GC load and improving performance.
In one of our systems (an ASP.NET application with ~6GiB working set) the described sources of allocation account for ~2.2% of objects created.
The first source of unnecessary allocations is the BsonSerializerRegistry.GetSerializer(Type) method. It relies on the compiler-provided method group -> delegate conversion to supply the value factory for a concurrent dictionary. Unfortunately, current compilers do not cache the resulting delegate and recreate it on every call (https://github.com/dotnet/roslyn/issues/5835). The suggested fix is simple: cache the delegate in a member field.
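To illustrate the difference, here is a minimal sketch. It is not the driver's actual code: SerializerRegistrySketch, its members and the string "serializer" values are hypothetical stand-ins, and only ConcurrentDictionary.GetOrAdd is a real API.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical stand-in for the registry; strings play the role of serializers.
public sealed class SerializerRegistrySketch
{
    private readonly ConcurrentDictionary<Type, string> _cache =
        new ConcurrentDictionary<Type, string>();

    // Created once per registry instance and reused by every lookup.
    private readonly Func<Type, string> _createSerializer;

    public SerializerRegistrySketch()
    {
        _createSerializer = CreateSerializer;
    }

    // Before: the method group is converted to a new delegate on every call,
    // even when the key is already present in the dictionary.
    public string GetSerializerAllocating(Type type)
    {
        return _cache.GetOrAdd(type, CreateSerializer);
    }

    // After: the delegate stored in the field is passed, so a lookup that
    // hits the dictionary does not allocate a delegate.
    public string GetSerializerCached(Type type)
    {
        return _cache.GetOrAdd(type, _createSerializer);
    }

    // Hypothetical factory; the real one would build a serializer object.
    private string CreateSerializer(Type type)
    {
        return "serializer for " + type.FullName;
    }
}
```

Both methods return the same value; the only intended difference is whether a delegate object is created per call.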
The second source is a bit more complex. The BsonWriter allows custom element name validation rules that may depend on the current element name. To implement this, the element name validator factory is represented as a delegate that is replaced whenever the associated state (stack/name) changes. However, all three delegates (in PopElementNameValidator, PushElementNameValidator and WriteStartDocument) refer to an instance member, and one even captures a local variable, which makes them ineligible for caching. The proposed improvement replaces delegate-based validator creation with a more direct approach.
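The general idea can be sketched as follows. This is an illustration under assumptions, not the actual BsonWriter code or the exact patch: INameValidatorSketch, PathValidatorSketch and WriterSketch are hypothetical types, and the real writer tracks more state (including the validator stack).

```csharp
using System;

// Hypothetical validator interface: a validator can hand out the validator
// that applies to the content of a named child element.
public interface INameValidatorSketch
{
    INameValidatorSketch GetValidatorForChild(string elementName);
}

// Hypothetical validator that simply records the path it applies to.
public sealed class PathValidatorSketch : INameValidatorSketch
{
    public string Path { get; }

    public PathValidatorSketch(string path) { Path = path; }

    public INameValidatorSketch GetValidatorForChild(string elementName)
    {
        return new PathValidatorSketch(Path + "/" + elementName);
    }
}

public sealed class WriterSketch
{
    private INameValidatorSketch _validator;
    private Func<INameValidatorSketch> _childFactory; // used by the "before" variant
    private string _pendingName;                      // used by the "after" variant

    public WriterSketch(INameValidatorSketch root) { _validator = root; }

    public INameValidatorSketch CurrentValidator { get { return _validator; } }

    // Before: every name written allocates a closure (capturing 'this' and
    // the local 'name') plus a delegate, whether or not a document follows.
    public void WriteNameWithDelegate(string name)
    {
        _childFactory = () => _validator.GetValidatorForChild(name);
    }

    public void WriteStartDocumentWithDelegate()
    {
        _validator = _childFactory();
    }

    // After: only the name is stored; the child validator is resolved
    // directly when a nested document actually starts.
    public void WriteName(string name)
    {
        _pendingName = name;
    }

    public void WriteStartDocument()
    {
        _validator = _validator.GetValidatorForChild(_pendingName);
    }
}
```

Both variants end up with the same validator for the nested document; the direct variant avoids the per-name closure and delegate allocations.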
The benchmark code is available at https://github.com/onyxmaster/mongobench.