[CSHARP-944] Investigate using a Trie to speed up UTF8 decoding of element names Created: 01/Apr/14  Updated: 16/Jun/14  Resolved: 13/Jun/14

Status: Closed
Project: C# Driver
Component/s: BSON
Affects Version/s: 2.0
Fix Version/s: 2.0

Type: Improvement Priority: Major - P3
Reporter: Robert Stam Assignee: Robert Stam
Resolution: Won't Fix Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   

The driver spends a lot of time decoding UTF8 strings, which is expensive.

We currently use a Trie in BsonClassMapSerializer to avoid decoding the element names (it also avoids the dictionary lookup for the member map information).
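As a rough illustration of the idea (not the driver's actual BsonClassMapSerializer code), a trie keyed on raw UTF8 bytes can resolve an element name straight from the buffer without building a string. The sketch below is in Python for brevity. The names `ByteTrie`, `add` and `try_get` are hypothetical, and it assumes the name is stored as a BSON cstring terminated by a 0x00 byte.

```python
class _Node:
    """One trie node; children are keyed by a single byte value (0-255)."""
    __slots__ = ("children", "has_value", "value")

    def __init__(self):
        self.children = {}
        self.has_value = False
        self.value = None


class ByteTrie:
    """Hypothetical sketch: maps UTF8-encoded element names to values."""

    def __init__(self):
        self._root = _Node()

    def add(self, name, value):
        # Encoding happens once, when the known name is registered.
        node = self._root
        for b in name.encode("utf-8"):
            node = node.children.setdefault(b, _Node())
        node.has_value = True
        node.value = value

    def try_get(self, buffer, start):
        """Walk the cstring at buffer[start:] up to its 0x00 terminator.

        Returns (found, value, next_position). No string is decoded.
        Assumes the buffer contains a terminator at or after `start`.
        """
        node = self._root
        pos = start
        while buffer[pos] != 0:
            if node is not None:
                # Becomes None as soon as the bytes leave the trie.
                node = node.children.get(buffer[pos])
            pos += 1
        if node is not None and node.has_value:
            return True, node.value, pos + 1
        return False, None, pos + 1


trie = ByteTrie()
trie.add("name", "member map for Name")
trie.add("age", "member map for Age")

buffer = b"name\x00age\x00zip\x00"
found, value, pos = trie.try_get(buffer, 0)
```

Because the stored value can be the member map itself, a single walk replaces both the decode and the dictionary lookup.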

Investigate using a Trie to speed up UTF8 decoding of element names in general. The idea is to have some form of Trie-based LRU cache of recently seen element names. The number of strings to cache should probably be configurable. As long as the hit ratio is high, there should be a significant speedup. Even a few megabytes dedicated to the cache should yield very high hit ratios.
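To make the caching behaviour concrete, here is a minimal Python sketch of a bounded LRU cache of decoded element names. It is a simplification: it keys an `OrderedDict` on the raw bytes rather than walking a Trie, so it only shows the eviction and hit/miss accounting, not the lookup structure proposed above. `DecodedNameCache`, `max_entries`, `hits` and `misses` are hypothetical names, not driver APIs.

```python
from collections import OrderedDict


class DecodedNameCache:
    """Hypothetical sketch: LRU cache from raw UTF8 bytes to decoded strings."""

    def __init__(self, max_entries=1024):
        self._max_entries = max_entries  # the configurable cache size
        self._entries = OrderedDict()    # oldest entry first, newest last
        self.hits = 0
        self.misses = 0

    def decode(self, raw):
        cached = self._entries.get(raw)
        if cached is not None:
            # Hit: mark as most recently used and skip the decode.
            self._entries.move_to_end(raw)
            self.hits += 1
            return cached
        # Miss: pay for the decode once, then remember the result.
        self.misses += 1
        decoded = raw.decode("utf-8")
        self._entries[raw] = decoded
        if len(self._entries) > self._max_entries:
            # Evict the least recently used name.
            self._entries.popitem(last=False)
        return decoded


cache = DecodedNameCache(max_entries=2)
for raw in (b"_id", b"name", b"_id", b"age", b"name"):
    cache.decode(raw)
print(cache.hits, cache.misses)
```

Every lookup pays for hashing and reordering, hit or miss, which is the cache-management overhead that any speedup has to outweigh.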

Most likely there would be a single global Trie holding decoded UTF8 strings, but we may want to be able to configure at the collection level whether it is used (for example, you might want to exclude a collection known to have a very large number of element names for some reason).



 Comments   
Comment by Robert Stam [ 13/Jun/14 ]

We have decided not to do this because benchmarking showed that the small amount of CPU time saved by skipping UTF8 decoding is not enough to offset the cost of managing a cache.

Comment by Robert Stam [ 01/Apr/14 ]

One thing to consider: when a Dictionary<string, TSomeValue> is serialized using the default representation (a BSON document in which the keys become element names), the result is a potentially large number of distinct element names.
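A small Python simulation suggests why this matters for a bounded cache of element names. The data is synthetic and purely illustrative, and `hit_ratio` is a hypothetical helper. It compares documents that reuse a fixed set of names (as with class-mapped types) against documents whose names come from dictionary keys, assumed here to be all different.

```python
from collections import OrderedDict


def hit_ratio(names, capacity):
    """Replay a stream of element names through an LRU cache of the given size."""
    cache = OrderedDict()
    hits = 0
    total = 0
    for name in names:
        total += 1
        if name in cache:
            hits += 1
            cache.move_to_end(name)
        else:
            cache[name] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used name
    return hits / total if total else 0.0


# Synthetic stream 1: 1000 documents that all share the same three names.
class_mapped = ["_id", "name", "age"] * 1000

# Synthetic stream 2: 3000 element names that are all different,
# as if each dictionary key were something like a user id.
dictionary_keys = [f"user{i}" for i in range(3000)]

print(hit_ratio(class_mapped, capacity=100))     # 2997 hits out of 3000
print(hit_ratio(dictionary_keys, capacity=100))  # no hits at all
```

In the second stream every lookup is a miss followed by an insert and, once the cache is full, an eviction, so the cache adds work without saving any decoding.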
