[CSHARP-944] Investigate using a Trie to speed up UTF8 decoding of element names Created: 01/Apr/14 Updated: 16/Jun/14 Resolved: 13/Jun/14 |
|
| Status: | Closed |
| Project: | C# Driver |
| Component/s: | BSON |
| Affects Version/s: | 2.0 |
| Fix Version/s: | 2.0 |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Robert Stam | Assignee: | Robert Stam |
| Resolution: | Won't Fix | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Description |
|
The driver spends a lot of time decoding UTF8 strings, which is expensive. We currently use a Trie in BsonClassMapSerializer to avoid decoding element names (this also avoids the dictionary lookup for the member map information). Investigate using a Trie to speed up UTF8 decoding of element names in general. The idea is to have some form of Trie-based LRU cache of recently seen element names. The number of strings to cache should probably be configurable. As long as the hit ratio is high there should be a significant speedup, and even a few megabytes dedicated to the cache should yield very high hit ratios. Most likely there would be a single global Trie holding decoded UTF8 strings, but we may want to be able to configure at the collection level whether it is used (for example, you might want to exclude a collection known to have a very large number of element names). |
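As a rough illustration of the idea above, here is a minimal sketch of a byte-keyed trie that returns an already-decoded string when the bytes of an element name have been seen before. It is written in Python for brevity and is not the driver's C# code. The names `ElementNameTrieCache`, `read_name`, and `max_entries` are hypothetical, and the sketch assumes a simple entry cap in place of the LRU eviction the proposal calls for. It relies on the fact that BSON element names are null-terminated strings.

```python
class _Node:
    __slots__ = ("children", "value")

    def __init__(self):
        self.children = {}  # next byte (int) -> _Node
        self.value = None   # decoded name if a cached name ends at this node


class ElementNameTrieCache:
    """Hypothetical cache from the UTF-8 bytes of a name to its decoded string."""

    def __init__(self, max_entries=1000):
        self._root = _Node()
        self._max_entries = max_entries  # assumed cap; no LRU eviction here
        self._count = 0
        self.hits = 0
        self.misses = 0

    def read_name(self, buffer, offset):
        """Read a null-terminated name starting at offset.

        Returns (name, offset just past the terminating 0x00 byte).
        """
        node = self._root
        pos = offset
        while buffer[pos] != 0:
            # Walk the trie one byte at a time; None means "no cached match".
            if node is not None:
                node = node.children.get(buffer[pos])
            pos += 1
        if node is not None and node.value is not None:
            self.hits += 1
            return node.value, pos + 1  # hit: no UTF-8 decoding performed
        self.misses += 1
        name = bytes(buffer[offset:pos]).decode("utf-8")
        if self._count < self._max_entries:
            self._insert(buffer, offset, pos, name)
        return name, pos + 1

    def _insert(self, buffer, start, end, name):
        node = self._root
        for i in range(start, end):
            child = node.children.get(buffer[i])
            if child is None:
                child = _Node()
                node.children[buffer[i]] = child
            node = child
        if node.value is None:
            self._count += 1
        node.value = name


# Usage: three names in a row, the third repeats the first.
cache = ElementNameTrieCache(max_entries=2)
buf = b"name\x00_id\x00name\x00"
first, pos = cache.read_name(buf, 0)     # miss: decoded and cached
second, pos = cache.read_name(buf, pos)  # miss: decoded and cached
third, pos = cache.read_name(buf, pos)   # hit: returned from the trie
```

Even in this sketch the cache has to track its own size and walk a node per byte, which hints at the bookkeeping a real LRU version would add on every lookup.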
| Comments |
| Comment by Robert Stam [ 13/Jun/14 ] |
|
We have decided not to do this because benchmarking has proven that the small amount of CPU time saved by not doing UTF8 decoding is not enough to amortize the cost of managing a cache. |
| Comment by Robert Stam [ 01/Apr/14 ] |
|
One thing to consider: when a Dictionary<string, TSomeValue> is serialized using the default representation (a BSON document in which the keys become element names), the result can be a very large number of distinct element names. |
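To make the concern above concrete, the following sketch replays two synthetic streams of element names through a bounded LRU cache and compares hit ratios. It is only an illustration in Python, not a benchmark of the driver: the function `hit_ratio`, the capacity of 100, and both workloads are assumptions made up for this example.

```python
from collections import OrderedDict


def hit_ratio(names, capacity):
    """Replay element names through a bounded LRU cache; return hits / total."""
    cache = OrderedDict()
    hits = 0
    total = 0
    for name in names:
        total += 1
        if name in cache:
            hits += 1
            cache.move_to_end(name)        # mark as most recently used
        else:
            cache[name] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used name
    return hits / total if total else 0.0


# Class-like documents: the same three element names in every document.
class_like = ["_id", "name", "age"] * 1000

# Dictionary-like documents: every key is a distinct element name.
dict_like = [f"user{i}" for i in range(3000)]

class_ratio = hit_ratio(class_like, capacity=100)
dict_ratio = hit_ratio(dict_like, capacity=100)
```

Under these assumptions the class-like stream misses only on its first three names, while the dictionary-like stream never hits at all, so every lookup pays the cache overhead on top of the decode.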