[CSHARP-1065] BsonDocument should postpone creating internal dictionary until document grows large enough
Created: 16/Sep/14  Updated: 02/Apr/15  Resolved: 20/Nov/14

Status: Closed
Project: C# Driver
Component/s: BSON
Affects Version/s: None
Fix Version/s: 2.0

Type: Improvement
Priority: Minor - P4
Reporter: Robert Stam
Assignee: Robert Stam
Resolution: Done
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   

BsonDocument has two internal fields for its implementation:

private List<BsonElement> _elements;
private Dictionary<string, int> _indexes; // maps names to positions

Because a linear search of the _elements List can be faster than a Dictionary lookup for small documents, we could postpone creating the dictionary until the document grows large enough to make it worthwhile, and fall back to linear searches while the dictionary doesn't exist yet. This would both save memory and be faster for small documents.
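A minimal sketch of the proposal, assuming string names and an arbitrary value type. The class name LazyIndexedList, its members, and the default threshold of 8 are hypothetical illustrations, not BsonDocument's actual code: the name-to-position dictionary is built only once the element count reaches the threshold, and lookups use a linear scan until then.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of the idea above; not the driver's BsonDocument implementation.
public sealed class LazyIndexedList<TValue>
{
    private readonly List<KeyValuePair<string, TValue>> _elements = new List<KeyValuePair<string, TValue>>();
    private readonly int _threshold;
    private Dictionary<string, int> _indexes; // maps names to positions; null until Count reaches _threshold

    // The default of 8 is an assumption taken from the HybridDictionary hint, not a measured value.
    public LazyIndexedList(int threshold = 8)
    {
        if (threshold < 0) throw new ArgumentOutOfRangeException(nameof(threshold));
        _threshold = threshold;
    }

    public int Count => _elements.Count;

    // True once the name -> position dictionary has been created.
    public bool HasIndex => _indexes != null;

    public void Add(string name, TValue value)
    {
        if (name == null) throw new ArgumentNullException(nameof(name));
        if (IndexOfName(name) != -1) throw new ArgumentException($"Duplicate element name '{name}'.", nameof(name));

        _elements.Add(new KeyValuePair<string, TValue>(name, value));

        if (_indexes != null)
        {
            _indexes.Add(name, _elements.Count - 1); // dictionary already exists: keep it in sync
        }
        else if (_elements.Count >= _threshold)
        {
            // The list just became large enough: build the dictionary once from all elements.
            _indexes = new Dictionary<string, int>(_elements.Count);
            for (var i = 0; i < _elements.Count; i++)
            {
                _indexes.Add(_elements[i].Key, i);
            }
        }
    }

    public int IndexOfName(string name)
    {
        if (_indexes != null)
        {
            return _indexes.TryGetValue(name, out var index) ? index : -1;
        }

        // Small list: a linear scan, with no Dictionary allocated at all.
        for (var i = 0; i < _elements.Count; i++)
        {
            if (_elements[i].Key == name) return i;
        }
        return -1;
    }

    public bool TryGetValue(string name, out TValue value)
    {
        var index = IndexOfName(name);
        value = index >= 0 ? _elements[index].Value : default(TValue);
        return index >= 0;
    }
}
```

Building the dictionary at the cutover point costs one pass over the existing elements, but that happens at most once per document, and documents that never reach the threshold never pay for a dictionary at all.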

We would have to experiment to find the threshold at which to create the dictionary. A similar class in .NET (HybridDictionary) uses a threshold of 8, which gives us a rough idea of where the threshold should be.



 Comments   
Comment by Robert Stam [ 20/Nov/14 ]

Here is some performance data that shows how much faster BsonDocument
is after this change. It also suggests that 8 was a pretty good guess for
the threshold at which BsonDocuments begin to benefit from a dictionary:
from a document size of 8 upward, processing is faster with a dictionary
than without one.

In the tests, "creating" a document means instantiating a new BsonDocument and
adding elements to it until it reaches the desired document size, and
"processing" a document means accessing each element of the document once.

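A hypothetical sketch of what the two measurements could look like. CreateProcessBench and its members are illustrative names, it works on a plain list of name/value pairs rather than a BsonDocument, and treating "gcs" as generation-0 collections counted with GC.CollectionCount(0) is an assumption about how the numbers below were obtained.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Hypothetical micro-benchmark sketch; NOT the harness that produced the numbers below.
public static class CreateProcessBench
{
    // Runs `operation` `iterations` times; returns operations per second and
    // the number of generation-0 collections that happened meanwhile.
    public static (double PerSecond, int Gcs) Measure(int iterations, Action operation)
    {
        if (iterations <= 0) throw new ArgumentOutOfRangeException(nameof(iterations));
        operation(); // warm-up so JIT compilation is not timed

        var gcsBefore = GC.CollectionCount(0);
        var stopwatch = Stopwatch.StartNew();
        for (var i = 0; i < iterations; i++)
        {
            operation();
        }
        stopwatch.Stop();

        var seconds = Math.Max(stopwatch.Elapsed.TotalSeconds, 1e-9); // avoid dividing by zero
        return (iterations / seconds, GC.CollectionCount(0) - gcsBefore);
    }

    // "Creating": instantiate a new list and add `size` elements to it.
    public static List<KeyValuePair<string, int>> Create(string[] names, int size)
    {
        var elements = new List<KeyValuePair<string, int>>();
        for (var i = 0; i < size; i++)
        {
            elements.Add(new KeyValuePair<string, int>(names[i], i));
        }
        return elements;
    }

    // "Processing": look up each element by name exactly once (linear search here).
    public static int Process(List<KeyValuePair<string, int>> elements)
    {
        var sum = 0;
        foreach (var element in elements)
        {
            sum += elements[IndexOf(elements, element.Key)].Value;
        }
        return sum;
    }

    private static int IndexOf(List<KeyValuePair<string, int>> elements, string name)
    {
        for (var i = 0; i < elements.Count; i++)
        {
            if (elements[i].Key == name) return i;
        }
        return -1;
    }
}
```

For example, `CreateProcessBench.Measure(1_000_000, () => CreateProcessBench.Create(names, 4))` would report a "created per second" figure and a GC count for documents of size 4.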
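For each document size below, the first two thresholds result in dictionaries
being created (for 0 it is created when the first element is inserted, for
threshold == documentSize it is created when the last element is inserted).
The exception is document size 0, where no element is ever inserted, so no
dictionary is created for any threshold.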

The last two thresholds both result in NO dictionary being created, so the
last two tests really should perform the same. The differences between them
are most likely just the vagaries of performance benchmarking.
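The rule in the two paragraphs above can be written as a small function. ThresholdRule is a hypothetical helper, not part of the driver, and it assumes the dictionary is built as soon as the element count reaches the threshold, so a threshold of 0 behaves like a threshold of 1.

```csharp
using System;

// Hypothetical helper encoding when a dictionary gets created in these benchmarks.
public static class ThresholdRule
{
    // Returns the 1-based insert at which the dictionary is created,
    // or null if the document never grows large enough.
    public static int? DictionaryCreatedAtInsert(int threshold, int documentSize)
    {
        if (threshold < 0 || documentSize < 0) throw new ArgumentOutOfRangeException();
        var insert = Math.Max(threshold, 1); // threshold 0: created on the first insert
        return insert <= documentSize ? insert : (int?)null;
    }

    // The four thresholds benchmarked for each size: 0, size, size + 1 and size + 2.
    public static int[] ThresholdsFor(int documentSize) =>
        new[] { 0, documentSize, documentSize + 1, documentSize + 2 };
}
```

With this rule, the last two thresholds for any size always return null, which is why those two rows are expected to match.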

Notice also the remarkable difference in the number of garbage collections
required when using dictionaries (for example, 1922 vs. 663 for a document
size of 8). That is yet another reason (besides speed) why this change seems
very promising.
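One way to see where the extra collections could come from is to count the bytes allocated per document with and without an eagerly created dictionary, since more allocation means more frequent generation-0 collections. The sketch below is a hypothetical, simplified comparison (plain lists and dictionaries rather than BsonDocument; AllocationComparison and its members are illustrative names) and relies on GC.GetAllocatedBytesForCurrentThread, which is available in modern .NET.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical comparison; not the benchmark used in this issue.
public static class AllocationComparison
{
    private static readonly string[] Names = CreateNames(64);

    // Static sinks so the built objects are not treated as dead by the JIT.
    private static List<KeyValuePair<string, int>> _elementsSink;
    private static Dictionary<string, int> _indexesSink;

    // Average bytes allocated on this thread per document of the given size.
    public static long BytesPerDocument(int documentSize, bool withDictionary, int iterations = 10_000)
    {
        if (documentSize < 0 || documentSize > Names.Length) throw new ArgumentOutOfRangeException(nameof(documentSize));
        if (iterations <= 0) throw new ArgumentOutOfRangeException(nameof(iterations));
        Build(documentSize, withDictionary); // warm-up so one-time costs are not counted

        var before = GC.GetAllocatedBytesForCurrentThread();
        for (var i = 0; i < iterations; i++)
        {
            Build(documentSize, withDictionary);
        }
        return (GC.GetAllocatedBytesForCurrentThread() - before) / iterations;
    }

    private static void Build(int documentSize, bool withDictionary)
    {
        var elements = new List<KeyValuePair<string, int>>();
        var indexes = withDictionary ? new Dictionary<string, int>() : null;
        for (var i = 0; i < documentSize; i++)
        {
            elements.Add(new KeyValuePair<string, int>(Names[i], i));
            indexes?.Add(Names[i], i); // only maintained when a dictionary is requested
        }
        _elementsSink = elements;
        _indexesSink = indexes;
    }

    // Names are created once up front so string allocation does not skew the comparison.
    private static string[] CreateNames(int count)
    {
        var names = new string[count];
        for (var i = 0; i < count; i++)
        {
            names[i] = "field" + i;
        }
        return names;
    }
}
```

The exact byte counts depend on the runtime version, so only the relative difference between the two modes is meaningful.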

Data below:

Benchmarks for document size: 0
 
Threshold  0:  57818808 created per second 83 gcs
Threshold  0:  59418836 created per second 83 gcs
Threshold  1:  59565682 created per second 83 gcs
Threshold  2:  42384393 created per second 83 gcs
 
Threshold  0: 319199193 processed per second 0 gcs
Threshold  0: 515602120 processed per second 0 gcs
Threshold  1: 522048728 processed per second 0 gcs
Threshold  2: 528546813 processed per second 0 gcs
 
Benchmarks for document size: 1
 
Threshold  0:   7036394 created per second 450 gcs
Threshold  1:   6040069 created per second 450 gcs
Threshold  2:  14779368 created per second 198 gcs
Threshold  3:  14940896 created per second 198 gcs
 
Threshold  0:  30650401 processed per second 0 gcs
Threshold  1:  20899405 processed per second 0 gcs
Threshold  2: 103338238 processed per second 0 gcs
Threshold  3: 103355220 processed per second 0 gcs
 
Benchmarks for document size: 2
 
Threshold  0:   4017379 created per second 503 gcs
Threshold  2:   4346494 created per second 503 gcs
Threshold  3:   7391546 created per second 251 gcs
Threshold  4:   7606740 created per second 251 gcs
 
Threshold  0:  15584482 processed per second 0 gcs
Threshold  2:  12532214 processed per second 0 gcs
Threshold  3:  41841336 processed per second 0 gcs
Threshold  4:  41998595 processed per second 0 gcs
 
Benchmarks for document size: 3
 
Threshold  0:   2775701 created per second 556 gcs
Threshold  3:   3048547 created per second 556 gcs
Threshold  4:   5104758 created per second 305 gcs
Threshold  5:   5062392 created per second 305 gcs
 
Threshold  0:   9078315 processed per second 0 gcs
Threshold  3:   9306934 processed per second 0 gcs
Threshold  4:  21748898 processed per second 0 gcs
Threshold  5:  21711264 processed per second 0 gcs
 
Benchmarks for document size: 4
 
Threshold  0:   1730639 created per second 923 gcs
Threshold  4:   1793944 created per second 923 gcs
Threshold  5:   3536455 created per second 358 gcs
Threshold  6:   3574510 created per second 358 gcs
 
Threshold  0:   7159495 processed per second 0 gcs
Threshold  4:   7049152 processed per second 0 gcs
Threshold  5:  13249848 processed per second 0 gcs
Threshold  6:  11403308 processed per second 0 gcs
 
Benchmarks for document size: 5
 
Threshold  0:   1333980 created per second 1068 gcs
Threshold  5:   1351232 created per second 1068 gcs
Threshold  6:   2230344 created per second 503 gcs
Threshold  7:   2267379 created per second 503 gcs
 
Threshold  0:   5486053 processed per second 0 gcs
Threshold  5:   5683303 processed per second 0 gcs
Threshold  6:   7779932 processed per second 0 gcs
Threshold  7:   8789969 processed per second 0 gcs
 
Benchmarks for document size: 6
 
Threshold  0:   1176262 created per second 1121 gcs
Threshold  6:   1102858 created per second 1121 gcs
Threshold  7:   1757917 created per second 556 gcs
Threshold  8:   1762739 created per second 556 gcs
 
Threshold  0:   4580784 processed per second 0 gcs
Threshold  6:   4663616 processed per second 0 gcs
Threshold  7:   5933954 processed per second 0 gcs
Threshold  8:   5692639 processed per second 0 gcs
 
Benchmarks for document size: 7
 
Threshold  0:   1021756 created per second 1174 gcs
Threshold  7:    930328 created per second 1174 gcs
Threshold  8:   1382554 created per second 610 gcs
Threshold  9:   1386193 created per second 610 gcs
 
Threshold  0:   3831217 processed per second 0 gcs
Threshold  7:   3863005 processed per second 0 gcs
Threshold  8:   4077987 processed per second 0 gcs
Threshold  9:   4342168 processed per second 0 gcs
 
Benchmarks for document size: 8
 
Threshold  0:    799136 created per second 1922 gcs
Threshold  8:    720602 created per second 1922 gcs
Threshold  9:   1148101 created per second 663 gcs
Threshold 10:   1130764 created per second 663 gcs
 
Threshold  0:   3452324 processed per second 0 gcs
Threshold  8:   3509871 processed per second 0 gcs
Threshold  9:   3258819 processed per second 0 gcs
Threshold 10:   3151490 processed per second 0 gcs
 
Benchmarks for document size: 9
 
Threshold  0:    687596 created per second 2128 gcs
Threshold  9:    584144 created per second 2128 gcs
Threshold 10:    889409 created per second 869 gcs
Threshold 11:    887905 created per second 869 gcs
 
Threshold  0:   3036981 processed per second 0 gcs
Threshold  9:   3033536 processed per second 0 gcs
Threshold 10:   2553074 processed per second 0 gcs
Threshold 11:   2572368 processed per second 0 gcs
 
Benchmarks for document size: 10
 
Threshold  0:    652932 created per second 2181 gcs
Threshold 10:    520942 created per second 2181 gcs
Threshold 11:    751156 created per second 923 gcs
Threshold 12:    755219 created per second 923 gcs
 
Threshold  0:   2749746 processed per second 0 gcs
Threshold 10:   2759409 processed per second 0 gcs
Threshold 11:   1985564 processed per second 0 gcs
Threshold 12:   2082270 processed per second 0 gcs
 
Benchmarks for document size: 11
 
Threshold  0:    603092 created per second 2235 gcs
Threshold 11:    454494 created per second 2235 gcs
Threshold 12:    651894 created per second 976 gcs
Threshold 13:    645493 created per second 976 gcs
 
Threshold  0:   2386339 processed per second 0 gcs
Threshold 11:   2603947 processed per second 0 gcs
Threshold 12:   1688600 processed per second 0 gcs
Threshold 13:   1667085 processed per second 0 gcs
 
Benchmarks for document size: 12
 
Threshold  0:    571551 created per second 2288 gcs
Threshold 12:    412987 created per second 2288 gcs
Threshold 13:    572964 created per second 1029 gcs
Threshold 14:    570698 created per second 1029 gcs
 
Threshold  0:   2371917 processed per second 0 gcs
Threshold 12:   2456998 processed per second 0 gcs
Threshold 13:   1414516 processed per second 0 gcs
Threshold 14:   1357282 processed per second 0 gcs

Comment by Githook User [ 20/Nov/14 ]

Author: rstam <robert@robertstam.org>

Message: CSHARP-1065: Changed BsonDocument to postpone creating the indexes dictionary until there are enough elements to justify it.
Branch: master
https://github.com/mongodb/mongo-csharp-driver/commit/5936555e12212cb3c3c107c8224a9dfecfe36cd9

Generated at Wed Feb 07 21:38:34 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.