[CSHARP-332] read large int array Created: 28/Sep/11  Updated: 02/Apr/15  Resolved: 29/Sep/11

Status: Closed
Project: C# Driver
Component/s: None
Affects Version/s: 1.1
Fix Version/s: 1.3

Type: Improvement Priority: Major - P3
Reporter: Andrei Neagu Assignee: Robert Stam
Resolution: Done Votes: 0
Labels: performance
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Windows 7 x64 bit, mongodb 2.0


Attachments: Program.cs, Program_2.cs

 Description   

Reading and writing a large number of int values (20,000-30,000 or more) in a BsonArray takes a lot of time. The documents are deserialized into a BsonDocument.
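To show why a large int array is expensive to read and write, here is a minimal sketch of how BSON lays out such an array. It is written in Python rather than C# and is not the driver's code. The helper name `encode_int32_array_document` is hypothetical. Each array element takes a type byte (0x10 for int32), a NUL-terminated decimal key ("0", "1", ...), and a 4-byte little-endian value.

```python
import struct


def encode_int32_array_document(field_name: str, values: list[int]) -> bytes:
    """Hypothetical helper: encode {field_name: [values...]} as a BSON document.

    The array is stored as an embedded pseudo-document whose keys are
    "0", "1", "2", ... as the BSON specification requires.
    """
    # One element per value: type byte 0x10 (int32), key as a
    # NUL-terminated decimal string, then a little-endian int32.
    body = b"".join(
        b"\x10" + str(i).encode("ascii") + b"\x00" + struct.pack("<i", v)
        for i, v in enumerate(values)
    )
    # Embedded document: int32 total length, the elements, a trailing NUL.
    array_doc = struct.pack("<i", 4 + len(body) + 1) + body + b"\x00"
    # Outer document with a single element of type 0x04 (array).
    element = b"\x04" + field_name.encode("utf-8") + b"\x00" + array_doc
    return struct.pack("<i", 4 + len(element) + 1) + element + b"\x00"


doc = encode_int32_array_document("values", list(range(30_000)))
print(len(doc))
```

For 30,000 values, the decimal keys alone take 138,890 bytes plus 30,000 NUL terminators. That is more than the 120,000 bytes of actual integer data, and every key must be written and then read back.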



 Comments   
Comment by Robert Stam [ 29/Sep/11 ]

See the Google Groups discussion for more information about the implications of creating an index on a field that holds a very large array.

Comment by Andrei Neagu [ 29/Sep/11 ]

A 60% improvement should make it much better. Thanks.

Please have a look at http://groups.google.com/group/mongodb-user/browse_thread/thread/8cdc9d0fed6dfc23 also.

Thanks

Comment by Andrei Neagu [ 29/Sep/11 ]

Can you run a test with this one? I didn't mention that there is an index on that array.

Comment by Robert Stam [ 29/Sep/11 ]

Resolved issue for now. Will reopen if further discussion warrants.

Comment by Robert Stam [ 29/Sep/11 ]

In the BSON specification, arrays are stored as pseudo-documents whose elements are named "0", "1", and so on. The C# driver ignores these names during deserialization, so one quick optimization was to skip the UTF-8 decoding of these element names, since they were going to be discarded anyway. This one small change improved performance by about 60%:

200 iterations in 00:00:03.1001773
64.5124393369373 documents/second
1935373.18010812 integers/second
Press Enter to continue

That was the only low-hanging fruit, though. Does this seem like enough of an improvement to you?
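The name-skipping optimization described in this comment can be illustrated with a short Python sketch. This is not the C# driver's BsonReader code. The function `decode_int32_array` and its `decode_names` flag are hypothetical, and the sketch only handles int32 elements. With `decode_names=False`, the reader finds each key's NUL terminator and jumps past it without building a string.

```python
import struct


def decode_int32_array(buf: bytes, offset: int, decode_names: bool) -> list[int]:
    """Hypothetical reader for a BSON array pseudo-document of int32 values.

    When decode_names is False, the "0", "1", ... keys are skipped without
    UTF-8 decoding, because the array's values are all that is kept.
    """
    (length,) = struct.unpack_from("<i", buf, offset)
    end = offset + length - 1  # index of the array document's trailing NUL
    pos = offset + 4           # first element follows the int32 length
    values = []
    while pos < end:
        element_type = buf[pos]
        pos += 1
        if element_type != 0x10:
            raise ValueError(f"sketch handles int32 only, got type 0x{element_type:02x}")
        name_end = buf.index(b"\x00", pos)  # locate the key's NUL terminator
        if decode_names:
            buf[pos:name_end].decode("utf-8")  # decoded, then thrown away
        pos = name_end + 1
        (value,) = struct.unpack_from("<i", buf, pos)
        values.append(value)
        pos += 4
    return values


# Build a small array pseudo-document by hand and read it both ways.
sample = [5, -1, 2**31 - 1]
body = b"".join(
    b"\x10" + str(i).encode("ascii") + b"\x00" + struct.pack("<i", v)
    for i, v in enumerate(sample)
)
array_doc = struct.pack("<i", 4 + len(body) + 1) + body + b"\x00"
print(decode_int32_array(array_doc, 0, decode_names=False))
```

Both modes return the same values. The only difference is whether the key bytes are turned into strings, and that per-element work is what the driver change removed.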

Comment by Robert Stam [ 29/Sep/11 ]

If I run the test program without the debugger it's considerably faster:

200 iterations in 00:00:04.9522832
40.3854125305273 documents/second
1211562.37591582 integers/second
Press Enter to continue

Note: I replaced the values above with slightly lower numbers. Each run produces different numbers, and these values are closer to the median.
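Because timings vary from run to run, one common approach is to repeat the measurement and report the median, which the note above does by hand. Here is a minimal Python sketch of that approach. It is not part of the attached Program.cs. The helper `median_elapsed` and its `clock` parameter are hypothetical, and the workload is any zero-argument callable.

```python
import statistics
import time
from typing import Callable


def median_elapsed(
    workload: Callable[[], object],
    runs: int = 5,
    clock: Callable[[], float] = time.perf_counter,
) -> float:
    """Hypothetical helper: time `workload` `runs` times and return the median.

    The median is less affected than the mean by an occasional slow run,
    for example one hit by a garbage-collection pause or a busy CPU.
    """
    timings = []
    for _ in range(runs):
        start = clock()
        workload()
        timings.append(clock() - start)
    return statistics.median(timings)


elapsed = median_elapsed(lambda: sum(range(1_000_000)))
print(f"median: {elapsed:.4f} s")
```

The `clock` parameter exists only so that the timer can be replaced, for example with a fake clock when checking the helper itself.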

Comment by Robert Stam [ 29/Sep/11 ]

This is the output of the test program on my computer (your numbers will vary):

200 iterations in 00:00:05.8833365
33.9943159803965 documents/second
1019829.4794119 integers/second
Press Enter to continue

Note that while 33 documents per second sounds slow, it means deserializing over 1 million array values per second, which doesn't sound so slow anymore.
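The arithmetic linking the reported figures can be reproduced from the elapsed times. The sketch below assumes one document per iteration and 30,000 integers per document. The second number is inferred because every reported integers/second figure is exactly 30,000 times the documents/second figure. The run labels are descriptive only. The comments in this issue do not say whether the 3.10 s run used the debugger, but its speedup over the no-debugger run matches the "about 60%" reported.

```python
ITERATIONS = 200
INTS_PER_DOCUMENT = 30_000  # inferred: integers/second divided by documents/second


def throughput(elapsed_seconds: float) -> tuple[float, float]:
    """Return (documents/second, integers/second) for one timed run."""
    docs_per_second = ITERATIONS / elapsed_seconds
    return docs_per_second, docs_per_second * INTS_PER_DOCUMENT


# Elapsed times reported in the comments of this issue.
runs = {
    "debugger attached": 5.8833365,
    "no debugger": 4.9522832,
    "names not decoded": 3.1001773,
}
for label, seconds in runs.items():
    docs, ints = throughput(seconds)
    print(f"{label}: {docs:.1f} documents/s, {ints:,.0f} integers/s")

# Throughput ratio of the optimized run to the no-debugger baseline.
speedup = runs["no debugger"] / runs["names not decoded"]
print(f"speedup: {speedup:.2f}x")
```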

Nonetheless, I will run this under a profiler and look for possible optimizations.

Comment by Robert Stam [ 29/Sep/11 ]

I've attached the test program I'm using to evaluate performance of reading documents with very large arrays of integers.
