[CSHARP-666] Add support for lazy deserialization of BsonDocuments Created: 24/Jan/13  Updated: 20/Mar/14  Resolved: 19/Feb/13

Status: Closed
Project: C# Driver
Component/s: None
Affects Version/s: 1.7
Fix Version/s: 1.8

Type: New Feature Priority: Minor - P4
Reporter: Robert Stam Assignee: Robert Stam
Resolution: Done Votes: 1
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   

Add support for lazy and raw subclasses of BsonDocument and BsonArray:

BsonValue
    BsonDocument : BsonValue
        LazyBsonDocument : BsonDocument
        RawBsonDocument : BsonDocument
    BsonArray : BsonValue
        LazyBsonArray : BsonArray
        RawBsonArray : BsonArray

In all cases, when a lazy/raw document or array is deserialized, the raw bytes will be saved without any further deserialization at that time. If the lazy/raw document is later serialized, the raw bytes can be written back out without any need to reserialize them. This can be a huge performance win when copying documents from one place to another.

Since both LazyBsonDocument and RawBsonDocument derive from BsonDocument you will be able to use them anywhere a BsonDocument can be used. In particular, you will be able to access the elements of a lazy/raw document.

The two classes will differ in how they handle accessing the elements (and therefore in their performance characteristics).

A LazyBsonDocument will deserialize one level deep as soon as you access the document contents in any way. Any embedded documents (and arrays) will be lazy themselves and will not be deserialized unless (and until) you access their contents. Once a level has been deserialized it will behave like a normal BsonDocument, with essentially no performance difference. If you access every part of the document, the whole document will end up deserialized, just lazily, one level at a time as you access different parts of it. If you are going to access the entire document you might as well use a regular BsonDocument and deserialize the whole thing up front, but if you only need to access some parts of the document a LazyBsonDocument could be a big performance win.

A RawBsonDocument always keeps the raw byte representation of the document. You can still access any part of the document, but unlike a LazyBsonDocument, this will not trigger any permanent deserialization. Only the one element you access will be deserialized, and if you access that element again in the future it will have to be deserialized again. This representation is beneficial when you only need to access a few fields before sending the document on somewhere else: because nothing is permanently deserialized, the document never has to be reserialized. Note that a RawBsonDocument is immutable, so it can only be used when you want to send the document on unmodified.
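
To make the difference concrete without a database, here is a minimal sketch that builds both kinds of document from the same BSON bytes. It assumes that LazyBsonDocument and RawBsonDocument each have a constructor taking a byte[], that both are disposable, and that writing to a RawBsonDocument throws NotSupportedException. None of that is stated in this ticket, so check it against the driver version you are using.

```csharp
using System;
using MongoDB.Bson;

class LazyVersusRawSketch
{
    static void Main()
    {
        // Serialize an ordinary document once to obtain raw BSON bytes.
        var original = new BsonDocument
        {
            { "name", "widget" },
            { "details", new BsonDocument { { "color", "red" }, { "size", 3 } } }
        };
        byte[] bytes = original.ToBson();

        // Lazy: the first access deserializes the top level only;
        // the embedded "details" document stays lazy until it is touched.
        // (byte[] constructor is an assumption, see the note above.)
        using (var lazy = new LazyBsonDocument(bytes))
        {
            lazy["name"] = "gadget"; // allowed: a lazy document is mutable
            Console.WriteLine(lazy["name"].AsString);
        }

        // Raw: each access deserializes just the requested element,
        // and the document keeps only its raw bytes.
        using (var raw = new RawBsonDocument(bytes))
        {
            Console.WriteLine(raw["name"].AsString);
            try
            {
                raw["name"] = "gadget"; // a raw document is immutable
            }
            catch (NotSupportedException) // exception type is an assumption
            {
                Console.WriteLine("raw document rejected the write");
            }
        }
    }
}
```

The lazy document is the better fit when a few top-level fields need to change before the document is passed on. The raw document fits the case where the bytes are forwarded untouched.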

Here's some sample code using a LazyBsonDocument:

// sample code using LazyBsonDocument
var source = database.GetCollection<LazyBsonDocument>("source");
var destination = database.GetCollection<LazyBsonDocument>("destination");
foreach (var document in source.FindAll())
{
    document["timestamp"] = DateTime.UtcNow; // triggers one level of deserialization (note that document is mutable)
    destination.Insert(document); // only the top level needs to be reserialized
}

And here's some sample code using a RawBsonDocument:

// sample code using RawBsonDocument with output to a file
var source = database.GetCollection<RawBsonDocument>("source");
var destination = File.Create("destination.bson"); // destination could be a socket instead
foreach (var document in source.FindAll())
{
    // note that this code accesses the "export" and "_id" elements of the RawBsonDocument
    if (document["export"].ToBoolean())
    {
        destination.Write(document.Bytes, 0, document.Bytes.Length); // no reserialization required
        source.Update(Query.EQ("_id", document["_id"]), Update.Set("export", false)); // clear export flag
    }
}



 Comments   
Comment by auto [ 19/Feb/13 ]

Author: rstam (robert@10gen.com), 2013-02-12T04:36:12Z

Message: CSHARP-666: Added lots of unit tests and fixed a few bugs.
Branch: master
https://github.com/mongodb/mongo-csharp-driver/commit/b943820d1b144445cf093bfcec788d6c3b620d19

Comment by auto [ 10/Feb/13 ]

Author: rstam (robert@10gen.com), 2013-02-10T23:01:56Z

Message: CSHARP-666: Initial implementation of lazy and raw BsonDocument and BsonArray.
Branch: master
https://github.com/mongodb/mongo-csharp-driver/commit/a44149564d2a50fac370cbbd37afa8655929f65d
