[CSHARP-1525] JsonBuffer can exhaust memory
Created: 06/Jan/16
Updated: 31/Mar/22

Status: Backlog
Project: C# Driver
Component/s: Json
Affects Version/s: 2.2
Fix Version/s: None

Type: Improvement
Priority: Minor - P4
Reporter: Jeffrey Yemin
Assignee: Unassigned
Resolution: Unresolved
Votes: 2
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   

The JsonBuffer class, when wrapping a TextReader, retains every character it has read, so its memory use grows with the total number of characters consumed from the TextReader. This could cause problems in several scenarios:

  • A mongoimport-like scenario, where a single file contains many JSON documents and a single JsonReader is used to read them all
  • A really large JSON document (though in most situations you'd need space proportional to the size of the document anyway)
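
For concreteness, the first scenario can be sketched as the loop below. This is a minimal illustration rather than code from the driver or from mongoimport: the `ImportSketch` class, the `CountDocuments` helper, and the inline sample input are hypothetical, and it assumes the MongoDB.Bson package is referenced.

```csharp
using System;
using System.IO;
using MongoDB.Bson;
using MongoDB.Bson.IO;
using MongoDB.Bson.Serialization;

public static class ImportSketch
{
    // Reads every top-level JSON document from one TextReader
    // using a single JsonReader, and returns how many were read.
    public static int CountDocuments(TextReader textReader)
    {
        var count = 0;
        using (var jsonReader = new JsonReader(textReader))
        {
            // IsAtEndOfFile skips trailing whitespace between documents.
            while (!jsonReader.IsAtEndOfFile())
            {
                var document = BsonSerializer.Deserialize<BsonDocument>(jsonReader);
                count++;
            }
        }
        return count;
    }

    public static void Main()
    {
        // Hypothetical stand-in for a large import file.
        var input = "{ \"x\" : 1 }\n{ \"x\" : 2 }\n{ \"x\" : 3 }";
        using (var textReader = new StringReader(input))
        {
            Console.WriteLine(CountDocuments(textReader));
        }
    }
}
```

With a real import file, the `StringReader` would be replaced by a `StreamReader` over the file, and the concern is how much of that input the underlying buffer keeps alive while the loop runs.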


 Comments   
Comment by Robert Stam [ 15/Feb/21 ]

I've tried to verify the issues described.

  • A mongoimport-like scenario, where a single file contains many JSON documents and a single JsonReader is used to read them all

I see `ResetBuffer` called just before each top-level document is read, so I don't think this scenario is a problem.

  • A really large JSON document (though in most situations you'd need space proportional to the size of the document anyway)

This is true, but probably not a serious issue. If reading a single document, the buffer will grow to be big enough to hold the entire JSON string. If reading a series of JSON documents from the same file, the buffer will grow to be as large as the largest JSON document encountered. But as soon as the JsonReader is garbage collected the associated JsonBuffer will be garbage collected also.
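
The growth-and-reset behaviour described above can be illustrated with a deliberately simplified model. The `SketchBuffer` class below is a hypothetical stand-in written only for this illustration; it is not the driver's JsonBuffer and makes no claim about how JsonBuffer is implemented internally. It retains every character read until `ResetBuffer` is called, so the retained length is bounded by the size of one document as long as a reset happens between documents.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical stand-in for illustration only; this is NOT the driver's JsonBuffer.
public sealed class SketchBuffer
{
    private readonly TextReader _reader;
    private readonly StringBuilder _chars = new StringBuilder();
    private int _position;

    public SketchBuffer(TextReader reader)
    {
        _reader = reader;
    }

    // Number of characters currently held in memory.
    public int RetainedLength => _chars.Length;

    // Returns the next character, or -1 at end of input.
    // Every character read is retained until ResetBuffer is called.
    public int Read()
    {
        if (_position == _chars.Length)
        {
            var c = _reader.Read();
            if (c == -1)
            {
                return -1;
            }
            _chars.Append((char)c);
        }
        return _chars[_position++];
    }

    // Discards the characters already consumed.
    public void ResetBuffer()
    {
        _chars.Remove(0, _position);
        _position = 0;
    }
}

public static class SketchBufferDemo
{
    public static void Main()
    {
        // Two newline-terminated "documents" of different sizes.
        var input = "{\"a\":1}\n{\"b\":22}\n";
        var buffer = new SketchBuffer(new StringReader(input));
        int c;
        while ((c = buffer.Read()) != -1)
        {
            if (c == '\n')
            {
                Console.WriteLine($"retained before reset: {buffer.RetainedLength}");
                buffer.ResetBuffer();
                Console.WriteLine($"retained after reset: {buffer.RetainedLength}");
            }
        }
    }
}
```

Without the `ResetBuffer` call in the loop, `RetainedLength` in this model would keep growing with the total input, which is the situation the issue description worries about.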

Comment by Sebastien Balant [ 13/Nov/17 ]

Hi,

We are encountering this problem while deserializing long streams of small documents (IEnumerable/Observable style). It is a pity, as it blocks us from standardising on MongoDB.Bson for all our serialization purposes.

Could we make JsonBuffer a public class and expose it as a non-private property of JsonReader, so that we could clear the buffer after each document is deserialized?

That would allow the following (for example):

return Observable.Create<T>((obs, ctx) =>
{
    try
    {
        using (var stream = new StreamReader(readStream, encoding, false, DefaultFileStreamBufferSize, true))
        {
            using (var reader = new JsonReader(stream))
            {
                while (!ctx.IsCancellationRequested) // ctx is the CancellationToken passed to this delegate
                {
                    if (reader.State == BsonReaderState.Initial)
                    {
                        reader.ReadStartArray();
                        continue;
                    }
                    if (reader.State == BsonReaderState.EndOfArray)
                    {
                        reader.ReadEndArray();
                        break;
                    }
 
                    if (reader.State == BsonReaderState.Value)
                    {
                        var item = BsonSerializer.Deserialize<T>(reader);
                        obs.OnNext(item);
                        // Proposed API: requires JsonReader to expose its JsonBuffer publicly.
                        reader.Buffer.ResetBuffer();
                    }
 
                    if (reader.State == BsonReaderState.Type)
                    {
                        reader.ReadBsonType();
                    }
                }
            }
        }
    }
    finally
    {
        obs.OnCompleted();
    }
    return Task.FromResult(readStream);
});
