[CSHARP-4222] General Poor Performance Created: 20/Jun/22  Updated: 27/Oct/23  Resolved: 20/Jul/22

Status: Closed
Project: C# Driver
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Unknown
Reporter: Sebastian Stehle Assignee: Dmitry Lukyanov (Inactive)
Resolution: Gone away Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Text File serialization_performance_for_depth_0.txt    

 Description   

I also created a topic here, just fyi: https://www.mongodb.com/community/forums/t/poor-driver-performance/170536

The scenario

I am using .NET Core 6 with the new newest mongodb driver and I am inserting documents with many fields of around 2 MB per document. MongoDB is hosted in docker on the developer machine.

The following screenshot is from New Relic:

For each operation I make 3 inserts or updates:

  • (1) I write an event to the event stream collection. This collection has a very simple shape that can be seen in the following code snippet. Basically I serialize the object with `Newtonsoft.JSON` to a string and then I create an event object that is written to MongoDB. Therefore the document in the database has only very few fields.
  • (2) I write the same object from step (1) to two snapshot collections. The object is almost identical to step (1) and only a few meta fields are different, but the make no difference if you consider that the object size is 1.5 MB. In contrast to (1) there are probably thousands of fields.

    {
       "payload": "string"
       "..."
    }

    The observations

I made two observations:

1. I would expect the performance of (1) and (2) to be almost identical, because the input objects are almost identical, but (2) is almost 50% slower in all cases. I know, that the performance of the Newtonsoft.JSON serializer is better, I have made a few benchmarks in the past, but it does not explain why the difference is so big. I assume it has something to do with the number of fields.
2. The difference between the time of the driver side and Mongo side is huge. You could say that the overhead of the MongoDB driver is almost 200%.

I am using the mongo profiler to get information about the queries and I see the following results:

What you can see is the update statements for the snapshot collections sorted by milliseconds in descending order. Even the slowest example has a huge difference to the shown graph above, where the update on the client side takes around 67ms.

    1. Another test

I made another test. I insert large documents into the database and compare the performance of the C# side, with Mongo side. For comparison I also make other tests:

  • Insert CLR object
  • Insert Raw BsonDocument
  • Serialize JSON
  • Serialize BSON
  • HTTP Call, the server does not handle the response.

 

Serialized 00:00:00.0294601 - Raw 00:00:00.0319273 - Json 00:00:00.0068617 - Bson 00:00:00.0060091 - HTTP 00:00:00.0036443
Serialized 00:00:00.0277608 - Raw 00:00:00.0312567 - Json 00:00:00.0088210 - Bson 00:00:00.0043639 - HTTP 00:00:00.0037534
Serialized 00:00:00.0313045 - Raw 00:00:00.0319250 - Json 00:00:00.0191259 - Bson 00:00:00.0060418 - HTTP 00:00:00.0035886
Serialized 00:00:00.0298384 - Raw 00:00:00.0316654 - Json 00:00:00.0065837 - Bson 00:00:00.0047787 - HTTP 00:00:00.0036145
Serialized 00:00:00.0300794 - Raw 00:00:00.0310646 - Json 00:00:00.0072488 - Bson 00:00:00.0040601 - HTTP 00:00:00.0036597
Serialized 00:00:00.0300397 - Raw 00:00:00.0322210 - Json 00:00:00.0065505 - Bson 00:00:00.0054159 - HTTP 00:00:00.0036771
Serialized 00:00:00.0301598 - Raw 00:00:00.0314309 - Json 00:00:00.0073720 - Bson 00:00:00.0040479 - HTTP 00:00:00.0036394
Serialized 00:00:00.0296444 - Raw 00:00:00.0313114 - Json 00:00:00.0072129 - Bson 00:00:00.0072335 - HTTP 00:00:00.0044463
Serialized 00:00:00.0303547 - Raw 00:00:00.0338965 - Json 00:00:00.0064084 - Bson 00:00:00.0057766 - HTTP 00:00:00.0035770
Serialized 00:00:00.0292143 - Raw 00:00:00.0318035 - Json 00:00:00.0071610 - Bson 00:00:00.0042087 - HTTP 00:00:00.0035685

If you take the HTTP performance (3ms) and add the MongoDB time (3ms), then it should take 6ms, not 30ms.

The test  is very simple

 

public static class Program
{
    public static async Task Main()
    {
        var json = File.ReadAllText("input.json");
        var mongoClient = new MongoClient("mongodb://localhost");
        var mongoDatabase = mongoClient.GetDatabase("test");
        var serializedDocument = Newtonsoft.Json.JsonConvert.DeserializeObject<ComplexObject>(json)!;
        var serializedCollection = mongoDatabase.GetCollection<ComplexObject>("serialized");
        var rawDocument = BsonDocument.Parse(json)!;
        var rawCollection = mongoDatabase.GetCollection<BsonDocument>("raw");
        var httpClient = new HttpClient();
        for (var i = 0; i < 50; i++)
        {
            var serializedWatch = Stopwatch.StartNew();
            serializedDocument.Id = Guid.NewGuid().ToString();
            serializedCollection.InsertOne(serializedDocument);
            serializedWatch.Stop();
            var rawWatch = Stopwatch.StartNew();
            rawDocument["_id"] = Guid.NewGuid().ToString();
            rawCollection.InsertOne(rawDocument);
            rawWatch.Stop();
            var jsonWatch = Stopwatch.StartNew();
            Newtonsoft.Json.JsonConvert.SerializeObject(serializedDocument);
            jsonWatch.Stop();
            var bsonWatch = Stopwatch.StartNew();
            BsonSerializer.Serialize(new BsonDocumentWriter(new BsonDocument()), typeof(ComplexObject), serializedDocument);
            bsonWatch.Stop();
            var httpWatch = Stopwatch.StartNew();
            await httpClient.PostAsJsonAsync("http://localhost:5005", serializedDocument);
            httpWatch.Stop();
            Console.WriteLine("Serialized {0} - Raw {1} - Json {2} - Bson {3} - HTTP {4}", serializedWatch.Elapsed, rawWatch.Elapsed, jsonWatch.Elapsed, bsonWatch.Elapsed, httpWatch.Elapsed);
        }
    }
}

If you compare it with MongoDB you see a very big overhead (screenshot shows old result of test program)

My assumption was that it could have been caused by serialization, but it does not make a difference whether I use BsonDocument or a custom class.



 Comments   
Comment by PM Bot [ 20/Jul/22 ]

There hasn't been any recent activity on this ticket, so we're resolving it. Thanks for reaching out! Please feel free to comment on this if you're able to provide more information.

Comment by Dmitry Lukyanov (Inactive) [ 05/Jul/22 ]

Thanks for your effort,

The MongoDB time is not correct and it does not include the full request handling.

I'm not sure which time exactly you mean, but if it's about what timings you see in server logs, then yes, it doesn't include time spent on preparation request on the client side. This time you can see in these events "ConnectionSentMessagesEvent"/"ConnectionReceivedMessageEvent" in fields like SerializationDuration/NetworkDuration and etc, see the gist mentioned above for details how to configure it.
Also, please pay attention as I mentioned above, a driver's operation (like InsertOne) may contain more steps that just preparing/sending/receiving requests that may explain why some operations takes extra time other than tcp request + serialization. To easier proceed here, I would recommend configuring events and provide timings calculated by drivers machinery.

Comment by Sebastian Stehle [ 05/Jul/22 ]

Hi Dimitry,

  1. Yes, but it does not explain the big difference, but yes, it is not really fair and on the mongodb side both updates are fast.
  2. I know, I just wanted to get an understanding where the time has went to.
  3. I know, I just wanted to get an comparison to understand what is actually fast and what not. If I would complain that MongoDB is slow and nobody else is faster it would be very unfair.

Let me explain my thoughts again:

  • MongoDB server reports around 3ms for the insert.
  • MongoDB client takes 30ms for the same operation.
  • If I serialize the document with to an object then it takes around 5ms.

So there are two possible explanations for me:

  1. The MongoDB time is not correct and it does not include the full request handling.
  2. The 22ms are spent for buffer management and the the socket management and so on. But it would be very slow in comparison to a simple HTTP request

 

I still don't understand what is going on here.

Comment by Dmitry Lukyanov (Inactive) [ 04/Jul/22 ]

Hey mail2stehle@gmail.com, thanks for your report

I made some initial research and this is what I found for now:
1. Am I right that initially you compared Insert operation with Update? If so, I don't think that it's fair blindly comparing it.
2. This logic

BsonSerializer.Serialize(new BsonDocumentWriter(new BsonDocument()), typeof(ComplexObject), serializedDocument);

doesn't represent what driver does during serialization since we serialize input entity directly into bytes. See attached gist and Serialize method to see driver's logic. This is why you don't see a difference here: "but it does not make a difference whether I use BsonDocument or a custom class."
3. I don't think that it's really reliable approach you used to gather statistic where you involved HttpClient and compare it with what driver does via a raw socket.
4. I think this is more correct observation about serialization performance serialization_performance_for_depth_0.txt , you can find more examples in the gist below.

where ToBson step makes just serialization step similar to what Newtonsoft does without driver specific logic and Full encoding represent timing that happens during all serialization steps. I can confirm that our serialization is a bit slower and we've already considered work to improve it in this ticket, we will consider your finding there.
5. Also, Insert operation can make more steps than just sending Insert, for example first time, it can establish connection that involves calling internal commands + configuring retries + enabling implicit sessions and etc. It also doesn't really fair blindly collect the duration for whole operation. But if you want to gather this information, you may look at events that we already have for this target: ConnectionSentMessagesEvent  and ConnectionReceivedMessageEvent , you may see how I used them in the provided gist in InsertWithLogging method.
Gist

Comment by Sebastian Stehle [ 22/Jun/22 ]

Hi James,

I don't think that the MongoDB driver is more configurable than Newtonsoft.JSON, but this is another story. For me the main difference in performance of the serializer are probably these two areas:

  1. Buffer Management: You have this very nice and complex buffer management, which has an overhead per field. If you serialize everything to memory first and then write it to MongoDB serializer in one go, you have this overhead only once instead of thousand of times.
  2. String conversion: Newtonsoft.JSON serializes to Unicode and then it makes the conversion to UTF8 instead of on a per field basis.

But this is not what I do not understand:

If you take this line: 

Serialized 00:00:00.0294601 - Raw 00:00:00.0319273 - Json 00:00:00.0068617 - Bson 00:00:00.0060091 - HTTP 00:00:00.003644

I compare the full insert time (first column) with BSON serializer performance (4th column). MongoDB reports 3ms in the profiler.

So you have 30ms - 6ms - 3ms = 21ms. What are these 21 ms spent on?

 

Comment by James Kovacs [ 21/Jun/22 ]

Hi, mail2stehle@gmail.com,

Thank you for filing this issue. Due to the complexities of serialization in our driver - which are much more configurable than Newtonsoft.JSON - there is going to be a certain amount of per-field overhead. I'm going to spend some time understanding your reproduction later this week to understand if there are areas where we can improve our performance in your scenario. I'll update this ticket with my findings and whether I require any additional input from you. I appreciate your patience as we investigate further.

Sincerely,
James

Generated at Wed Feb 07 21:47:35 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.