[CSHARP-3230] Improve Serialization Created: 22/Oct/20  Updated: 07/Feb/24

Status: Backlog
Project: C# Driver
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Epic Priority: Major - P3
Reporter: Rachelle Palmer Assignee: Unassigned
Resolution: Unresolved Votes: 1
Labels: rp-track, size-large
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Documented

 Description   
Epic Summary

Summary
1) Assess all our currently open serialization items
2) Close out old, gone away, or won't fix items
3) Create a rollup of the serialization asks that we will/should fix and attach them to this epic

Motivation
We have many users who have opened serialization bugs or feature requests over the years, and our backlog has grown fairly unwieldy as a result. Some of these asks are reasonable and some are not, and we need to dedicate a few weeks to reviewing them all and deciding what to do with each ticket.

Cast of Characters

Engineering Lead: James Kovacs
Product Owner: Rachelle Palmer
Program Manager: esha.bhargava

Documentation

[Scope Document|some.url]
[Technical Design Document|some.url]



 Comments   
Comment by Deanna Delapasse [ 07/Feb/24 ]

Absolutely! I'll have someone confirm that we were doing a proper comparison and then create a ticket with a sample. We still have the test data we shared with the consultant, along with his test app and final report. He did reach out to someone in the C# driver group during the session and was told there were some planned improvements to remove reflection in the driver. Maybe that is another ticket? If so, please let me know so we can watch for that as well.

My mongo dev is out sick today, but will get on this soon.

Comment by James Kovacs [ 07/Feb/24 ]

Hi, ddelapasse@oceaneering.com,

Thank you for reaching out to us about your concerns around serialization performance in the .NET/C# Driver. We have investigated/prototyped various serialization performance improvements in the past year or two, but they all came with tradeoffs and potentially breaking changes. We would like to better understand your use case so that we can propose meaningful improvements - both in your code and the driver itself. I would suggest you open a support ticket referencing this issue so that we can investigate further together.

Something I would like to note... You mentioned that you were comparing the deserialization performance of the .NET/C# Driver to mongosh. When using mongosh, a query is executed and an initial 16MB batch is deserialized with additional 16MB batches being fetched and deserialized on demand during iteration. Although you can do the same thing in C# using query.ToCursor(), it is more typical to call query.ToList(), which retrieves and deserializes all documents at once. To ensure that you are comparing apples to apples, I would suggest iterating the entire cursor in mongosh and doing the same in your C# code:

For mongosh:

var start = new Date();
var cursor = db.coll.aggregate(<<PIPELINE>>);
cursor.forEach(printjson);
var end = new Date();
console.log((end - start) + "ms");

For C#:

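using System;
using System.Diagnostics; // Stopwatch

// Time the full round trip: run the pipeline and deserialize every result.
var stopwatch = Stopwatch.StartNew();
var query = coll.Aggregate(<<PIPELINE>>);
var results = query.ToList(); // retrieves and deserializes all documents
stopwatch.Stop();
Console.WriteLine($"{stopwatch.ElapsedMilliseconds}ms");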

Iterating the entire cursor and deserializing all results will provide a more representative comparison of mongosh (which uses the Node.js Driver internally) and the .NET/C# Driver.
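
For reference, the batch-by-batch alternative mentioned above (query.ToCursor()) might look roughly like the following. This is only a sketch under assumptions: the connection string, the database name "test", and the use of BsonDocument are placeholders that do not come from this ticket, and an empty pipeline stands in for <<PIPELINE>>.

```csharp
using System;
using System.Diagnostics;
using MongoDB.Bson;
using MongoDB.Driver;

// Placeholder connection details (assumptions, not from this ticket).
var client = new MongoClient("mongodb://localhost:27017");
var coll = client.GetDatabase("test").GetCollection<BsonDocument>("coll");

var stopwatch = Stopwatch.StartNew();
long count = 0;

// Empty pipeline used as a stand-in for <<PIPELINE>>.
using (var cursor = coll.Aggregate().ToCursor())
{
    // Each MoveNext() call fetches the next batch from the server.
    while (cursor.MoveNext())
    {
        // cursor.Current holds the documents of the current batch.
        foreach (var doc in cursor.Current)
        {
            count++;
        }
    }
}

stopwatch.Stop();
Console.WriteLine($"{count} documents in {stopwatch.ElapsedMilliseconds}ms");
```

Because the loop drains the cursor completely, the elapsed time covers fetching and deserializing every batch, so it should be comparable to the ToList() measurement and to the mongosh forEach loop.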

Sincerely,
James

Comment by Deanna Delapasse [ 07/Feb/24 ]

This issue is severely impacting my team's project.  We just paid $10k for a consultant to help us and the poor performance of the driver's C# deserialization was found to be the root cause of all of our "pain points".  I see that many of the epic's tickets are already closed.  Could you please release what you have in the hopes that it will offer some improvement?

We're reducing our projections when possible and minimizing our objects, but we are still getting a very disappointing experience compared to the pure query in the console (i.e., deserialization is 10x slower than the query itself).

Comment by PM Bot [ 02/Feb/22 ]

If you are not logged in, you can view the tickets in this epic by following this link.

Generated at Wed Feb 07 21:44:45 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.