[CSHARP-709] Implement compilable serializer Created: 22/Mar/13  Updated: 08/Jun/23

Status: Backlog
Project: C# Driver
Component/s: Performance
Affects Version/s: 1.8
Fix Version/s: None

Type: New Feature Priority: Major - P3
Reporter: Vladimir Perevalov Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

All


Attachments: Text File Program.cs    
Issue Links:
Related

 Description   

The C# driver docs claim that BSON serialization works very much like XmlSerializer in .NET.
But XmlSerializer uses code generation, which allows it to perform direct operations on entities and is quite fast.
BsonSerializer, on the other hand, uses reflection to read/write properties.
I did some testing on a simple object with ~10 properties. Implementing BSON serialization manually (hard-coded work with BsonReader) is 80-100% faster than the current BSON serializer.
So deserialization could be almost twice as fast.
If anyone is interested, I can attach my test source code.

On my Core i7 box, single-threaded in a Release build, I get about 60 kOps with the current implementation and 120 kOps with my hard-coded use of BsonBuffer.
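The speedup described above comes largely from avoiding reflective property access on every field. As a rough, self-contained sketch (plain .NET only, no MongoDB types; the `Doc` class and iteration count are illustrative, not taken from the attached test), this is the per-property cost being compared:

```csharp
using System;
using System.Diagnostics;
using System.Reflection;

class Doc { public int A { get; set; } }

class ReflectionVsDirect
{
    static void Main()
    {
        const int n = 1000000;
        PropertyInfo prop = typeof(Doc).GetProperty("A");
        var doc = new Doc();

        // Reflective path: roughly what a generic serializer pays per property.
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < n; i++) prop.SetValue(doc, i, null);
        Console.WriteLine("Reflection: " + sw.Elapsed);

        // Direct path: what hand-written (or generated) code pays.
        sw.Restart();
        for (int i = 0; i < n; i++) doc.A = i;
        Console.WriteLine("Direct:     " + sw.Elapsed);
    }
}
```

The exact ratio varies by runtime and hardware, but the reflective loop is typically an order of magnitude slower, which is consistent with the gap the reporter measured once BSON parsing overhead is added on top.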



 Comments   
Comment by Chad Kreimendahl [ 19/Jan/16 ]

I'm curious if this is still relevant.

We do quite a bit in our code to squeeze out every last drop of performance, and we are basically down to serialization as the primary driver of our CPU-related delays. Our documents are fairly dynamic, though, so I'm not sure how much better it could get.

I'm also curious whether the serializer was adapted from other code out there (Newtonsoft, etc.), and whether it has been updated with any of the performance-related improvements that have come out since the last comment here (nearly 3 years ago).

Comment by Vladimir Perevalov [ 22/Mar/13 ]

Craig, with the attached implementation, on my machine (.NET 4.5, Release, Any CPU), I get the following output:
BsonSerializer 00:00:04.2499858, avg: 117647 ops/sec
DeserializeManual 00:00:02.8258560, avg: 176937 ops/sec

This looks more like 50%, but it depends greatly on the document. When I added a BsonBinary with a 10,000-byte array, the difference became larger. I think the general rule here is: more properties in the document = bigger speed difference.
I agree that, from one point of view, serialization is already reasonably fast, since it works faster than the network. But consider the following:

1. In a real-world app, there are lots of other things going on, not just MongoDB deserialization. When I do a really large amount of MongoDB work, deserialization sometimes takes 10-30% of CPU load.
I would really like my app to do something more useful with that time.

2. When I ran tests over 1 Gbps Ethernet with similar documents, I could achieve 70-80 kOps reading. Yes, I saturated the network 100%, but I also had to run my test program in 8 threads to load the Core i7 (CPU usage was 10-20%). On a 10 Gb network with MongoDB on SSD and lots of RAM, I can see serialization becoming the bottleneck.

3. I know the serialization system is extensible. Do you have any reports on how many people really use that? At least I don't have any use cases where I need to customize serialization. So you could, for example, implement an option to choose: either extensible serialization with reflection, or a non-extensible but compiled serializer that is considerably faster.
As far as I understand, this could even be just another custom serializer, invoked only for types marked with an attribute. Then nothing would break at all.
I am actually considering writing a simple implementation.

4. As for IL code: I considered using an IL code emitter, but I think the serializer will be much easier to diagnose, debug (and write) if it generates normal C# code and invokes the compiler. That is slower, but compilation is done only once, and it is easy to let users pre-warm serializers so there is no lag on the first query.
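A middle ground between the IL emitter and the generate-C#-and-compile approach discussed in point 4, not present in the attached sample, would be .NET expression trees: they compile once per member, avoid invoking the C# compiler at runtime, and are easier to write correctly than raw IL. A hedged sketch (all names here are illustrative):

```csharp
using System;
using System.Linq.Expressions;

class Doc { public string Name { get; set; } }

static class SetterCompiler
{
    // Build and compile a strongly typed setter delegate once;
    // every later invocation runs at direct-assignment speed.
    public static Action<T, TValue> CompileSetter<T, TValue>(string propertyName)
    {
        var target = Expression.Parameter(typeof(T), "target");
        var value = Expression.Parameter(typeof(TValue), "value");
        var assign = Expression.Assign(
            Expression.Property(target, propertyName), value);
        return Expression.Lambda<Action<T, TValue>>(assign, target, value).Compile();
    }
}

class Demo
{
    static void Main()
    {
        var setName = SetterCompiler.CompileSetter<Doc, string>("Name");
        var doc = new Doc();
        setName(doc, "hello");
        Console.WriteLine(doc.Name); // prints hello
    }
}
```

A serializer built this way would compile one such delegate per mapped property at class-map time, which also makes "pre-warming" straightforward.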

Comment by Vladimir Perevalov [ 22/Mar/13 ]

Performance test sample

Comment by Craig Wilson [ 22/Mar/13 ]

Thanks for the ideas. We've played with this, and for simple cases it isn't all that difficult, but it can quickly become very complex due to edge cases. Because of our extensible IBsonSerializer architecture, to get the most advantage out of your suggestion we would need every IBsonSerializer to be able to generate IL code at runtime for its specific purpose. That would be a major overhaul of the entire BSON layer.

One question I'm interested in is this: by our measurements, we can already serialize/deserialize faster than the network can carry the bytes. Improving our serialization speed therefore doesn't seem to gain all that much unless you aren't using the network at all.

I'd love to see your test code, because I'm surprised by the 2x speed improvement. We do use reflection, but we only use it once: when it is used, it is used to generate IL code at runtime for the speed improvements you've already mentioned.
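The "reflect once, then generate IL at runtime" pattern Craig describes can be sketched with a DynamicMethod (this is an illustrative sketch, not the driver's actual implementation; the `Doc` type is hypothetical):

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;

class Doc { public int Count { get; set; } }

class EmitOnceDemo
{
    static void Main()
    {
        // Reflect once to find the setter, then emit IL that calls it directly.
        MethodInfo setter = typeof(Doc).GetProperty("Count").GetSetMethod();
        var dm = new DynamicMethod("SetCount", null,
            new[] { typeof(Doc), typeof(int) }, typeof(Doc).Module);
        ILGenerator il = dm.GetILGenerator();
        il.Emit(OpCodes.Ldarg_0);            // load the Doc instance
        il.Emit(OpCodes.Ldarg_1);            // load the int value
        il.Emit(OpCodes.Callvirt, setter);   // call set_Count directly
        il.Emit(OpCodes.Ret);
        var setCount = (Action<Doc, int>)dm.CreateDelegate(typeof(Action<Doc, int>));

        // The cached delegate is reused for every document; no further reflection.
        var doc = new Doc();
        setCount(doc, 42);
        Console.WriteLine(doc.Count); // prints 42
    }
}
```

If the driver already caches emitted delegates like this, the remaining gap the reporter measured would come from the generic dispatch and per-field bookkeeping around them rather than from reflection itself.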

Thanks again and please comment back to continue the discussion.

Generated at Wed Feb 07 21:37:36 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.