[CSHARP-2450] Performance: Reduced lock contention in BsonSerializer.LookupActualType Created: 11/Dec/18  Updated: 28/Oct/23  Resolved: 30/Mar/21

Status: Closed
Project: C# Driver
Component/s: Performance, Serialization
Affects Version/s: 2.7.2
Fix Version/s: 2.12.2, 2.13.0

Type: Improvement Priority: Major - P3
Reporter: Daniel Hegener Assignee: James Kovacs
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File image-2019-02-21-17-39-35-659.png     PNG File image-2019-02-21-17-41-20-757.png    

 Description   

Performance profiling in our multithreaded environment seems to indicate that BsonSerializer.LookupActualType is on the hot path during deserialization. This is caused by the current reader-writer lock implementation. There is, however, an opportunity to reduce lock contention using a non-generic Hashtable which is safe for concurrent reads as per https://docs.microsoft.com/en-us/dotnet/api/system.collections.hashtable?redirectedfrom=MSDN&view=netframework-4.7.2#thread-safety.



 Comments   
Comment by Githook User [ 30/Mar/21 ]

Author:

{'name': 'James Kovacs', 'email': 'jkovacs@post.harvard.edu', 'username': 'JamesKovacs'}

Message: CSHARP-2450: Improved deserialization performance by switching from HashSet<T> protected by a ReaderWriterLockSlim to a ConcurrentDictionary<K,V> outside the ReaderWriterLockSlim. (#482)
Branch: v2.12.x
https://github.com/mongodb/mongo-csharp-driver/commit/ec2dfa98879c4da332c48047575632459751b135

Comment by James Kovacs [ 30/Mar/21 ]

dnickless On second thought... I've closed this case so we can track fix version as part of our release process. I've forked a new ticket CSHARP-3513 to continue work on LookupActualType improvements in PR #434.

Comment by James Kovacs [ 29/Mar/21 ]

PR #482 has been incorporated into master (see above). We will leave this ticket open as PR #434 is rebased and performance re-evaluated in light of the PR #482 changes.

Comment by Githook User [ 29/Mar/21 ]

Author:

{'name': 'James Kovacs', 'email': 'jkovacs@post.harvard.edu', 'username': 'JamesKovacs'}

Message: CSHARP-2450: Improved deserialization performance by switching from HashSet<T> protected by a ReaderWriterLockSlim to a ConcurrentDictionary<K,V> outside the ReaderWriterLockSlim. (#482)
Branch: master
https://github.com/mongodb/mongo-csharp-driver/commit/2d021c58e972860bd1d781c2694af55ceb8552f9

Comment by Daniel Hegener [ 06/Jan/21 ]

I also created https://github.com/mongodb/mongo-csharp-driver/pull/434 which tackles another performance bottleneck that sits in the same area of the code (LookupActualType) but also impacts serialization.

Comment by Daniel Hegener [ 06/Jan/21 ]

Thanks, @boris.dogadov, I understand your concerns and also that this is a risky area of the code. However, kindly note that the

IsTypeDiscriminated(Type type)

method currently does not synchronize access to the __discriminatedTypes field. And this is the only place where that field gets read... So, really, either there already is a potential race condition in the current code or the fact that both fields (__discriminatedTypes and __discriminators) are changed inside the same lock is more of a coincidence than evidence for a dependency between the two.

Anyway, since this topic really is a big deal for us, I gave it a second shot: https://github.com/mongodb/mongo-csharp-driver/pull/433

This time, I didn't totally eliminate read locks in LookupActualType but at least shortened the locked section to a bare minimum. I haven't changed a lot really just made sure that the HashSet<Type> collections inside the __discriminators dictionary are kind of immutable.

Also, I have added the probably missing read lock inside the above-mentioned IsTypeDiscriminated method.

Comment by Boris Dogadov [ 28/Dec/20 ]

Hi daniel.hegener@gmx.net

Currently there is a global assumption that all serialization configuration is protected by single synchronization primitive. Making any path lock-free would require careful inspection of all possible interleavings of read/writes of other configurations parts.

One of the examples would be: __discriminatedTypes and __discriminators are expected to be always in a consistent state with each other (modified under the same lock). Making LookupActualType look-free, might result in an inconsistent observed state where __discriminators is updated and __discriminatedTypes still not.

Comment by Daniel Hegener [ 19/Dec/20 ]

Would you mind elaborating a bit more on your findings, please? Based on my profiling back then (screenshot in the PR) and the benchmark I put on gist which yielded the results described in my other comment above, the improvements appeared rather striking and absolutely worth that change which - IIRC after all the years - was a small and safe thing, just not super pretty because of the moon-generic hashset... I'm happy to try harder again if you tell me what's not yet good in the suggested implementation.
We use your driver's serialization facilities in isolation, too (kind of like a JSON library rather than like a pure driver), so not only in conjunction with writing to the database but also e.g. when serializing in-memory instances out to ReST clients. All of that happens in heavily parallel scenarios which is why this is so important for us.

Comment by Boris Dogadov [ 18/Dec/20 ]

Hi dnickless, thanks for bringing up this issue and submitting the PR.
We have investigated the proposed change and other alternatives. We have found that the required change cannot be limited to BsonSerializer.LookupActualType only, and requires refactoring of the whole synchronization approach at both BsonSerializer and BsonClassMap. After benchmarking a version of BsonSerializer and BsonClassMap with lock-free reads, we have decided that the performance gain does not justify the overall refactoring and potential consequences.

Comment by Daniel Hegener [ 21/Feb/19 ]

Hi Ian

I am sorry for not getting back any earlier but I've been pretty darn busy recently. Anyway, I have interesting news:

I borrowed some fairly realistic test classes from https://csharp2json.io/. Using those, I hacked together the following "unit test": https://gist.github.com/dnickless/334cff0490e788759918c341f78563fa

It's really not sophisticated in any way but it certainly does some multithreaded deserialization of 10m documents.

On my workstation (2 sockets x 6 cores x Hyperthreading = 24 Cores), I see the following results:

BEFORE:

[02/21/2019 17:15:46 Informational] ------ Run test started ------
[02/21/2019 17:15:46 Informational] [xUnit.net 00:00:00.00] xUnit.net VSTest Adapter v2.4.0 (32-bit Desktop .NET 4.0.30319.42000)
[02/21/2019 17:15:47 Informational] [xUnit.net 00:00:00.96] Starting: MongoDB.Bson.Tests
[02/21/2019 17:16:38 Informational] [xUnit.net 00:00:52.21] Finished: MongoDB.Bson.Tests
[02/21/2019 17:16:38 Informational] ========== Run test finished: 1 run (0:00:52.695014) ==========
[02/21/2019 17:16:45 Informational] ------ Run test started ------
[02/21/2019 17:16:45 Informational] [xUnit.net 00:00:00.00] xUnit.net VSTest Adapter v2.4.0 (32-bit Desktop .NET 4.0.30319.42000)
[02/21/2019 17:16:46 Informational] [xUnit.net 00:00:01.04] Starting: MongoDB.Bson.Tests
[02/21/2019 17:17:35 Informational] [xUnit.net 00:00:49.87] Finished: MongoDB.Bson.Tests
[02/21/2019 17:17:35 Informational] ========== Run test finished: 1 run (0:00:50.3388792) ==========
[02/21/2019 17:17:42 Informational] ------ Run test started ------
[02/21/2019 17:17:42 Informational] [xUnit.net 00:00:00.00] xUnit.net VSTest Adapter v2.4.0 (32-bit Desktop .NET 4.0.30319.42000)
[02/21/2019 17:17:43 Informational] [xUnit.net 00:00:00.91] Starting: MongoDB.Bson.Tests
[02/21/2019 17:18:35 Informational] [xUnit.net 00:00:52.72] Finished: MongoDB.Bson.Tests
[02/21/2019 17:18:35 Informational] ========== Run test finished: 1 run (0:00:53.2820475) ==========

AFTER:

[02/21/2019 17:31:24 Informational] ------ Run test started ------
[02/21/2019 17:31:24 Informational] [xUnit.net 00:00:00.00] xUnit.net VSTest Adapter v2.4.0 (32-bit Desktop .NET 4.0.30319.42000)
[02/21/2019 17:31:25 Informational] [xUnit.net 00:00:00.98] Starting: MongoDB.Bson.Tests
[02/21/2019 17:31:55 Informational] [xUnit.net 00:00:31.15] Finished: MongoDB.Bson.Tests
[02/21/2019 17:31:55 Informational] ========== Run test finished: 1 run (0:00:31.6398097) ==========
[02/21/2019 17:32:03 Informational] ------ Run test started ------
[02/21/2019 17:32:03 Informational] [xUnit.net 00:00:00.00] xUnit.net VSTest Adapter v2.4.0 (32-bit Desktop .NET 4.0.30319.42000)
[02/21/2019 17:32:04 Informational] [xUnit.net 00:00:00.91] Starting: MongoDB.Bson.Tests
[02/21/2019 17:32:35 Informational] [xUnit.net 00:00:31.76] Finished: MongoDB.Bson.Tests
[02/21/2019 17:32:35 Informational] ========== Run test finished: 1 run (0:00:32.2108424) ==========
[02/21/2019 17:32:46 Informational] ------ Run test started ------
[02/21/2019 17:32:47 Informational] [xUnit.net 00:00:00.00] xUnit.net VSTest Adapter v2.4.0 (32-bit Desktop .NET 4.0.30319.42000)
[02/21/2019 17:32:48 Informational] [xUnit.net 00:00:00.98] Starting: MongoDB.Bson.Tests
[02/21/2019 17:33:18 Informational] [xUnit.net 00:00:31.35] Finished: MongoDB.Bson.Tests
[02/21/2019 17:33:18 Informational] ========== Run test finished: 1 run (0:00:31.7638168) ==========

I haven't tried using less cores or no HT or anything around .NET Core or other bitness etc... Nonetheless, assuming I made no mistake in continuing to guarantee thread-safe writes, this would appear to be a massive improvement.

Comment by Ian Whalen (Inactive) [ 17/Dec/18 ]

dnickless what improvements are you seeing in your PR in both the profiler and actual throughput?

Comment by Ian Whalen (Inactive) [ 17/Dec/18 ]

https://github.com/mongodb/mongo-csharp-driver/pull/347

Generated at Wed Feb 07 21:42:35 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.