[CSHARP-3426] Replicaset failover leads to a batch of errors even with retryable writes on Created: 15/Feb/21 Updated: 19/Feb/21 Resolved: 18/Feb/21 |
|
| Status: | Closed |
| Project: | C# Driver |
| Component/s: | Connectivity |
| Affects Version/s: | 2.11.6 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Aristarkh Zagorodnikov | Assignee: | James Kovacs |
| Resolution: | Cannot Reproduce | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
.NET Framework 4.8 on Windows x64 |
||
| Description |
|
Hi, A replica set member was restarted and became a secondary, and then a batch of requests (with a read preference that allowed secondary reads) failed with MongoCommandException (details below). I'm not sure why it happened, but it looks like the server is rejecting cached credentials that are no longer valid after reconnection because (I'm speculating here) some cache is not cleared when the connection is lost or broken. Unfortunately, this defeats the purpose of retryable writes, since even a graceful failover can lead to a query failure with a non-retryable error. This happens almost every time secondary reads are allowed and the primary node is restarted and becomes a secondary.
Thanks for looking at this =) |
| Comments |
| Comment by Aristarkh Zagorodnikov [ 19/Feb/21 ] | |
|
James, thank you very much for the detailed response. | |
| Comment by James Kovacs [ 18/Feb/21 ] | |
|
Hi, onyxmaster, We have investigated this issue and it originates from the server. Here is the relevant portion of the error message:
This server error indicates that the mongod received a command from the .NET/C# driver with a $clusterTime signed using key 6893426160802201667. ($clusterTime was introduced in MongoDB 3.6 to implement causally consistent sessions and provides a cluster-wide, monotonically increasing Lamport clock.) The mongod attempted to look up the signing key in admin.system.keys but could not find it, resulting in this error. The keys in admin.system.keys should be synchronized automatically by your cluster. Signing keys are generated 90 days in advance and are never deleted. When signing the $clusterTime, the earliest valid signing key is used. In addition, the .NET/C# driver simply echoes the highest $clusterTime observed back to the cluster nodes with each command. All indications are that this is a server-related issue and not a driver-related one. To investigate this server issue, you have a variety of support options:
Sincerely, | |
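The "echo the highest $clusterTime" behaviour described in the comment above can be illustrated with a minimal sketch. This is not the .NET/C# driver's actual code: the function name advance_cluster_time, the dictionary layout, and the (seconds, increment) tuple used for the timestamp are assumptions made for illustration, and the second keyId is hypothetical. It only shows why a driver that forwards the newest signed value unchanged will pass along whatever signing key the cluster used, without validating it.

```python
from typing import Any, Dict, Optional

# Assumed shape for illustration only:
# {"clusterTime": (seconds, increment), "signature": {"keyId": int}}
ClusterTime = Dict[str, Any]


def advance_cluster_time(
    current: Optional[ClusterTime], observed: Optional[ClusterTime]
) -> Optional[ClusterTime]:
    """Return whichever cluster time has the greater timestamp.

    The signature is carried along untouched: the client never inspects
    or verifies it, it only gossips the newest value back to the cluster.
    """
    if observed is None:
        return current
    if current is None or observed["clusterTime"] > current["clusterTime"]:
        return observed
    return current


# A reply signed with a hypothetical earlier key, followed by a reply signed
# with the key id quoted in the comment above.
older = {"clusterTime": (1613390000, 1), "signature": {"keyId": 1}}
newer = {"clusterTime": (1613390000, 2), "signature": {"keyId": 6893426160802201667}}

seen = advance_cluster_time(None, older)
seen = advance_cluster_time(seen, newer)
# 'seen' is what would be attached to the next command sent to any node.
```

Under these assumptions, the next command sent to any member carries the value signed with key 6893426160802201667, so a member that cannot find that key in admin.system.keys would reject the command regardless of what the client does.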
| Comment by Esha Bhargava [ 18/Feb/21 ] | |
|
onyxmaster Thank you for reporting this issue! We'll look into it and get back to you soon. | |
| Comment by Aristarkh Zagorodnikov [ 15/Feb/21 ] | |
|
s/defeats the purpose of retryable writes/defeats the purpose of retryable ops/ |