[CSHARP-3426] Replicaset failover leads to a batch of errors even with retryable writes on Created: 15/Feb/21  Updated: 19/Feb/21  Resolved: 18/Feb/21

Status: Closed
Project: C# Driver
Component/s: Connectivity
Affects Version/s: 2.11.6
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Aristarkh Zagorodnikov Assignee: James Kovacs
Resolution: Cannot Reproduce Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

.NET Framework 4.8 on Windows x64
MongoDB 4.2.6 on Linux x64



 Description   

Hi,

The replica set member got restarted, became secondary, and then a batch of requests (with read preference that allowed secondary reads) fail with MongoCommandException (details below). I'm not sure why it happened but it looks like the server is rejecting cached credentials that are not valid after reconnection due to some (I'm speculating here) cache not being cleared after the connection is lost/broken. Unfortunately, this defeats the purpose of retryable writes since even a graceful failover can lead to a query failure with a non-retryable error. This happens almost every time when secondary reads are allowed and the primary node is restarted, becoming a secondary one.
This also could be a server-side issue (given what JIRA search shows up for related errors), but I doubt it.

MongoDB.Driver.MongoCommandException: Command find failed: Cache Reader No keys found for HMAC that is valid for time: { ts: Timestamp(1613369050, 99) } with id: 6893426160802201667.
TCommandResult MongoDB.Driver.Core.WireProtocol.CommandUsingCommandMessageWireProtocol<TCommandResult>.ProcessResponse(ConnectionId connectionId, CommandMessage responseMessage)
TCommandResult MongoDB.Driver.Core.WireProtocol.CommandUsingCommandMessageWireProtocol<TCommandResult>.Execute(IConnection connection, CancellationToken cancellationToken)
TCommandResult MongoDB.Driver.Core.WireProtocol.CommandWireProtocol<TCommandResult>.Execute(IConnection connection, CancellationToken cancellationToken)
TResult MongoDB.Driver.Core.Servers.Server+ServerChannel.ExecuteProtocol<TResult>(IWireProtocol<TResult> protocol, ICoreSession session, CancellationToken cancellationToken)
TResult MongoDB.Driver.Core.Servers.Server+ServerChannel.Command<TResult>(ICoreSession session, ReadPreference readPreference, DatabaseNamespace databaseNamespace, BsonDocument command, IEnumerable<Type1CommandMessageSection> commandPayloads, IElementNameValidator commandValidator, BsonDocument additionalOptions, Action<IMessageEncoderPostProcessor> postWriteAction, CommandResponseHandling responseHandling, IBsonSerializer<TResult> resultSerializer, MessageEncoderSettings messageEncoderSettings, CancellationToken cancellationToken)
TCommandResult MongoDB.Driver.Core.Operations.CommandOperationBase<TCommandResult>.ExecuteProtocol(IChannelHandle channel, ICoreSessionHandle session, ReadPreference readPreference, CancellationToken cancellationToken)
TCommandResult MongoDB.Driver.Core.Operations.ReadCommandOperation<TCommandResult>.ExecuteAttempt(RetryableReadContext context, int attempt, long? transactionNumber, CancellationToken cancellationToken)
TResult MongoDB.Driver.Core.Operations.RetryableReadOperationExecutor.Execute<TResult>(IRetryableReadOperation<TResult> operation, RetryableReadContext context, CancellationToken cancellationToken)
TCommandResult MongoDB.Driver.Core.Operations.ReadCommandOperation<TCommandResult>.Execute(RetryableReadContext context, CancellationToken cancellationToken)
IAsyncCursor<TDocument> MongoDB.Driver.Core.Operations.FindCommandOperation<TDocument>.Execute(RetryableReadContext context, CancellationToken cancellationToken)
IAsyncCursor<TDocument> MongoDB.Driver.Core.Operations.FindOperation<TDocument>.Execute(RetryableReadContext context, CancellationToken cancellationToken)
IAsyncCursor<TDocument> MongoDB.Driver.Core.Operations.FindOperation<TDocument>.Execute(IReadBinding binding, CancellationToken cancellationToken)
TResult MongoDB.Driver.OperationExecutor.ExecuteReadOperation<TResult>(IReadBinding binding, IReadOperation<TResult> operation, CancellationToken cancellationToken)
TResult MongoDB.Driver.MongoCollectionImpl<TDocument>.ExecuteReadOperation<TResult>(IClientSessionHandle session, IReadOperation<TResult> operation, ReadPreference readPreference, CancellationToken cancellationToken)
TResult MongoDB.Driver.MongoCollectionImpl<TDocument>.ExecuteReadOperation<TResult>(IClientSessionHandle session, IReadOperation<TResult> operation, CancellationToken cancellationToken)
IAsyncCursor<TProjection> MongoDB.Driver.MongoCollectionImpl<TDocument>.FindSync<TProjection>(IClientSessionHandle session, FilterDefinition<TDocument> filter, FindOptions<TDocument, TProjection> options, CancellationToken cancellationToken)
IAsyncCursor<TProjection> MongoDB.Driver.MongoCollectionImpl<TDocument>.FindSync<TProjection>(IClientSessionHandle session, FilterDefinition<TDocument> filter, FindOptions<TDocument, TProjection> options, CancellationToken cancellationToken)+(IClientSessionHandle session) => { }
TResult MongoDB.Driver.MongoCollectionImpl<TDocument>.UsingImplicitSession<TResult>(Func<IClientSessionHandle, TResult> func, CancellationToken cancellationToken)
IAsyncCursor<TProjection> MongoDB.Driver.MongoCollectionImpl<TDocument>.FindSync<TProjection>(FilterDefinition<TDocument> filter, FindOptions<TDocument, TProjection> options, CancellationToken cancellationToken)
IAsyncCursor<TProjection> Dnet.Mongo.Fx.MongoCollectionDecorator<TDocument>.FindSyncImpl<TProjection>(FilterDefinition<TDocument> filter, FindOptions<TDocument, TProjection> options, CancellationToken cancellationToken) in …/lib/Dnet.Mongo/Fx/Decorators/MongoCollectionDecorator`1.cs:line 893
IAsyncCursor<TProjection> Dnet.Mongo.Fx.MongoCollectionDecorator<TDocument>.FindSync<TProjection>(FilterDefinition<TDocument> filter, FindOptions<TDocument, TProjection> options, CancellationToken cancellationToken)+() => { } in …/lib/Dnet.Mongo/Fx/Decorators/MongoCollectionDecorator`1.cs:line 865
!Trace.Action
IAsyncCursor<TProjection> Dnet.Mongo.Fx.MongoCollectionDecorator<TDocument>.FindSync<TProjection>(FilterDefinition<TDocument> filter, FindOptions<TDocument, TProjection> options, CancellationToken cancellationToken) in …/lib/Dnet.Mongo/Fx/Decorators/MongoCollectionDecorator`1.cs:line 863
IAsyncCursor<TProjection> MongoDB.Driver.FindFluent<TDocument, TProjection>.ToCursor(CancellationToken cancellationToken)

Thanks for looking at this =)



 Comments   
Comment by Aristarkh Zagorodnikov [ 19/Feb/21 ]

James, thank you very much for the detailed response.

Comment by James Kovacs [ 18/Feb/21 ]

Hi, onyxmaster,

We have investigated this issue and it originates from the server. Here is the relevant portion of the error message:

Cache Reader No keys found for HMAC that is valid for time: { ts: Timestamp(1613369050, 99) } with id: 6893426160802201667

This server error indicates that the mongod received a command from the .NET/C# driver with a $clusterTime signed using key 6893426160802201667. ($clusterTime was introduced in MongoDB 3.6 to implement causally consistent sessions and provides a cluster-wide, monotonically-increasing Lampert clock.) The mongod attempted to look up the signing key in admin.system.keys but could not find it resulting in this error.

The keys in admin.system.keys should be synchronized automatically by your cluster. Signing keys are generated 90 days in advance and are never deleted. When signing the $clusterTime, the earliest valid signing key is used. As well the .NET/C# driver simply echoes the highest $clusterTime observed back to the cluster nodes with each command. All indications are that this is a server-related issue and not a driver-related one.

To investigate this server issue, you have a variety of support options:

  • Our MongoDB support portal, located at support.mongodb.com.
  • Our MongoDB community portal, located here.
  • If you are an Atlas customer, you can review your support options by clicking Support in the top menu bar of the Atlas UI.

Sincerely,
James

Comment by Esha Bhargava [ 18/Feb/21 ]

onyxmaster Thank you for reporting this issue! We'll look into it and get back to you soon.

Comment by Aristarkh Zagorodnikov [ 15/Feb/21 ]

s/defeats the purpose of retryable writes/defeats the purpose of retryable ops/

Generated at Wed Feb 07 21:45:15 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.