Details
-
Bug
-
Resolution: Cannot Reproduce
-
Major - P3
-
None
-
4.4.5
-
None
-
ALL
Description
MongoDB version : v4.4.5
MongoDB Java Driver Version: mongodb-driver-sync 4.1.2
A transaction T1 failed to commit due to the WritableServerSelector-Timeout exception. However, another transaction T2 read the updates of T1. I use the withTransaction API and T1 retries for many times.
The exception T1 throws are:
2022-08-25 00:35:04,237{GMT} WARN [jepsen worker 46] jepsen.mongodb.rw-register: #error { |
:cause Timed out after 1000 ms while waiting for a server that matches WritableServerSelector. Client view of cluster state is {type=REPLICA_SET, servers=[{address=public-cd-a5.disalg.cn:37017, type=REPLICA_SET_SECONDARY, roundTripTime=34.4 ms, state=CONNECTED}, {address=public-cd-a1.disalg.cn:37017, type=REPLICA_SET_SECONDARY, roundTripTime=156.8 ms, state=CONNECTED}, {address=public-cd-a4.disalg.cn:37017, type=REPLICA_SET_SECONDARY, roundTripTime=38.3 ms, state=CONNECTED}, {address=public-cd-a2.disalg.cn:37017, type=REPLICA_SET_SECONDARY, roundTripTime=35.9 ms, state=CONNECTED}, {address=public-cd-a3.disalg.cn:37017, type=REPLICA_SET_SECONDARY, roundTripTime=35.1 ms, state=CONNECTED}] |
:via
|
[{:type com.mongodb.MongoTimeoutException
|
:message Timed out after 1000 ms while waiting for a server that matches WritableServerSelector. Client view of cluster state is {type=REPLICA_SET, servers=[{address=public-cd-a5.disalg.cn:37017, type=REPLICA_SET_SECONDARY, roundTripTime=34.4 ms, state=CONNECTED}, {address=public-cd-a1.disalg.cn:37017, type=REPLICA_SET_SECONDARY, roundTripTime=156.8 ms, state=CONNECTED}, {address=public-cd-a4.disalg.cn:37017, type=REPLICA_SET_SECONDARY, roundTripTime=38.3 ms, state=CONNECTED}, {address=public-cd-a2.disalg.cn:37017, type=REPLICA_SET_SECONDARY, roundTripTime=35.9 ms, state=CONNECTED}, {address=public-cd-a3.disalg.cn:37017, type=REPLICA_SET_SECONDARY, roundTripTime=35.1 ms, state=CONNECTED}] |
:at [com.mongodb.internal.connection.BaseCluster createTimeoutException BaseCluster.java 407]}] |
:trace
|
[[com.mongodb.internal.connection.BaseCluster createTimeoutException BaseCluster.java 407] |
[com.mongodb.internal.connection.BaseCluster selectServer BaseCluster.java 118] |
[com.mongodb.internal.connection.AbstractMultiServerCluster selectServer AbstractMultiServerCluster.java 52] |
[com.mongodb.internal.binding.ClusterBinding$ClusterBindingConnectionSource <init> ClusterBinding.java 103] |
[com.mongodb.internal.binding.ClusterBinding$ClusterBindingConnectionSource <init> ClusterBinding.java 99] |
[com.mongodb.internal.binding.ClusterBinding getWriteConnectionSource ClusterBinding.java 91] |
[com.mongodb.client.internal.ClientSessionBinding getWriteConnectionSource ClientSessionBinding.java 93] |
[com.mongodb.internal.operation.OperationHelper withReleasableConnection OperationHelper.java 618] |
[com.mongodb.internal.operation.CommandOperationHelper executeRetryableCommand CommandOperationHelper.java 705] |
[com.mongodb.internal.operation.TransactionOperation execute TransactionOperation.java 69] |
[com.mongodb.internal.operation.CommitTransactionOperation execute CommitTransactionOperation.java 133] |
[com.mongodb.internal.operation.CommitTransactionOperation execute CommitTransactionOperation.java 54] |
[com.mongodb.client.internal.MongoClientDelegate$DelegateOperationExecutor execute MongoClientDelegate.java 195] |
[com.mongodb.client.internal.ClientSessionImpl commitTransaction ClientSessionImpl.java 129] |
[com.mongodb.client.internal.ClientSessionImpl withTransaction ClientSessionImpl.java 210] |
[jdk.internal.reflect.GeneratedMethodAccessor29 invoke nil -1] |
[jdk.internal.reflect.DelegatingMethodAccessorImpl invoke DelegatingMethodAccessorImpl.java 43] |
[java.lang.reflect.Method invoke Method.java 566] |
[clojure.lang.Reflector invokeMatchingMethod Reflector.java 167] |
[clojure.lang.Reflector invokeInstanceMethod Reflector.java 102] |
[jepsen.mongodb.rw_register.Client$fn__2566 invoke rw_register.clj 137] |
[jepsen.mongodb.rw_register.Client invoke_BANG_ rw_register.clj 128] |
[jepsen.client.Validate invoke_BANG_ client.clj 83] |
[jepsen.generator.interpreter.ClientWorker invoke_BANG_ interpreter.clj 62] |
[jepsen.generator.interpreter$spawn_worker$fn__9422$fn__9423 invoke interpreter.clj 137] |
[jepsen.generator.interpreter$spawn_worker$fn__9422 invoke interpreter.clj 120] |
[clojure.core$binding_conveyor_fn$fn__5739 invoke core.clj 2030] |
[clojure.lang.AFn call AFn.java 18] |
[java.util.concurrent.FutureTask run FutureTask.java 264] |
[java.util.concurrent.ThreadPoolExecutor runWorker ThreadPoolExecutor.java 1128] |
[java.util.concurrent.ThreadPoolExecutor$Worker run ThreadPoolExecutor.java 628] |
[java.lang.Thread run Thread.java 829]]} |
The core message is
Client view of cluster state is {type=REPLICA_SET, servers=[
|
{address=public-cd-a5.disalg.cn:37017, type=REPLICA_SET_SECONDARY, roundTripTime=34.4 ms, state=CONNECTED}, |
{address=public-cd-a1.disalg.cn:37017, type=REPLICA_SET_SECONDARY, roundTripTime=156.8 ms, state=CONNECTED}, |
{address=public-cd-a4.disalg.cn:37017, type=REPLICA_SET_SECONDARY, roundTripTime=38.3 ms, state=CONNECTED}, |
{address=public-cd-a2.disalg.cn:37017, type=REPLICA_SET_SECONDARY, roundTripTime=35.9 ms, state=CONNECTED}, |
{address=public-cd-a3.disalg.cn:37017, type=REPLICA_SET_SECONDARY, roundTripTime=35.1 ms, state=CONNECTED} |
]
|
I have a mongo replica-set cluster consisting of 5 nodes. The primary is public-cd-a1.disalg.cn:37017. However, in this exception, they are all secondaries in the client's view. I have checked the mongod.log and there are no network issues or leader re-election. Node a1 is always the primary.
Moreover, there are other types of nodes in the client's view except REPLICA_SET_SECONDARY. In this exception, I found REPLICA_SET_OTHER and UNKNOWN.
:cause Timed out after 1000 ms while waiting for a server that matches WritableServerSelector. |
Client view of cluster state is {type=REPLICA_SET, servers=[
|
{address=public-cd-a5.disalg.cn:37017, type=, roundTripTime=37.0 ms, state=CONNECTED}, |
{address=public-cd-a1.disalg.cn:37017, type=UNKNOWN, state=CONNECTING}, |
{address=public-cd-a4.disalg.cn:37017, type=REPLICA_SET_SECONDARY, roundTripTime=822.4 ms, state=CONNECTED}, |
{address=public-cd-a2.disalg.cn:37017, type=REPLICA_SET_SECONDARY, roundTripTime=34.1 ms, state=CONNECTED}, |
{address=public-cd-a3.disalg.cn:37017, type=REPLICA_SET_SECONDARY, roundTripTime=195.9 ms, state=CONNECTED}] |
This issue makes us can not determine whether a transaction is successfully committed. Why does the client see a wrong view of the cluster state?