Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-69177

A bug of WritableServerSelector-Timeout transaction committed

    • Type: Icon: Bug Bug
    • Resolution: Cannot Reproduce
    • Priority: Icon: Major - P3 Major - P3
    • None
    • Affects Version/s: 4.4.5
    • Component/s: None
    • ALL

       
      MongoDB version : v4.4.5
      MongoDB Java Driver Version: mongodb-driver-sync 4.1.2
       
      A transaction T1 failed to commit due to the WritableServerSelector-Timeout exception. However, another transaction T2 read the updates of T1. I use the withTransaction API and T1 retries for many times.
       
      The exception T1 throws are:

      2022-08-25 00:35:04,237{GMT}    WARN    [jepsen worker 46] jepsen.mongodb.rw-register: #error {
       :cause Timed out after 1000 ms while waiting for a server that matches WritableServerSelector. Client view of cluster state is {type=REPLICA_SET, servers=[{address=public-cd-a5.disalg.cn:37017, type=REPLICA_SET_SECONDARY, roundTripTime=34.4 ms, state=CONNECTED}, {address=public-cd-a1.disalg.cn:37017, type=REPLICA_SET_SECONDARY, roundTripTime=156.8 ms, state=CONNECTED}, {address=public-cd-a4.disalg.cn:37017, type=REPLICA_SET_SECONDARY, roundTripTime=38.3 ms, state=CONNECTED}, {address=public-cd-a2.disalg.cn:37017, type=REPLICA_SET_SECONDARY, roundTripTime=35.9 ms, state=CONNECTED}, {address=public-cd-a3.disalg.cn:37017, type=REPLICA_SET_SECONDARY, roundTripTime=35.1 ms, state=CONNECTED}]
       :via
       [{:type com.mongodb.MongoTimeoutException
         :message Timed out after 1000 ms while waiting for a server that matches WritableServerSelector. Client view of cluster state is {type=REPLICA_SET, servers=[{address=public-cd-a5.disalg.cn:37017, type=REPLICA_SET_SECONDARY, roundTripTime=34.4 ms, state=CONNECTED}, {address=public-cd-a1.disalg.cn:37017, type=REPLICA_SET_SECONDARY, roundTripTime=156.8 ms, state=CONNECTED}, {address=public-cd-a4.disalg.cn:37017, type=REPLICA_SET_SECONDARY, roundTripTime=38.3 ms, state=CONNECTED}, {address=public-cd-a2.disalg.cn:37017, type=REPLICA_SET_SECONDARY, roundTripTime=35.9 ms, state=CONNECTED}, {address=public-cd-a3.disalg.cn:37017, type=REPLICA_SET_SECONDARY, roundTripTime=35.1 ms, state=CONNECTED}]
         :at [com.mongodb.internal.connection.BaseCluster createTimeoutException BaseCluster.java 407]}]
       :trace
       [[com.mongodb.internal.connection.BaseCluster createTimeoutException BaseCluster.java 407]
        [com.mongodb.internal.connection.BaseCluster selectServer BaseCluster.java 118]
        [com.mongodb.internal.connection.AbstractMultiServerCluster selectServer AbstractMultiServerCluster.java 52]
        [com.mongodb.internal.binding.ClusterBinding$ClusterBindingConnectionSource <init> ClusterBinding.java 103]
        [com.mongodb.internal.binding.ClusterBinding$ClusterBindingConnectionSource <init> ClusterBinding.java 99]
        [com.mongodb.internal.binding.ClusterBinding getWriteConnectionSource ClusterBinding.java 91]
        [com.mongodb.client.internal.ClientSessionBinding getWriteConnectionSource ClientSessionBinding.java 93]
        [com.mongodb.internal.operation.OperationHelper withReleasableConnection OperationHelper.java 618]
        [com.mongodb.internal.operation.CommandOperationHelper executeRetryableCommand CommandOperationHelper.java 705]
        [com.mongodb.internal.operation.TransactionOperation execute TransactionOperation.java 69]
        [com.mongodb.internal.operation.CommitTransactionOperation execute CommitTransactionOperation.java 133]
        [com.mongodb.internal.operation.CommitTransactionOperation execute CommitTransactionOperation.java 54]
        [com.mongodb.client.internal.MongoClientDelegate$DelegateOperationExecutor execute MongoClientDelegate.java 195]
        [com.mongodb.client.internal.ClientSessionImpl commitTransaction ClientSessionImpl.java 129]
        [com.mongodb.client.internal.ClientSessionImpl withTransaction ClientSessionImpl.java 210]
        [jdk.internal.reflect.GeneratedMethodAccessor29 invoke nil -1]
        [jdk.internal.reflect.DelegatingMethodAccessorImpl invoke DelegatingMethodAccessorImpl.java 43]
        [java.lang.reflect.Method invoke Method.java 566]
        [clojure.lang.Reflector invokeMatchingMethod Reflector.java 167]
        [clojure.lang.Reflector invokeInstanceMethod Reflector.java 102]
        [jepsen.mongodb.rw_register.Client$fn__2566 invoke rw_register.clj 137]
        [jepsen.mongodb.rw_register.Client invoke_BANG_ rw_register.clj 128]
        [jepsen.client.Validate invoke_BANG_ client.clj 83]
        [jepsen.generator.interpreter.ClientWorker invoke_BANG_ interpreter.clj 62]
        [jepsen.generator.interpreter$spawn_worker$fn__9422$fn__9423 invoke interpreter.clj 137]
        [jepsen.generator.interpreter$spawn_worker$fn__9422 invoke interpreter.clj 120]
        [clojure.core$binding_conveyor_fn$fn__5739 invoke core.clj 2030]
        [clojure.lang.AFn call AFn.java 18]
        [java.util.concurrent.FutureTask run FutureTask.java 264]
        [java.util.concurrent.ThreadPoolExecutor runWorker ThreadPoolExecutor.java 1128]
        [java.util.concurrent.ThreadPoolExecutor$Worker run ThreadPoolExecutor.java 628]
        [java.lang.Thread run Thread.java 829]]} 

      The core message is 

       Client view of cluster state is {type=REPLICA_SET, servers=[
       {address=public-cd-a5.disalg.cn:37017, type=REPLICA_SET_SECONDARY, roundTripTime=34.4 ms, state=CONNECTED}, 
       {address=public-cd-a1.disalg.cn:37017, type=REPLICA_SET_SECONDARY, roundTripTime=156.8 ms, state=CONNECTED}, 
       {address=public-cd-a4.disalg.cn:37017, type=REPLICA_SET_SECONDARY, roundTripTime=38.3 ms, state=CONNECTED}, 
       {address=public-cd-a2.disalg.cn:37017, type=REPLICA_SET_SECONDARY, roundTripTime=35.9 ms, state=CONNECTED}, 
       {address=public-cd-a3.disalg.cn:37017, type=REPLICA_SET_SECONDARY, roundTripTime=35.1 ms, state=CONNECTED}
       ]

      I have a mongo replica-set cluster consisting of 5 nodes. The primary is public-cd-a1.disalg.cn:37017. However, in this exception, they are all secondaries in the client's view. I have checked the mongod.log and there are no network issues or leader re-election. Node a1 is always the primary. 
       
      Moreover, there are other types of nodes in the client's view except REPLICA_SET_SECONDARY. In this exception, I found REPLICA_SET_OTHER and UNKNOWN.

       :cause Timed out after 1000 ms while waiting for a server that matches WritableServerSelector. 
       Client view of cluster state is {type=REPLICA_SET, servers=[
         {address=public-cd-a5.disalg.cn:37017, type=, roundTripTime=37.0 ms, state=CONNECTED}, 
         {address=public-cd-a1.disalg.cn:37017, type=UNKNOWN, state=CONNECTING}, 
         {address=public-cd-a4.disalg.cn:37017, type=REPLICA_SET_SECONDARY, roundTripTime=822.4 ms, state=CONNECTED},
         {address=public-cd-a2.disalg.cn:37017, type=REPLICA_SET_SECONDARY, roundTripTime=34.1 ms, state=CONNECTED}, 
         {address=public-cd-a3.disalg.cn:37017, type=REPLICA_SET_SECONDARY, roundTripTime=195.9 ms, state=CONNECTED}] 

       
      This issue makes us can not determine whether a transaction is successfully committed.  Why does the client see a wrong view of the cluster state? 
       

            Assignee:
            chris.kelly@mongodb.com Chris Kelly
            Reporter:
            tsunaouyang@gmail.com Ouyang Tsuna
            Votes:
            0 Vote for this issue
            Watchers:
            7 Start watching this issue

              Created:
              Updated:
              Resolved: