MongoDB replica set fails to add new member

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: None
    • Environment:
      Linux: Rocky Linux release 9.6 (Blue Onyx)
      MongoDB: 8.0.9

      Problem Statement/Rationale

       

      mongod.conf

      # mongod.conf
      systemLog:
        destination: file
        logAppend: true
        logRotate: rename
        path: /data/mongodb/mongos/log/mongod.log

      # Where and how to store data.
      storage:
        dbPath: /data/mongodb/config/data
        wiredTiger:
          engineConfig:
            cacheSizeGB: 1

      # how the process runs
      processManagement:
        fork: true
        pidFilePath: /data/mongodb/config/mongod.pid
        timeZoneInfo: /usr/share/zoneinfo

      # network interfaces
      net:
        port: 27017
        bindIp: 0.0.0.0  # Enter 0.0.0.0,:: to bind to all IPv4 and IPv6 addresses or, alternatively, use the net.bindIpAll setting.

      #operationProfiling:

      replication:
        replSetName: configrepl

       

      rs.add("BJ13-mongodb-v8core04.bcld.com:27017")

      name: 'BJ13-mongodb-v8core04.bcld.com:27017',
      health: 0,
      state: 8,
      stateStr: '(not reachable/healthy)',
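The member fields above can be read with a small lookup. This is a minimal sketch, not part of the original report: `MEMBER_STATES` and `describe_member` are hypothetical helper names, and the table lists the replica set member state codes as documented by MongoDB.

```python
# Replica set member state codes (hypothetical helper table for illustration).
MEMBER_STATES = {
    0: "STARTUP",
    1: "PRIMARY",
    2: "SECONDARY",
    3: "RECOVERING",
    5: "STARTUP2",
    6: "UNKNOWN",
    7: "ARBITER",
    8: "DOWN",
    9: "ROLLBACK",
    10: "REMOVED",
}


def describe_member(member):
    """Return a one-line summary of a member entry taken from rs.status()."""
    state = MEMBER_STATES.get(member.get("state"), "UNRECOGNIZED")
    # health is 1 when the member answers heartbeats, 0 otherwise
    reachability = "reachable" if member.get("health") == 1 else "unreachable"
    return f"{member.get('name')}: {state}, {reachability}"


# Values copied from the output in this report.
node4 = {
    "name": "BJ13-mongodb-v8core04.bcld.com:27017",
    "health": 0,
    "state": 8,
    "stateStr": "(not reachable/healthy)",
}

if __name__ == "__main__":
    print(describe_member(node4))
```

State 8 with health 0 means the other members of the set cannot reach the new member over heartbeats.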

       

      {"t":{"$date":"2025-11-24T17:47:38.429+08:00"},"s":"I",  "c":"NETWORK",  "id":23015,   "ctx":"listener","msg":"Listening on","attr":{"address":"/tmp/mongodb-27017.sock"}}
      {"t":{"$date":"2025-11-24T17:47:38.430+08:00"},"s":"I",  "c":"NETWORK",  "id":23015,   "ctx":"listener","msg":"Listening on","attr":{"address":"0.0.0.0:27017"}}
      {"t":{"$date":"2025-11-24T17:47:38.430+08:00"},"s":"I",  "c":"NETWORK",  "id":23016,   "ctx":"listener","msg":"Waiting for connections","attr":{"port":27017,"ssl":"off"}}
      {"t":{"$date":"2025-11-24T17:47:38.430+08:00"},"s":"I",  "c":"CONTROL",  "id":8423403, "ctx":"initandlisten","msg":"mongod startup complete","attr":{"Summary of time elapsed":{"Startup from clean shutdown?":true,"Statistics":{"Set up periodic runner":"0 ms","Set up online certificate status protocol manager":"0 ms","Transport layer setup":"20 ms","Run initial syncer crash recovery":"0 ms","Create storage engine lock file in the data directory":"1 ms","Get metadata describing storage engine":"0 ms","Create storage engine":"1010 ms","Write current PID to file":"0 ms","Write a new metadata for storage engine":"0 ms","Initialize FCV before rebuilding indexes":"21 ms","Drop abandoned idents and get back indexes that need to be rebuilt or builds that need to be restarted":"0 ms","Rebuild indexes for collections":"0 ms","Build user and roles graph":"0 ms","Set up the background thread pool responsible for waiting for opTimes to be majority committed":"2 ms","Start up cluster time keys manager with a local/direct keys client":"5 ms","Start up the replication coordinator":"124 ms","Create an oplog view for tenant migrations":"46 ms","Ensure the change stream collections on startup contain consistent data":"0 ms","Write startup options to the audit log":"0 ms","Start transport layer":"4 ms","_initAndListen total elapsed time":"1569 ms"}}}}
      {"t":{"$date":"2025-11-24T17:47:38.433+08:00"},"s":"W",  "c":"SHARDING", "id":7012500, "ctx":"QueryAnalysisConfigurationsRefresher","msg":"Failed to refresh query analysis configurations, will try again at the next interval","attr":{"error":"PrimarySteppedDown: No primary exists currently"}}
      {"t":{"$date":"2025-11-24T17:47:38.435+08:00"},"s":"I",  "c":"CONTROL",  "id":20712,   "ctx":"LogicalSessionCacheReap","msg":"Sessions collection is not set up; waiting until next sessions reap interval","attr":{"error":"NamespaceNotFound: config.system.sessions does not exist"}}
      {"t":{"$date":"2025-11-24T17:47:38.826+08:00"},"s":"I",  "c":"-",        "id":4939300, "ctx":"monitoring-keys-for-HMAC","msg":"Failed to refresh key cache","attr":{"error":"ReadConcernMajorityNotAvailableYet: Read concern majority reads are currently not possible.","nextWakeupMillis":600}}
      {"t":{"$date":"2025-11-24T17:47:39.000+08:00"},"s":"W",  "c":"QUERY",    "id":23799,   "ctx":"ftdc","msg":"Aggregate command executor error","attr":{"error":{"code":26,"codeName":"NamespaceNotFound","errmsg":"Unable to retrieve storageStats in $collStats stage :: caused by :: Collection [local.oplog.rs] not found."},"stats":{},"cmd":{"aggregate":"oplog.rs","cursor":{},"pipeline":[{"$collStats":{"storageStats":{"waitForLock":false,"numericOnly":true}}}],"$db":"local"}}}
      {"t":{"$date":"2025-11-24T17:47:39.001+08:00"},"s":"W",  "c":"QUERY",    "id":23799,   "ctx":"ftdc","msg":"Aggregate command executor error","attr":{"error":{"code":26,"codeName":"NamespaceNotFound","errmsg":"Unable to retrieve storageStats in $collStats stage :: caused by :: Collection [config.transactions] not found."},"stats":{},"cmd":{"aggregate":"transactions","cursor":{},"pipeline":[{"$collStats":{"storageStats":{"waitForLock":false,"numericOnly":true}}}],"$db":"config"}}}
      {"t":{"$date":"2025-11-24T17:47:39.001+08:00"},"s":"W",  "c":"QUERY",    "id":23799,   "ctx":"ftdc","msg":"Aggregate command executor error","attr":{"error":{"code":26,"codeName":"NamespaceNotFound","errmsg":"Unable to retrieve storageStats in $collStats stage :: caused by :: Collection [config.image_collection] not found."},"stats":{},"cmd":{"aggregate":"image_collection","cursor":{},"pipeline":[{"$collStats":{"storageStats":{"waitForLock":false,"numericOnly":true}}}],"$db":"config"}}}
      {"t":{"$date":"2025-11-24T17:47:39.426+08:00"},"s":"I",  "c":"-",        "id":4939300, "ctx":"monitoring-keys-for-HMAC","msg":"Failed to refresh key cache","attr":{"error":"ReadConcernMajorityNotAvailableYet: Read concern majority reads are currently not possible.","nextWakeupMillis":800}}
      {"t":{"$date":"2025-11-24T17:47:40.000+08:00"},"s":"W",  "c":"QUERY",    "id":23799,   "ctx":"ftdc","msg":"Aggregate command executor error","attr":{"error":{"code":26,"codeName":"NamespaceNotFound","errmsg":"Unable to retrieve storageStats in $collStats stage :: caused by :: Collection [local.oplog.rs] not found."},"stats":{},"cmd":{"aggregate":"oplog.rs","cursor":{},"pipeline":[{"$collStats":{"storageStats":{"waitForLock":false,"numericOnly":true}}}],"$db":"local"}}}
      {"t":{"$date":"2025-11-24T17:47:40.000+08:00"},"s":"W",  "c":"QUERY",    "id":23799,   "ctx":"ftdc","msg":"Aggregate command executor error","attr":{"error":{"code":26,"codeName":"NamespaceNotFound","errmsg":"Unable to retrieve storageStats in $collStats stage :: caused by :: Collection [config.transactions] not found."},"stats":{},"cmd":{"aggregate":"transactions","cursor":{},"pipeline":[{"$collStats":{"storageStats":{"waitForLock":false,"numericOnly":true}}}],"$db":"config"}}}


      Steps to Reproduce

      1. Deploy a 3-node replica set.
      2. Node 1 fails and is removed from the replica set.
      3. Add a new node (node 4) to the existing replica set.

      Expected Results

      {"t":{"$date":"2025-11-24T17:59:03.705+08:00"},"s":"I",  "c":"WTCHKPT",  "id":22430,   "svc":"S", "ctx":"Checkpointer","msg":"WiredTiger message","attr":{"message":{"ts_sec":1763978343,"ts_usec":704984,"thread":"4945:0x7f0404cfe640","session_name":"WT_SESSION.checkpoint","category":"WT_VERB_CHECKPOINT_PROGRESS","category_id":7,"verbose_level":"DEBUG_1","verbose_level_id":1,"msg":"saving checkpoint snapshot min: 1669, snapshot max: 1669 snapshot count: 0, oldest timestamp: (1763978042, 1) , meta checkpoint timestamp: (1763978342, 1) base write gen: 12349830"}}}
      {"t":{"$date":"2025-11-24T17:59:05.390+08:00"},"s":"I",  "c":"REPL_HB",  "id":23974,   "svc":"S", "ctx":"ReplCoord-11","msg":"Heartbeat failed after max retries","attr":{"target":"BJ13-mongodb-v8core04.bcld.com:27017","maxHeartbeatRetries":2,"error":{"code":76,"codeName":"NoReplicationEnabled","errmsg":"not running using replication"}}}
      {"t":{"$date":"2025-11-24T17:59:07.392+08:00"},"s":"I",  "c":"REPL_HB",  "id":23974,   "svc":"S", "ctx":"ReplCoord-11","msg":"Heartbeat failed after max retries","attr":{"target":"BJ13-mongodb-v8core04.bcld.com:27017","maxHeartbeatRetries":2,"error":{"code":76,"codeName":"NoReplicationEnabled","errmsg":"not running using replication"}}}
      {"t":{"$date":"2025-11-24T17:59:09.393+08:00"},"s":"I",  "c":"REPL_HB",  "id":23974,   "svc":"S", "ctx":"ReplCoord-11","msg":"Heartbeat failed after max retries","attr":{"target":"BJ13-mongodb-v8core04.bcld.com:27017","maxHeartbeatRetries":2,"error":{"code":76,"codeName":"NoReplicationEnabled","errmsg":"not running using replication"}}}
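To see how often each heartbeat error repeats, the structured log can be summarized with a short script. This is a minimal sketch, not part of the original report: it assumes the mongod log file holds one JSON object per line, and `summarize_heartbeat_failures` is a hypothetical helper name. The sample line is copied from the log in this report.

```python
import json
from collections import Counter


def summarize_heartbeat_failures(log_lines):
    """Count REPL_HB 'Heartbeat failed' entries (log id 23974) per (target, codeName)."""
    counts = Counter()
    for line in log_lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not structured log entries
        if not isinstance(entry, dict):
            continue
        if entry.get("c") != "REPL_HB" or entry.get("id") != 23974:
            continue
        attr = entry.get("attr", {})
        error = attr.get("error", {})
        if not isinstance(error, dict):
            # Some log entries carry the error as a plain string
            error = {"codeName": str(error)}
        counts[(attr.get("target"), error.get("codeName"))] += 1
    return counts


sample = [
    '{"t":{"$date":"2025-11-24T17:59:05.390+08:00"},"s":"I",  "c":"REPL_HB",  "id":23974,   "svc":"S", "ctx":"ReplCoord-11","msg":"Heartbeat failed after max retries","attr":{"target":"BJ13-mongodb-v8core04.bcld.com:27017","maxHeartbeatRetries":2,"error":{"code":76,"codeName":"NoReplicationEnabled","errmsg":"not running using replication"}}}',
]

if __name__ == "__main__":
    for (target, code_name), n in summarize_heartbeat_failures(sample).items():
        print(f"{target}: {code_name} x{n}")
```

To run it against a real log, pass an open file object instead of `sample`, for example `summarize_heartbeat_failures(open(path))` with `path` set to the log file of the member that sends the heartbeats.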

      Actual Results

      The replset.election ID in the local database differs between the existing cluster and the new instance (node 4).

      cluster:

      replset.election.id: 683448956e3956e6beaf68cb

       

      node4:

      replset.election.id: 692429baf7213e7dd4e41358

       


            Assignee: Unassigned
            Reporter: 王 峰
            Votes: 0
            Watchers: 3
