  1. Core Server
  2. SERVER-47966

Operation interrupted error adding second csrs member on 4.4.0


Details

    • Question
    • Status: Closed
    • Major - P3
    • Resolution: Works as Designed
    • Replication
    • Repl 2020-05-18

    Description

      I'm getting "operation was interrupted" errors when trying to add a second CSRS (config server replica set) member to a sharded cluster on 4.4.

      I started up a sharded cluster on 4.4.0-rc3 with one CSRS member and waited for that member to become primary (ismaster:true). Then I started up a second CSRS member and tried to add it to the CSRS with a reconfig, but the reconfig failed after about 10 seconds with:

      (InterruptedDueToReplStateChange) Reconfig finished but failed to propagate to a majority :: caused by :: Current config with {version: 2, term: 1} has not yet propagated to a majority of nodes :: caused by :: operation was interrupted
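
      For context on the "majority" wording in the error: a replica set majority is more than half of the voting members, so a two-member config needs both members. A minimal Go sketch of that arithmetic follows, using the two hosts from the reconfig command in this ticket. The `member` type and the `majorityOf` helper are hypothetical illustrations, not server or driver code.

```go
package main

import "fmt"

// member is a hypothetical, pared-down view of one replica set config entry.
type member struct {
	host  string
	votes int
}

// majorityOf returns the smallest number of voting members that is
// strictly more than half of all voting members.
func majorityOf(members []member) int {
	voters := 0
	for _, m := range members {
		if m.votes > 0 {
			voters++
		}
	}
	return voters/2 + 1
}

func main() {
	// Hosts and votes as they appear in the replSetReconfig command in this ticket.
	cfg := []member{
		{host: "louisamac:9004", votes: 1},
		{host: "louisamac:9019", votes: 1},
	}
	fmt.Printf("voting members: %d, majority: %d\n", len(cfg), majorityOf(cfg))
}
```

      With two voting members the majority is 2, so under this reading the new config would count as propagated only once the newly added member has it as well.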


      Fuller log snippet from us:

      [.info] [cm/action/dbcmd.go:runRsReconfigCmd:237] <shFT_cs1> [13:44:59.824] Running ReplSetReconfig
      [.error] [cm/action/dbcmd.go:runRsReconfigCmd:256] <shFT_cs1> [13:45:09.932] Error running command:
      	cmd=[{replSetReconfig map[_id:csrs configsvr:true members:[map[_id:0 arbiterOnly:false buildIndexes:true hidden:false host:louisamac:9004 priority:1 slaveDelay:0 tags:map[] votes:1] map[_id:1 arbiterOnly:false buildIndexes:true hidden:false host:louisamac:9019 priority:1 slaveDelay:0 tags:map[] votes:1]] protocolVersion:1 settings:map[catchUpTakeoverDelayMillis:30000 catchUpTimeoutMillis:-1 chainingAllowed:true electionTimeoutMillis:10000 getLastErrorDefaults:map[w:1 wtimeout:0] getLastErrorModes:map[] heartbeatIntervalMillis:2000 heartbeatTimeoutSecs:10 replicaSetId:ObjectID("5eb1a5c5950801a13ee31207")] version:2]} {maxTimeMS 40000}]
      	connParams=louisamac:9004 (local=false) : (InterruptedDueToReplStateChange) Reconfig finished but failed to propagate to a majority :: caused by :: Current config with {version: 2, term: 1} has not yet propagated to a majority of nodes :: caused by :: operation was interrupted
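
      The timestamps on the two log lines above give the elapsed time directly. A small Go sketch of that subtraction, assuming both lines were logged on the same day; `elapsed` is a hypothetical helper, not taken from the tooling that produced the log.

```go
package main

import (
	"fmt"
	"time"
)

// layout matches the HH:MM:SS.mmm timestamps in the log lines.
const layout = "15:04:05.000"

// elapsed parses two same-day timestamps and returns end minus start.
func elapsed(start, end string) (time.Duration, error) {
	s, err := time.Parse(layout, start)
	if err != nil {
		return 0, err
	}
	e, err := time.Parse(layout, end)
	if err != nil {
		return 0, err
	}
	return e.Sub(s), nil
}

func main() {
	// "Running ReplSetReconfig" and "Error running command" timestamps from the log.
	d, err := elapsed("13:44:59.824", "13:45:09.932")
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println("reconfig ran for", d)
}
```

      The gap is 10.108 seconds, which is close to the electionTimeoutMillis:10000 and heartbeatTimeoutSecs:10 values in the submitted config. That is only an observation; nothing in this ticket confirms which setting, if either, is involved.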
      

      Entirely possible we're doing something wrong here on our end, but I'm not sure what step we're missing.

      Attached the first and second CSRS member logs. Let me know if you need anything else.

      Attachments

        1. new_csrs.log
          2.42 MB
        2. csrs_primary.log
          7.34 MB

        Activity

          People

            pavithra.vetriselvan@mongodb.com Pavithra Vetriselvan
            louisa.berger@mongodb.com Louisa Berger
