Core Server / SERVER-47966

Operation interrupted error adding second csrs member on 4.4.0

    Details

    • Type: Question
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Works as Designed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Replication
    • Labels:
      None
    • Sprint:
      Repl 2020-05-18

      Description

      I'm getting "operation was interrupted" errors trying to add a second csrs member to a sharded cluster on 4.4.

      I started a sharded cluster on 4.4.0-rc3 with 1 csrs member and waited for that member to become primary with ismaster:true. I then started a second csrs member and tried to add it to the csrs, but the reconfig fails after exactly 10 seconds with:

      (InterruptedDueToReplStateChange) Reconfig finished but failed to propagate to a majority :: caused by :: Current config with {version: 2, term: 1} has not yet propagated to a majority of nodes :: caused by :: operation was interrupted


      Fuller log snippet from us:

      [.info] [cm/action/dbcmd.go:runRsReconfigCmd:237] <shFT_cs1> [13:44:59.824] Running ReplSetReconfig
      [.error] [cm/action/dbcmd.go:runRsReconfigCmd:256] <shFT_cs1> [13:45:09.932] Error running command:
      	cmd=[{replSetReconfig map[_id:csrs configsvr:true members:[map[_id:0 arbiterOnly:false buildIndexes:true hidden:false host:louisamac:9004 priority:1 slaveDelay:0 tags:map[] votes:1] map[_id:1 arbiterOnly:false buildIndexes:true hidden:false host:louisamac:9019 priority:1 slaveDelay:0 tags:map[] votes:1]] protocolVersion:1 settings:map[catchUpTakeoverDelayMillis:30000 catchUpTimeoutMillis:-1 chainingAllowed:true electionTimeoutMillis:10000 getLastErrorDefaults:map[w:1 wtimeout:0] getLastErrorModes:map[] heartbeatIntervalMillis:2000 heartbeatTimeoutSecs:10 replicaSetId:ObjectID("5eb1a5c5950801a13ee31207")] version:2]} {maxTimeMS 40000}]
      	connParams=louisamac:9004 (local=false) : (InterruptedDueToReplStateChange) Reconfig finished but failed to propagate to a majority :: caused by :: Current config with {version: 2, term: 1} has not yet propagated to a majority of nodes :: caused by :: operation was interrupted
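
For readers following the log, here is a minimal sketch of how a reconfig document like the one above can be derived from a one-member config. It is not the code in cm/action/dbcmd.go. The helper name `add_member` and the starting config (including `version: 1`) are assumptions for illustration. The hosts, the set name, and the 40000 ms `maxTimeMS` are taken from the logged command, and the other member fields and `settings` shown in the log are omitted for brevity.

```python
import copy


def add_member(config, host):
    """Return a copy of `config` with `host` appended and the version bumped.

    Hypothetical helper: it only builds the document and does not contact a server.
    """
    new_config = copy.deepcopy(config)
    # Pick the next unused member _id.
    next_id = max(member["_id"] for member in new_config["members"]) + 1
    new_config["members"].append({"_id": next_id, "host": host})
    # A reconfig must carry a higher version than the config currently installed.
    new_config["version"] = config["version"] + 1
    return new_config


# Assumed starting point: a single-member config server replica set.
current = {
    "_id": "csrs",
    "configsvr": True,
    "version": 1,
    "members": [{"_id": 0, "host": "louisamac:9004"}],
}

proposed = add_member(current, "louisamac:9019")

# Command document in the same shape as the logged one.
command = {"replSetReconfig": proposed, "maxTimeMS": 40000}
```

With a driver such as PyMongo, a document like this could be sent to the primary with `client.admin.command(command)`. That call is not exercised here, and the sketch does not explain the interruption reported above.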
      

      Entirely possible we're doing something wrong here on our end, but I'm not sure what step we're missing.

      Attached the first and second CSRS member logs. Let me know if you need anything else.

        Attachments

        1. csrs_primary.log
          7.34 MB
        2. new_csrs.log
          2.42 MB

            People

            Assignee:
            pavithra.vetriselvan Pavithra Vetriselvan
            Reporter:
            louisa.berger Louisa Berger