  1. Core Server
  2. SERVER-104303

Replication lag on resharding donors can lead to critical section timeout

    • Type: Task
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: None
    • Cluster Scalability

      Currently, to enter the critical section, a donor needs to do a shard version refresh and process the resharding fields. The former involves doing a noop write with writeConcern "majority" with a timeout of 60 seconds. The latter involves persisting the configTime (most recent majority timestamp on the CSRS) to the config.vectorClock collection with writeConcern "majority" with a timeout of 60 seconds.

      For this reason, majority replication lag on a donor can cause it to fail to transition to the critical section within the critical section timeout, or to transition too late for the recipients to finish fetching and applying oplog entries within that timeout.
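
      The timing argument above can be sketched as a toy model. This is not server code. It assumes that each of the two majority writes takes roughly the current majority replication lag to be acknowledged, that the two writes run one after the other, and that the whole sequence must fit inside the critical section timeout. The function name, the DonorOutcome type, and the critical_section_timeout_secs parameter are hypothetical; only the 60-second per-write timeout comes from the description.

```python
from dataclasses import dataclass

# Per-write majority timeout stated in the ticket description.
MAJORITY_WRITE_TIMEOUT_SECS = 60.0


@dataclass
class DonorOutcome:
    entered_in_time: bool
    elapsed_secs: float
    reason: str


def simulate_donor_transition(majority_lag_secs: float,
                              critical_section_timeout_secs: float) -> DonorOutcome:
    """Toy model (assumption): two sequential majority writes, each taking
    about `majority_lag_secs` to be majority-acknowledged."""
    elapsed = 0.0
    steps = (
        "shard version refresh (noop write)",
        "persist configTime to config.vectorClock",
    )
    for step in steps:
        if majority_lag_secs > MAJORITY_WRITE_TIMEOUT_SECS:
            # The write gives up once its own 60-second timeout expires.
            elapsed += MAJORITY_WRITE_TIMEOUT_SECS
            return DonorOutcome(False, elapsed,
                                f"{step} timed out waiting for majority")
        elapsed += majority_lag_secs
    if elapsed > critical_section_timeout_secs:
        return DonorOutcome(False, elapsed,
                            "majority waits exceeded the critical section timeout")
    return DonorOutcome(True, elapsed, "ok")
```

      Under these assumptions, a lag of 1 second with a hypothetical 5-second critical section timeout succeeds after about 2 seconds, while a lag of 4 seconds fails even though neither write hits its own 60-second timeout.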

      Please note that the state transition writes on a donor don't involve waiting for writeConcern "majority".

            Assignee:
            Unassigned
            Reporter:
            cheahuychou.mao@mongodb.com Cheahuychou Mao
