Core Server / SERVER-59721

Node may become unable to sync from other members after performing rollback to stable timestamp


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.2.0, 5.0.6, 5.1.1
    • Component/s: Replication
    • Labels: None
    • Backwards Compatibility: Fully Compatible
    • Operating System: ALL
    • Backport Requested: v5.1, v5.0
    • Sprint: Repl 2021-09-20, Repl 2021-10-04, Repl 2021-10-18, Repl 2021-11-01, Replication 2021-11-15

    Description

      The SyncSourceResolver currently rejects sync source candidates that do not contain an oplog entry corresponding to the node's minvalid point. This behavior dates back to SERVER-25145 and SERVER-27050, which safeguarded against certain double-rollback scenarios in rollback via refetch. However, rollback to stable timestamp does not modify the minvalid point the way rollback via refetch does, and can therefore leave the minvalid point referring to an oplog entry that is not on the true timeline. As a result, a node that performed rollback to stable timestamp can perpetually reject all candidates as sync sources.
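
      The failure mode can be illustrated with a minimal Python sketch (this is not the actual C++ server code; all names here are hypothetical, and optimes are simplified to `(ts_seconds, increment)` tuples, ignoring the term field):

```python
# Hypothetical sketch of the candidate check described above: a candidate
# is usable only if its oplog contains an entry at exactly the node's
# required (minvalid) optime; otherwise it is denylisted, which mirrors
# the NoMatchingDocument (code 47) warnings in the logs below.

def oplog_contains(candidate_oplog, required_optime):
    """True if the candidate's oplog has an entry at the required optime."""
    return any(entry == required_optime for entry in candidate_oplog)

def check_candidate(candidate_oplog, required_optime):
    if not oplog_contains(candidate_oplog, required_optime):
        return "deny"  # denylisted (60 seconds in the logs)
    return "use"

# After rollback to stable timestamp, minvalid may still point at an
# optime from the rolled-back, divergent branch, so no healthy member's
# oplog can ever contain it and every candidate is rejected:
required = (1630123543, 91)   # requiredOpTime from the logs below
healthy_oplog = [(1630123543, 89), (1630123543, 90), (1630123544, 1)]
print(check_candidate(healthy_oplog, required))  # → deny
```

      Because the required optime was rolled back everywhere, the check above fails for every candidate, the 60-second denylist window expires, and the cycle repeats indefinitely.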

      [js_test:resharding_secondary_recovers_temp_ns_metadata] d20024| 2021-08-28T04:05:45.491+00:00 I  ROLLBACK 21612   [BackgroundSync] "Rollback summary","attr":{"startTime":{"$date":"2021-08-28T04:05:45.279Z"},"endTime":{"$date":"2021-08-28T04:05:45.491Z"},"syncSource":"ip-10-122-35-32.ec2.internal:20025","rbid":2,"lastOptimeRolledBack":{"ts":{"$timestamp":{"t":1630123544,"i":2}},"t":4},"commonPoint":{"ts":{"$timestamp":{"t":1630123543,"i":90}},"t":1},"lastWallClockTimeRolledBack":{"$date":"2021-08-28T04:05:44.042Z"},"firstOpWallClockTimeAfterCommonPoint":{"$date":"2021-08-28T04:05:43.778Z"},"wallClockTimeDiff":0,"truncateTimestamp":{"$timestamp":{"t":1630123543,"i":90}},"stableTimestamp":{"$timestamp":{"t":1630123543,"i":89}},"shardIdentityRolledBack":false,"configServerConfigVersionRolledBack":false,"affectedSessions":[],"affectedNamespaces":["config.localReshardingConflictStash.e500bdbc-ca33-44f2-94ae-59dda55888bd.shard0-donor0"],"rollbackCommandCounts":{"create":1},"totalEntriesRolledBackIncludingNoops":2}
      ...
      [js_test:resharding_secondary_recovers_temp_ns_metadata] d20024| 2021-08-28T04:05:45.492+00:00 I  REPL     21799   [BackgroundSync] "Sync source candidate chosen","attr":{"syncSource":"ip-10-122-35-32.ec2.internal:20025"}
      [js_test:resharding_secondary_recovers_temp_ns_metadata] d20024| 2021-08-28T04:05:45.495+00:00 W  REPL     5579711 [ReplCoordExtern-0] "We cannot use candidate as a sync source because it does not contain the necessary operations for us to reach a consistent state. Denylisting this sync source","attr":{"candidate":"ip-10-122-35-32.ec2.internal:20025","error":{"code":47,"codeName":"NoMatchingDocument","errmsg":"remote oplog does not contain entry with optime matching our required optime"},"lastOpTimeFetched":{"ts":{"$timestamp":{"t":1630123543,"i":90}},"t":1},"requiredOpTime":{"ts":{"$timestamp":{"t":1630123543,"i":91}},"t":1},"denylistDurationSeconds":60,"denylistUntil":{"$date":"2021-08-28T04:06:45.495Z"}}
      [js_test:resharding_secondary_recovers_temp_ns_metadata] d20024| 2021-08-28T04:05:45.495+00:00 I  REPL     21799   [ReplCoordExtern-0] "Sync source candidate chosen","attr":{"syncSource":"ip-10-122-35-32.ec2.internal:20023"}
      [js_test:resharding_secondary_recovers_temp_ns_metadata] d20024| 2021-08-28T04:05:45.496+00:00 W  REPL     5579711 [ReplCoordExtern-0] "We cannot use candidate as a sync source because it does not contain the necessary operations for us to reach a consistent state. Denylisting this sync source","attr":{"candidate":"ip-10-122-35-32.ec2.internal:20023","error":{"code":47,"codeName":"NoMatchingDocument","errmsg":"remote oplog does not contain entry with optime matching our required optime"},"lastOpTimeFetched":{"ts":{"$timestamp":{"t":1630123543,"i":90}},"t":1},"requiredOpTime":{"ts":{"$timestamp":{"t":1630123543,"i":91}},"t":1},"denylistDurationSeconds":60,"denylistUntil":{"$date":"2021-08-28T04:06:45.496Z"}}
      [js_test:resharding_secondary_recovers_temp_ns_metadata] d20024| 2021-08-28T04:05:45.496+00:00 I  REPL     21798   [ReplCoordExtern-0] "Could not find member to sync from"
      

      https://logkeeper.mongodb.org/lobster/build/28b252df997501da4d24e819b86a2c28/test/6129b6029041305d4c056f2e#bookmarks=0%2C7580%2C8075%2C8328%2C8339%2C8341%2C14893%2C14907&f~=000~oplogsize&f~=000~%22t%22%3A5%5C%7D&f~=000~%22Replica%20set%20state%20transition%22&f~=100~d20024%5C%7C&shareLine=8339

      People

        samy.lanka@mongodb.com Samyukta Lanka
        max.hirschhorn@mongodb.com Max Hirschhorn
        Votes: 0
        Watchers: 9
