Core Server / SERVER-59721

Node may become unable to sync from other members after performing rollback to stable timestamp

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • Fix Version/s: 5.2.0, 5.0.6, 5.1.1
    • Affects Version/s: None
    • Component/s: Replication
    • Labels: None
    • Backwards Compatibility: Fully Compatible
    • Operating System: ALL
    • Backport Requested: v5.1, v5.0
    • Sprint: Repl 2021-09-20, Repl 2021-10-04, Repl 2021-10-18, Repl 2021-11-01, Replication 2021-11-15

      The SyncSourceResolver currently rejects sync source candidates whose oplog doesn't contain an entry corresponding to the node's minvalid point. This behavior dates back to SERVER-25145 and SERVER-27050, which added safeguards against certain double-rollback scenarios in rollback via refetch. However, rollback to stable timestamp doesn't modify the minvalid point the way rollback via refetch does, and can leave the minvalid point referring to an oplog entry that is no longer on the true timeline. As a result, a node that performed rollback to stable timestamp may perpetually reject every candidate as a sync source, as in the log excerpt below.
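
      A minimal sketch of that candidate check, using hypothetical names (OpTime, containsRequiredOpTime) rather than the server's actual SyncSourceResolver code:

      // Hypothetical sketch, not the server's actual code. The resolver
      // denylists any candidate whose oplog lacks an entry matching the
      // node's minvalid point (the "required optime").
      #include <algorithm>
      #include <iostream>
      #include <vector>

      struct OpTime {
          unsigned tsSecs, tsInc;  // timestamp (seconds, increment)
          long long term;
          bool operator==(const OpTime& o) const {
              return tsSecs == o.tsSecs && tsInc == o.tsInc && term == o.term;
          }
      };

      bool containsRequiredOpTime(const std::vector<OpTime>& remoteOplog,
                                  const OpTime& required) {
          return std::find(remoteOplog.begin(), remoteOplog.end(), required) !=
              remoteOplog.end();
      }

      int main() {
          // Illustrative oplog: truncated to a common point at (100, 5) by a
          // rollback, with the surviving timeline continuing in a later term.
          std::vector<OpTime> remoteOplog = {{100, 4, 1}, {100, 5, 1}, {101, 1, 2}};

          // Rollback via refetch rewrites minvalid, so this probe eventually
          // succeeds. Rollback to stable timestamp leaves minvalid alone; if
          // it still names a rolled-back entry such as (100, 6) in term 1, no
          // member can ever satisfy the probe.
          OpTime staleMinValid{100, 6, 1};
          std::cout << (containsRequiredOpTime(remoteOplog, staleMinValid)
                            ? "candidate accepted\n"
                            : "candidate denylisted: NoMatchingDocument\n");
      }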

      [js_test:resharding_secondary_recovers_temp_ns_metadata] d20024| 2021-08-28T04:05:45.491+00:00 I  ROLLBACK 21612   [BackgroundSync] "Rollback summary","attr":{"startTime":{"$date":"2021-08-28T04:05:45.279Z"},"endTime":{"$date":"2021-08-28T04:05:45.491Z"},"syncSource":"ip-10-122-35-32.ec2.internal:20025","rbid":2,"lastOptimeRolledBack":{"ts":{"$timestamp":{"t":1630123544,"i":2}},"t":4},"commonPoint":{"ts":{"$timestamp":{"t":1630123543,"i":90}},"t":1},"lastWallClockTimeRolledBack":{"$date":"2021-08-28T04:05:44.042Z"},"firstOpWallClockTimeAfterCommonPoint":{"$date":"2021-08-28T04:05:43.778Z"},"wallClockTimeDiff":0,"truncateTimestamp":{"$timestamp":{"t":1630123543,"i":90}},"stableTimestamp":{"$timestamp":{"t":1630123543,"i":89}},"shardIdentityRolledBack":false,"configServerConfigVersionRolledBack":false,"affectedSessions":[],"affectedNamespaces":["config.localReshardingConflictStash.e500bdbc-ca33-44f2-94ae-59dda55888bd.shard0-donor0"],"rollbackCommandCounts":{"create":1},"totalEntriesRolledBackIncludingNoops":2}
      ...
      [js_test:resharding_secondary_recovers_temp_ns_metadata] d20024| 2021-08-28T04:05:45.492+00:00 I  REPL     21799   [BackgroundSync] "Sync source candidate chosen","attr":{"syncSource":"ip-10-122-35-32.ec2.internal:20025"}
      [js_test:resharding_secondary_recovers_temp_ns_metadata] d20024| 2021-08-28T04:05:45.495+00:00 W  REPL     5579711 [ReplCoordExtern-0] "We cannot use candidate as a sync source because it does not contain the necessary operations for us to reach a consistent state. Denylisting this sync source","attr":{"candidate":"ip-10-122-35-32.ec2.internal:20025","error":{"code":47,"codeName":"NoMatchingDocument","errmsg":"remote oplog does not contain entry with optime matching our required optime"},"lastOpTimeFetched":{"ts":{"$timestamp":{"t":1630123543,"i":90}},"t":1},"requiredOpTime":{"ts":{"$timestamp":{"t":1630123543,"i":91}},"t":1},"denylistDurationSeconds":60,"denylistUntil":{"$date":"2021-08-28T04:06:45.495Z"}}
      [js_test:resharding_secondary_recovers_temp_ns_metadata] d20024| 2021-08-28T04:05:45.495+00:00 I  REPL     21799   [ReplCoordExtern-0] "Sync source candidate chosen","attr":{"syncSource":"ip-10-122-35-32.ec2.internal:20023"}
      [js_test:resharding_secondary_recovers_temp_ns_metadata] d20024| 2021-08-28T04:05:45.496+00:00 W  REPL     5579711 [ReplCoordExtern-0] "We cannot use candidate as a sync source because it does not contain the necessary operations for us to reach a consistent state. Denylisting this sync source","attr":{"candidate":"ip-10-122-35-32.ec2.internal:20023","error":{"code":47,"codeName":"NoMatchingDocument","errmsg":"remote oplog does not contain entry with optime matching our required optime"},"lastOpTimeFetched":{"ts":{"$timestamp":{"t":1630123543,"i":90}},"t":1},"requiredOpTime":{"ts":{"$timestamp":{"t":1630123543,"i":91}},"t":1},"denylistDurationSeconds":60,"denylistUntil":{"$date":"2021-08-28T04:06:45.496Z"}}
      [js_test:resharding_secondary_recovers_temp_ns_metadata] d20024| 2021-08-28T04:05:45.496+00:00 I  REPL     21798   [ReplCoordExtern-0] "Could not find member to sync from"
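
      Plugging the logged values into the same check shows why the loop cannot terminate: rollback truncated every oplog back to the common point {ts: (1630123543, 90), t: 1}, while the node's stale required optime {ts: (1630123543, 91), t: 1} names an entry from the rolled-back branch that no member retains. A hypothetical replay (post-common-point entries are illustrative):

      // Values taken from the log excerpt above; the loop over candidates
      // and the post-common-point entries are illustrative, not server code.
      #include <algorithm>
      #include <iostream>
      #include <vector>

      struct OpTime {
          unsigned tsSecs, tsInc;
          long long term;
          bool operator==(const OpTime& o) const {
              return tsSecs == o.tsSecs && tsInc == o.tsInc && term == o.term;
          }
      };

      int main() {
          const OpTime requiredOpTime{1630123543, 91, 1};  // stale minvalid
          const std::vector<OpTime> truncatedOplog = {
              {1630123543, 89, 1},
              {1630123543, 90, 1},  // common point / truncate point
              {1630123545, 1, 4},   // illustrative write on the new timeline
          };

          // Both candidates from the log hold the same truncated history, so
          // each fails the probe in turn and is denylisted for 60 seconds.
          for (const char* candidate : {"ip-10-122-35-32.ec2.internal:20025",
                                        "ip-10-122-35-32.ec2.internal:20023"}) {
              bool found = std::find(truncatedOplog.begin(), truncatedOplog.end(),
                                     requiredOpTime) != truncatedOplog.end();
              if (!found) {
                  std::cout << "denylisting " << candidate
                            << ": NoMatchingDocument\n";
              }
          }
          std::cout << "Could not find member to sync from\n";
      }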
      

      https://logkeeper.mongodb.org/lobster/build/28b252df997501da4d24e819b86a2c28/test/6129b6029041305d4c056f2e#bookmarks=0%2C7580%2C8075%2C8328%2C8339%2C8341%2C14893%2C14907&f~=000~oplogsize&f~=000~%22t%22%3A5%5C%7D&f~=000~%22Replica%20set%20state%20transition%22&f~=100~d20024%5C%7C&shareLine=8339

            Assignee:
            Samyukta Lanka (samy.lanka@mongodb.com)
            Reporter:
            Max Hirschhorn (max.hirschhorn@mongodb.com)
            Votes:
            0
            Watchers:
            10
