[SERVER-59721] Node may become unable to sync from other members after performing rollback to stable timestamp Created: 01/Sep/21  Updated: 29/Oct/23  Resolved: 09/Nov/21

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: 5.2.0, 5.0.6, 5.1.1

Type: Bug Priority: Major - P3
Reporter: Max Hirschhorn Assignee: Samyukta Lanka
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
is depended on by SERVER-53351 Add resharding fuzzer task with step-... Closed
Related
is related to SERVER-53436 Transition to primary writes minValid... Closed
is related to SERVER-53642 Cleanup writes to appliedThrough that... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v5.1, v5.0
Sprint: Repl 2021-09-20, Repl 2021-10-04, Repl 2021-10-18, Repl 2021-11-01, Replication 2021-11-15
Participants:

 Description   

The SyncSourceResolver currently rejects sync source candidates which don't contain an oplog entry corresponding to the node's minvalid point. The reason for this goes back to SERVER-25145 and SERVER-27050 with safe-guarding against certain double rollback scenarios in rollback via refetch. However, rollback to stable timestamp doesn't modify the minvalid point like rollback via refetch does and potentially leaves the minvalid point as an oplog entry not on the true timeline. This can lead a node using rollback to stable timestamp to perpetually reject all candidates as valid sync sources.

[js_test:resharding_secondary_recovers_temp_ns_metadata] d20024| 2021-08-28T04:05:45.491+00:00 I  ROLLBACK 21612   [BackgroundSync] "Rollback summary","attr":{"startTime":{"$date":"2021-08-28T04:05:45.279Z"},"endTime":{"$date":"2021-08-28T04:05:45.491Z"},"syncSource":"ip-10-122-35-32.ec2.internal:20025","rbid":2,"lastOptimeRolledBack":{"ts":{"$timestamp":{"t":1630123544,"i":2}},"t":4},"commonPoint":{"ts":{"$timestamp":{"t":1630123543,"i":90}},"t":1},"lastWallClockTimeRolledBack":{"$date":"2021-08-28T04:05:44.042Z"},"firstOpWallClockTimeAfterCommonPoint":{"$date":"2021-08-28T04:05:43.778Z"},"wallClockTimeDiff":0,"truncateTimestamp":{"$timestamp":{"t":1630123543,"i":90}},"stableTimestamp":{"$timestamp":{"t":1630123543,"i":89}},"shardIdentityRolledBack":false,"configServerConfigVersionRolledBack":false,"affectedSessions":[],"affectedNamespaces":["config.localReshardingConflictStash.e500bdbc-ca33-44f2-94ae-59dda55888bd.shard0-donor0"],"rollbackCommandCounts":{"create":1},"totalEntriesRolledBackIncludingNoops":2}
...
[js_test:resharding_secondary_recovers_temp_ns_metadata] d20024| 2021-08-28T04:05:45.492+00:00 I  REPL     21799   [BackgroundSync] "Sync source candidate chosen","attr":{"syncSource":"ip-10-122-35-32.ec2.internal:20025"}
[js_test:resharding_secondary_recovers_temp_ns_metadata] d20024| 2021-08-28T04:05:45.495+00:00 W  REPL     5579711 [ReplCoordExtern-0] "We cannot use candidate as a sync source because it does not contain the necessary operations for us to reach a consistent state. Denylisting this sync source","attr":{"candidate":"ip-10-122-35-32.ec2.internal:20025","error":{"code":47,"codeName":"NoMatchingDocument","errmsg":"remote oplog does not contain entry with optime matching our required optime"},"lastOpTimeFetched":{"ts":{"$timestamp":{"t":1630123543,"i":90}},"t":1},"requiredOpTime":{"ts":{"$timestamp":{"t":1630123543,"i":91}},"t":1},"denylistDurationSeconds":60,"denylistUntil":{"$date":"2021-08-28T04:06:45.495Z"}}
[js_test:resharding_secondary_recovers_temp_ns_metadata] d20024| 2021-08-28T04:05:45.495+00:00 I  REPL     21799   [ReplCoordExtern-0] "Sync source candidate chosen","attr":{"syncSource":"ip-10-122-35-32.ec2.internal:20023"}
[js_test:resharding_secondary_recovers_temp_ns_metadata] d20024| 2021-08-28T04:05:45.496+00:00 W  REPL     5579711 [ReplCoordExtern-0] "We cannot use candidate as a sync source because it does not contain the necessary operations for us to reach a consistent state. Denylisting this sync source","attr":{"candidate":"ip-10-122-35-32.ec2.internal:20023","error":{"code":47,"codeName":"NoMatchingDocument","errmsg":"remote oplog does not contain entry with optime matching our required optime"},"lastOpTimeFetched":{"ts":{"$timestamp":{"t":1630123543,"i":90}},"t":1},"requiredOpTime":{"ts":{"$timestamp":{"t":1630123543,"i":91}},"t":1},"denylistDurationSeconds":60,"denylistUntil":{"$date":"2021-08-28T04:06:45.496Z"}}
[js_test:resharding_secondary_recovers_temp_ns_metadata] d20024| 2021-08-28T04:05:45.496+00:00 I  REPL     21798   [ReplCoordExtern-0] "Could not find member to sync from"

https://logkeeper.mongodb.org/lobster/build/28b252df997501da4d24e819b86a2c28/test/6129b6029041305d4c056f2e#bookmarks=0%2C7580%2C8075%2C8328%2C8339%2C8341%2C14893%2C14907&f~=000~oplogsize&f~=000~%22t%22%3A5%5C%7D&f~=000~%22Replica%20set%20state%20transition%22&f~=100~d20024%5C%7C&shareLine=8339



 Comments   
Comment by Githook User [ 23/Nov/21 ]

Author:

{'name': 'Samy Lanka', 'email': 'samy.lanka@mongodb.com', 'username': 'lankas'}

Message: SERVER-59721 Stop using the minValid optime when resolving a sync source

(cherry picked from commit ba5d667c9a3f3105e0d68babc42a5b2d36524062)
(cherry picked from commit 7b6775a0ad49992a3054b8bcfcda29914803fc5b)
Branch: v5.0
https://github.com/mongodb/mongo/commit/d792c7fad39b4245b8070928cad4b86b7b78f72b

Comment by Githook User [ 18/Nov/21 ]

Author:

{'name': 'Samy Lanka', 'email': 'samy.lanka@mongodb.com', 'username': 'lankas'}

Message: SERVER-59721 Stop using the minValid optime when resolving a sync source

(cherry picked from commit ba5d667c9a3f3105e0d68babc42a5b2d36524062)
Branch: v5.1
https://github.com/mongodb/mongo/commit/7b6775a0ad49992a3054b8bcfcda29914803fc5b

Comment by Githook User [ 09/Nov/21 ]

Author:

{'name': 'Samy Lanka', 'email': 'samy.lanka@mongodb.com', 'username': 'lankas'}

Message: SERVER-59721 Stop using the minValid optime when resolving a sync source
Branch: master
https://github.com/mongodb/mongo/commit/ba5d667c9a3f3105e0d68babc42a5b2d36524062

Generated at Thu Feb 08 05:47:58 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.