[SERVER-44061] Race while setting replication maintenance mode. Created: 17/Oct/19  Updated: 29/Oct/23  Resolved: 31/Oct/19

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: 3.6.14, 4.0.13, 3.4.23
Fix Version/s: 4.3.1, 4.2.3, 4.0.15

Type: Bug
Priority: Major - P3
Reporter: Kevin Arhelger
Assignee: Lingzhi Deng
Resolution: Fixed
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested: v4.2, v4.0
Sprint: Repl 2019-11-04
Participants:

 Description   

Definitely affects 4.0.13, but on inspection the same logic appears to be present in 4.2.

When a node is marked as too stale and is about to move into maintenance mode, an optimization ensures that the maintenance mode transition is attempted only once. However, if that attempt races with an election (for example, the node is currently running for election), the function returns without ever setting maintenance mode, and the attempt is not repeated. The member then remains in secondary state until it is restarted.

2019-08-28T12:28:51.035+0000 E REPL     [rsBackgroundSync] too stale to catch up -- entering maintenance mode
2019-08-28T12:28:51.943+0000 W REPL     [rsBackgroundSync] Failed to transition into maintenance mode: NotSecondary: currently running for election

https://github.com/mongodb/mongo/blob/r4.0.13/src/mongo/db/repl/bgsync.cpp#L359
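
The race described above can be illustrated with a small standalone sketch. It is not the code from bgsync.cpp: FakeCoordinator, OnceOnlySync, RetryingSync and reachesMaintenanceAfterRace are hypothetical names invented for illustration, and the retrying variant is only one possible remedy, not necessarily what the fix commits do. The sketch assumes the "only once" flag is recorded before the transition is known to have succeeded.

```cpp
#include <cassert>

// Hypothetical stand-in for the replication coordinator (not MongoDB's API).
struct FakeCoordinator {
    bool electionInProgress = false;
    bool inMaintenanceMode = false;

    // Mirrors the logged failure: the transition is refused during an election.
    bool setMaintenanceMode() {
        if (electionInProgress) {
            return false;
        }
        inMaintenanceMode = true;
        return true;
    }
};

// Pattern described in the report: the "only once" flag is set before the
// transition is known to have succeeded, so a failed attempt is never retried.
struct OnceOnlySync {
    bool tooStale = false;

    void onTooStale(FakeCoordinator& coord) {
        if (tooStale) {
            return;  // optimization: only attempt the transition once
        }
        tooStale = true;
        coord.setMaintenanceMode();  // a failure here is ignored (only logged in the report)
    }
};

// One possible remedy (an assumption): remember "too stale" only on success,
// so the next call tries the transition again.
struct RetryingSync {
    bool tooStale = false;

    void onTooStale(FakeCoordinator& coord) {
        if (tooStale) {
            return;
        }
        tooStale = coord.setMaintenanceMode();
    }
};

// Runs the scenario: the first attempt races with an election, the second
// does not. Returns whether the node ended up in maintenance mode.
template <typename Sync>
bool reachesMaintenanceAfterRace() {
    FakeCoordinator coord;
    Sync sync;

    coord.electionInProgress = true;
    sync.onTooStale(coord);
    assert(!coord.inMaintenanceMode);  // the first attempt always fails here

    coord.electionInProgress = false;
    sync.onTooStale(coord);
    return coord.inMaintenanceMode;
}
```

In this model, reachesMaintenanceAfterRace with OnceOnlySync returns false (the node never enters maintenance mode after the failed first attempt), while the RetryingSync variant returns true once the election is over.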



 Comments   
Comment by Githook User [ 23/Dec/19 ]

Author: Lingzhi Deng (ldennis, lingzhi.deng@mongodb.com)

Message: SERVER-44061: Fix race between setting replication maintenance mode and concurrent election

(cherry picked from commit d3546ccb50f0137962f8185140281e7fd7323e4a)
Branch: v4.0
https://github.com/mongodb/mongo/commit/78992b6422979574cc8cea8145fd49561c8f1caf

Comment by Githook User [ 23/Dec/19 ]

Author: Lingzhi Deng (ldennis, lingzhi.deng@mongodb.com)

Message: SERVER-44061: Fix race between setting replication maintenance mode and concurrent election
Branch: v4.2
https://github.com/mongodb/mongo/commit/d3546ccb50f0137962f8185140281e7fd7323e4a

Comment by Lingzhi Deng [ 20/Dec/19 ]

Backporting sounds good.

Comment by Judah Schvimer [ 20/Dec/19 ]

lingzhi.deng, I just approved this for backport. I think backporting this to 4.0 will make the multi_version tests stop failing. If not, you may need to add a "requires_fcv..." tag, since the change adds a failpoint.

Comment by Githook User [ 30/Oct/19 ]

Author: Lingzhi Deng (ldennis, lingzhi.deng@mongodb.com)

Message: SERVER-44061: Fix race between setting replication maintenance mode and concurrent election
Branch: master
https://github.com/mongodb/mongo/commit/39187be9d1175f5ef52ffca858c5e9489fd6cedf
