[SERVER-11103] replset mutex should not be held during DB lock attempts Created: 09/Oct/13 Updated: 11/Jul/16 Resolved: 28/Oct/14 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | 2.4.6 |
| Fix Version/s: | 2.8.0-rc0 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Alexander Komyagin | Assignee: | Matt Dannenberg |
| Resolution: | Done | Votes: | 0 |
| Labels: | elections |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Attachments: |
| Issue Links: |
| Operating System: | ALL |
| Participants: |
| Description |
|
Result: Elections will not proceed correctly until heartbeats are processed.
Repro:
Expected behavior:
Ideally, XXX:27019 should vote for XXX:27018 (see |
| Comments |
| Comment by Githook User [ 28/Oct/14 ] |
|
Author: matt dannenberg (dannenberg) <matt.dannenberg@10gen.com>
Message: |
| Comment by Scott Hernandez (Inactive) [ 20/Oct/14 ] |
|
This fails to elect a primary due to a config version mismatch, as this just repeats:
This test is flawed: it blocks the write of the new config version, which leaves the node voting with an old config version. I put up the start of a better test but haven't finished it, and mattd@10gen.com is now taking over. |
| Comment by Eric Milkie [ 17/Oct/14 ] |
|
The code now works as per Expected Behavior in the description above. However, the provided test still fails because a node is not allowed to reconfig while fsync-and-locked, as per Matt's comment above. |
| Comment by Eric Milkie [ 28/Aug/14 ] |
|
We need to confirm the attached jstest succeeds after the refactor is complete. |
| Comment by Matt Dannenberg [ 16/Jan/14 ] |
|
I did a bit of digging around, and it looks like the problem is that to save the new config (saveConfigLocally in rs_config.cpp) the node needs to acquire a WriteContext on the "local.system.replset" collection, which it cannot get because it is fsync-and-locked. |
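The lock ordering discussed in this ticket can be illustrated with a small, self-contained sketch. It is not the server code: `rs_mutex`, `db_lock`, and all function names below are hypothetical stand-ins, and Python threading locks only approximate the behavior. The sketch assumes that heartbeat processing needs the replset mutex and that the DB lock is unavailable while the node is fsync-and-locked.

```python
import threading

# Hypothetical stand-ins, not the real server objects.
rs_mutex = threading.Lock()  # models the replset mutex
db_lock = threading.Lock()   # models the DB write lock held during fsyncLock


def heartbeat_can_run(timeout=0.2):
    """Return True if a heartbeat handler could take rs_mutex within timeout."""
    got = rs_mutex.acquire(timeout=timeout)
    if got:
        rs_mutex.release()
    return got


def save_config_holding_mutex(started, release):
    """Problematic ordering: take rs_mutex first, then block on db_lock."""
    with rs_mutex:
        started.set()
        # rs_mutex stays held for as long as db_lock is unavailable.
        while not db_lock.acquire(timeout=0.05):
            if release.is_set():
                return
        db_lock.release()


def save_config_lock_first(started, release):
    """Alternative ordering: wait for db_lock without holding rs_mutex."""
    started.set()
    while not db_lock.acquire(timeout=0.05):
        if release.is_set():
            return
    try:
        with rs_mutex:
            pass  # the in-memory config update would go here
    finally:
        db_lock.release()


def run(saver):
    """Simulate an fsync-and-locked node and report whether heartbeats proceed."""
    db_lock.acquire()  # simulate fsyncLock holding the DB lock
    started, release = threading.Event(), threading.Event()
    worker = threading.Thread(target=saver, args=(started, release))
    worker.start()
    started.wait()
    ok = heartbeat_can_run()
    release.set()      # let the worker give up so the demo terminates
    worker.join()
    db_lock.release()  # simulate fsyncUnlock
    return ok
```

Under these assumptions, `run(save_config_holding_mutex)` reports that the heartbeat path is blocked, while `run(save_config_lock_first)` reports that it can proceed, which is the behavior the ticket title asks for.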
| Comment by Alexander Komyagin [ 14/Oct/13 ] |
|
Attaching jstest that reproduces the issue |