[SERVER-77278] Replication rollback of a dropDatabase oplog entry leaves the in-memory database closed on the primary but open on secondaries, leading to secondaries crashing on receipt of a conflicting database name Created: 18/May/23 Updated: 29/Oct/23 Resolved: 24/May/23 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | 7.1.0-rc0, 7.0.0-rc3, 6.0.11 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Dianna Hohensee (Inactive) | Assignee: | Dianna Hohensee (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Backport Requested: | v7.0, v6.0 |
| Sprint: | Execution Team 2023-05-29 |
| Participants: |
| Linked BF Score: | 29 |
| Description |
|
A primary running dropDatabase first majority-commits the dropCollection oplog entries and then writes a final dropDatabase oplog entry, at which point the in-memory database is closed atomically. However, if replication rollback goes back to a point right before the dropDatabase entry but after the dropCollection entries, the node is left with no in-memory database state, whereas nodes that were secondaries while the dropDatabase was running on the primary still have the database open in memory. This sets the replica set up to allow a createCollection on a conflicting database name on the primary, which then crashes the secondaries during oplog application with a DatabaseDifferCase error. Potential solution:
|
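The failure sequence in the description (not the potential solution) can be illustrated with a toy model. This is only a sketch: `Node`, `create_collection`, `close_database` and `simulate` are hypothetical names, not MongoDB internals, and the case-insensitive name check is an assumption based on the DatabaseDifferCase error name.

```python
# Toy model only: class and method names are hypothetical, not MongoDB internals.

class DatabaseDifferCase(Exception):
    """Raised when a database name differs only in case from an open one."""


class Node:
    def __init__(self, name):
        self.name = name
        self.open_dbs = set()  # stand-in for the in-memory database state

    def create_collection(self, db):
        # Assumption: a name conflicts if it matches an open database
        # except for letter case.
        for existing in self.open_dbs:
            if existing != db and existing.lower() == db.lower():
                raise DatabaseDifferCase(
                    f"{self.name}: {db!r} conflicts with {existing!r}"
                )
        self.open_dbs.add(db)

    def close_database(self, db):
        self.open_dbs.discard(db)


def simulate():
    primary, secondary = Node("primary"), Node("secondary")
    for node in (primary, secondary):
        node.create_collection("test")

    # dropDatabase closes the in-memory database on the primary. The
    # dropDatabase oplog entry is then rolled back before the secondary
    # applies it, and in this model nothing reopens the database.
    primary.close_database("test")

    # The primary now accepts a name that differs only in case...
    primary.create_collection("TEST")

    # ...but the secondary still has "test" open, so applying it fails.
    try:
        secondary.create_collection("TEST")
    except DatabaseDifferCase as exc:
        return str(exc)
    return None
```

In this model the primary accepts the new name because its in-memory state is empty, while the secondary rejects the same operation. That mirrors the divergence described above without claiming anything about how the server implements it.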
| Comments |
| Comment by Githook User [ 07/Sep/23 ] |
|
Author: {'name': 'Dianna Hohensee', 'email': 'dianna.hohensee@mongodb.com', 'username': ''} Message: (cherry picked from commit 7d6513611f97a60746af4c9ebd7e1cec2d4b844a) |
| Comment by Githook User [ 24/May/23 ] |
|
Author: {'name': 'Dianna Hohensee', 'email': 'dianna.hohensee@mongodb.com', 'username': 'DiannaHohensee'} Message: (cherry picked from commit 7d6513611f97a60746af4c9ebd7e1cec2d4b844a) |
| Comment by Githook User [ 24/May/23 ] |
|
Author: {'name': 'Dianna Hohensee', 'email': 'dianna.hohensee@mongodb.com', 'username': 'DiannaHohensee'} Message: |
| Comment by Dianna Hohensee (Inactive) [ 22/May/23 ] |
|
So, interestingly, I've ascertained that this is not a hot BF, because it is also broken in 6.0, not just 7.0. It turns out that the test on which I reproduced the bug was actually broken from 4.3 until recently during 7.0 development: Greg's work in |
| Comment by Dianna Hohensee (Inactive) [ 18/May/23 ] |
|
I got this to reproduce by modifying an existing replication JS test. The test hangs because I have a w:2 write that never gets acknowledged (one secondary crashes and the other secondary isn't taking writes), but the logs show
|