[SERVER-35551] Mongobridge nodes don't remember their network partition configuration after a restart Created: 12/Jun/18 Updated: 29/Oct/23 Resolved: 11/Feb/19 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication, Testing Infrastructure |
| Affects Version/s: | None |
| Fix Version/s: | 4.0.7, 4.1.8 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | William Schultz (Inactive) | Assignee: | Max Hirschhorn |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | gm-ack, stm, tig-mongobridge | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||
| Backwards Compatibility: | Fully Compatible | ||||||||
| Operating System: | ALL | ||||||||
| Backport Requested: | v4.0 | ||||||||
| Sprint: | STM 2019-02-11, STM 2019-02-25 | ||||||||
| Participants: | |||||||||
| Linked BF Score: | 16 | ||||||||
| Description |
|
Mongobridge nodes let us partition the network between nodes in a replica set, but they don't store any durable state, so their current settings (i.e. which hosts they are dropping or accepting messages from) are lost when they shut down and restart. This can be a problem for suites that use mongobridges and also restart nodes (e.g. rollback_fuzzer_unclean_shutdowns, Jepsen), which may rely on a particular network topology persisting across restarts. There are a few possible fixes for this. We could maintain a simple JSON configuration file that stores the bridge's partitioning config, which the mongobridge persists to disk and loads at restart. Alternatively, we could consider keeping a mongod's associated mongobridge running even when we shut down nodes in tests. |
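To illustrate the first option, here is a minimal sketch of persisting a bridge's partition settings to a JSON file and reloading them at startup. It is written in Python purely for illustration and is not mongobridge code: the `PartitionConfigStore` class, the `rejected_hosts` key, and the file layout are all assumptions.

```python
import json
import os
import tempfile


class PartitionConfigStore:
    """Hypothetical on-disk store for one bridge's partition settings."""

    def __init__(self, path):
        self.path = path

    def save(self, rejected_hosts):
        # Write to a temporary file in the same directory, then rename it over
        # the target, so a crash mid-write never leaves a half-written config.
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
        with os.fdopen(fd, "w") as f:
            json.dump({"rejected_hosts": sorted(rejected_hosts)}, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, self.path)

    def load(self):
        # A missing file is treated as "no partition was configured".
        try:
            with open(self.path) as f:
                return set(json.load(f)["rejected_hosts"])
        except FileNotFoundError:
            return set()
```

Under this sketch, a bridge would call `save()` whenever its accept/reject rules change and `load()` once at startup, before forwarding any traffic. The write-then-rename step matters for suites that shut nodes down uncleanly.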
| Comments |
| Comment by Githook User [ 13/Feb/19 ] |
|
Author: {'name': 'Max Hirschhorn', 'email': 'max.hirschhorn@mongodb.com', 'username': 'visemet'} Message: Changes ReplSetTest, RollbackTest, and RollbackTestDeluxe to avoid (cherry picked from commit e4f593b3dee7808d27c9db54c517ab198f5d9f89) |
| Comment by Githook User [ 11/Feb/19 ] |
|
Author: {'name': 'Max Hirschhorn', 'email': 'max.hirschhorn@mongodb.com', 'username': 'visemet'} Message: Changes ReplSetTest, RollbackTest, and RollbackTestDeluxe to avoid |
| Comment by William Schultz (Inactive) [ 29/Jan/19 ] |
|
Yes, the test fixture itself could maintain a view of the current network topology and, when bridges restart, make sure they conform to the proper topology. There may be an issue with a bridge racing to start up and connect to other bridges before we are able to reinstate its correct settings, though. Presumably we could work around this by being careful about the order in which we start up bridges and their associated mongods. This approach does have the downside that it would not generalize to all tests that use mongobridges and also do shutdowns. |
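The fixture-side approach could look roughly like the following Python sketch. Everything in it is hypothetical: `FakeBridge` stands in for a bridge process that forgets its rules on restart, and `TopologyTracker`, `reject_from`, `accept_from`, and `start_mongod` are illustrative names, not the real ReplSetTest or mongobridge API. The ordering in `restart_bridge` reflects the workaround suggested for the startup race: rules are reapplied before the associated mongod is started.

```python
class FakeBridge:
    """Stand-in for a bridge process; it forgets its rules on restart."""

    def __init__(self, name):
        self.name = name
        self.rejected = set()

    def reject_from(self, host):
        self.rejected.add(host)

    def accept_from(self, host):
        self.rejected.discard(host)

    def restart(self):
        # Models the lack of durable state: all rules are lost.
        self.rejected = set()


class TopologyTracker:
    """Fixture-side record of which hosts each bridge should reject."""

    def __init__(self):
        self.desired = {}  # bridge name -> set of rejected hosts

    def reject(self, bridge, host):
        self.desired.setdefault(bridge.name, set()).add(host)
        bridge.reject_from(host)

    def accept(self, bridge, host):
        self.desired.get(bridge.name, set()).discard(host)
        bridge.accept_from(host)

    def restart_bridge(self, bridge, start_mongod):
        bridge.restart()
        # Reapply the recorded rules first, so the node behind the bridge
        # never sees a topology the test did not ask for.
        for host in sorted(self.desired.get(bridge.name, set())):
            bridge.reject_from(host)
        # Only then start the mongod associated with this bridge.
        start_mongod()
```

Because the tracker lives in the fixture, this only helps tests that route every partition change through it, which is the generality limitation noted above.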
| Comment by Gregory McKeon (Inactive) [ 12/Jun/18 ] |
|
Sending to TIG to triage since they own mongobridge. |
| Comment by William Schultz (Inactive) [ 12/Jun/18 ] |
|
max.hirschhorn indicated that this isn't actually a problem for Jepsen due to the particular nemesis setup we use today. |
| Comment by William Schultz (Inactive) [ 12/Jun/18 ] |
|
Not sure if the work to fix this should end up on TIG or Replication. Since it is specifically about how we use mongobridge in our tests, it seems more likely to be TIG work. |