[SERVER-35551] Mongobridge nodes don't remember their network partition configuration after a restart Created: 12/Jun/18  Updated: 29/Oct/23  Resolved: 11/Feb/19

Status: Closed
Project: Core Server
Component/s: Replication, Testing Infrastructure
Affects Version/s: None
Fix Version/s: 4.0.7, 4.1.8

Type: Bug Priority: Major - P3
Reporter: William Schultz (Inactive) Assignee: Max Hirschhorn
Resolution: Fixed Votes: 0
Labels: gm-ack, stm, tig-mongobridge
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v4.0
Sprint: STM 2019-02-11, STM 2019-02-25
Participants:
Linked BF Score: 16

 Description   

Mongobridge nodes allow us to partition the network between nodes in a replica set, but they don't store any durable state, so their current settings (e.g., which hosts they are dropping or accepting messages from) are lost when they are shut down and restarted. This can be a problem for suites that use mongobridges and also restart nodes (e.g., rollback_fuzzer_unclean_shutdowns, Jepsen), which may rely on a particular network topology persisting across restarts.

There are a few possible fixes for this. We could maintain a simple JSON configuration file, storing the bridge's partitioning config, that the mongobridge persists to disk and loads at restart. Alternatively, we could consider trying to keep a mongod's associated mongobridge running even when we shut down nodes in tests.
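
To make the first option concrete, here is a minimal sketch of persisting and reloading a partition config as JSON. It is written in Python purely for illustration and is not how mongobridge is implemented; the function names, the "blockedHosts" key, and the file layout are all hypothetical.

```python
import json
import os
import tempfile


def save_partition_config(path, blocked_hosts):
    """Write the set of hosts the bridge drops messages from (hypothetical format)."""
    payload = {"blockedHosts": sorted(blocked_hosts)}
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temp file in the same directory, then rename over the target,
    # so a crash mid-write never leaves a truncated config behind.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(payload, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def load_partition_config(path):
    """Return the persisted set of blocked hosts, or an empty set if no file exists."""
    try:
        with open(path) as f:
            return set(json.load(f).get("blockedHosts", []))
    except FileNotFoundError:
        return set()
```

Under these assumptions, the bridge would call save_partition_config whenever its rules change and load_partition_config once at startup, before accepting any connections.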



 Comments   
Comment by Githook User [ 13/Feb/19 ]

Author:

{'name': 'Max Hirschhorn', 'email': 'max.hirschhorn@mongodb.com', 'username': 'visemet'}

Message: SERVER-35551 Avoid restarting mongobridge processes.

Changes ReplSetTest, RollbackTest, and RollbackTestDeluxe to avoid
restarting the mongobridge process associated with the mongod process
being restarted. This ensures that any partitioning which has been
configured remains intact after the server is restarted.

(cherry picked from commit e4f593b3dee7808d27c9db54c517ab198f5d9f89)
Branch: v4.0
https://github.com/mongodb/mongo/commit/7d84a75db4bbc1387f626a49cdf98b87b9a49b02

Comment by Githook User [ 11/Feb/19 ]

Author:

{'name': 'Max Hirschhorn', 'email': 'max.hirschhorn@mongodb.com', 'username': 'visemet'}

Message: SERVER-35551 Avoid restarting mongobridge processes.

Changes ReplSetTest, RollbackTest, and RollbackTestDeluxe to avoid
restarting the mongobridge process associated with the mongod process
being restarted. This ensures that any partitioning which has been
configured remains intact after the server is restarted.
Branch: master
https://github.com/mongodb/mongo/commit/e4f593b3dee7808d27c9db54c517ab198f5d9f89

Comment by William Schultz (Inactive) [ 29/Jan/19 ]

Yes, the test fixture itself could maintain a view of the current network topology and, when bridges restart, make sure they conform to the proper topology. There may be an issue with a bridge racing to start up and connect to other bridges before we are able to re-institute its correct settings, though. Presumably we could work around this by being careful with the order in which we start up bridges and their associated mongods. This approach does have the downside that it would not be general to all tests that use mongobridges and also do shutdowns.
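
A rough sketch of that fixture-side idea, in Python for illustration only. Every name here (BridgeTopology, restart_node, and the start_bridge / start_mongod callables) is hypothetical and does not correspond to the real ReplSetTest or mongobridge API. It records which links are cut and replays them onto a restarted bridge before its mongod comes up; whether that ordering fully closes the race described above is an assumption, not something verified.

```python
class BridgeTopology:
    """Fixture-side record of which peers each bridge drops messages from."""

    def __init__(self):
        self._blocked = {}  # bridge name -> set of peer names

    def disconnect(self, bridge, peer):
        self._blocked.setdefault(bridge, set()).add(peer)

    def reconnect(self, bridge, peer):
        self._blocked.get(bridge, set()).discard(peer)

    def reapply(self, bridge, reject_from):
        """Replay recorded rules; reject_from is a callable taking a peer name."""
        for peer in sorted(self._blocked.get(bridge, ())):
            reject_from(peer)


def restart_node(topology, name, start_bridge, start_mongod):
    """Start the bridge, restore its rules, and only then start its mongod."""
    reject_from = start_bridge(name)  # assumed to return a rule-setting callable
    topology.reapply(name, reject_from)
    start_mongod(name)
```

Because the topology lives in the fixture rather than in the bridge, each test harness that restarts nodes would need to adopt something like this itself, which is the generality downside noted above.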

Comment by Gregory McKeon (Inactive) [ 12/Jun/18 ]

Sending to TIG to triage since they own mongobridge.

Comment by William Schultz (Inactive) [ 12/Jun/18 ]

max.hirschhorn indicated that this isn't actually a problem for Jepsen due to the particular nemesis setup we use today.

Comment by William Schultz (Inactive) [ 12/Jun/18 ]

Not sure if the work to fix this should end up on TIG or Replication. Since it is specifically about how we use mongobridge in our tests, it seems more likely to be TIG work.

Generated at Thu Feb 08 04:40:11 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.