[SERVER-39172] Shut down mongod nodes in parallel in ReplSetTest Created: 24/Jan/19 Updated: 29/Oct/23 Resolved: 05/Nov/19 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | None |
| Fix Version/s: | 4.3.1 |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | William Schultz (Inactive) | Assignee: | William Schultz (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Backwards Compatibility: | Fully Compatible |
| Sprint: | Repl 2019-11-04, Repl 2019-11-18 |
| Description |
|
In the ReplSetTest.stopSet function, we call ReplSetTest.stop sequentially for each node in the replica set. This in turn calls MongoRunner.stopMongod for each replica set node. By default, when MongoRunner shuts down a mongod node, it will call waitpid on the underlying process before continuing. This means that in ReplSetTest.stopSet we need to wait for a mongod to shut down cleanly before moving on and shutting down the next node. We could speed up this process by having a way in MongoRunner to shut down a process without calling waitpid on it. In ReplSetTest, after initiating shutdown on each node (and not blocking), we could go through each process and call waitProgram on its process id, which will call waitpid. This should speed up the shutdown process in ReplSetTest and reduce test times for both local testing and, ideally, overall Evergreen test suite durations. |
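The signal-first, wait-later idea described above can be sketched as follows. This is a minimal illustration in Python, not the mongo shell, so that it is self-contained; the child processes are placeholders for mongod nodes, and `start_nodes`, `stop_set_sequential`, and `stop_set_parallel` are hypothetical names that are not part of ReplSetTest or MongoRunner.

```python
import subprocess
import sys


def start_nodes(count):
    """Spawn `count` placeholder child processes standing in for mongod nodes."""
    cmd = [sys.executable, "-c", "import time; time.sleep(60)"]
    return [subprocess.Popen(cmd) for _ in range(count)]


def stop_set_sequential(procs):
    """Signal one node and block until it exits before touching the next one."""
    codes = []
    for proc in procs:
        proc.terminate()                     # ask the process to shut down
        codes.append(proc.wait(timeout=30))  # block here, like waitpid
    return codes


def stop_set_parallel(procs):
    """Signal every node first, then reap each one."""
    for proc in procs:
        proc.terminate()  # returns immediately; no waiting yet
    # All nodes are now shutting down concurrently; collect their exit codes.
    return [proc.wait(timeout=30) for proc in procs]
```

With the parallel variant, the total time is bounded by roughly the slowest node rather than the sum over all nodes, which is the speedup this ticket is after. Every process is still waited on eventually, so no exit status is left uncollected.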
| Comments |
| Comment by Githook User [ 04/Nov/19 ] |
|
Author: William Schultz (will62794, william.schultz@mongodb.com) Message: |
| Comment by William Schultz (Inactive) [ 04/Nov/19 ] |
|
We can also look at the stopSet shutdown times across the replica_sets suite after the changes (taken from this patch build): Median: 731 ms. We can compare this with the stopSet shutdown profile before the changes, taken from this patch build: Median: 1497 ms. I expect that the profile is still somewhat spread out even after the changes because we have to validate collections during shutdown, and this may be a non-trivial amount of work that varies by test. |
| Comment by William Schultz (Inactive) [ 04/Nov/19 ] |
|
From an initial patch build that includes these changes, we can see the improvements for the ReplSetTest control tests. replsettest_control_1_node.js: stopSet stopped all replica set nodes in 838ms. This is a scale factor of (1141/838) = 1.36x, within the 1.5x goal bound. |
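The ratios behind these figures can be checked with a few lines of Python. This is only an arithmetic sketch: it assumes 1141 ms is the pre-change time for the same one-node control test (as the quoted ratio implies), and it reuses the 1497 ms and 731 ms suite medians reported in this ticket. The `speedup` helper is a hypothetical name.

```python
def speedup(before_ms, after_ms):
    """Return how many times faster the 'after' measurement is."""
    return before_ms / after_ms


# One-node control test: 1141 ms assumed to be the time before the change.
one_node = speedup(1141, 838)

# Median stopSet time across the replica_sets suite, before vs. after.
suite_median = speedup(1497, 731)

print(f"1-node control test: {one_node:.2f}x")   # 1.36x
print(f"replica_sets median: {suite_median:.2f}x")  # 2.05x
```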
| Comment by William Schultz (Inactive) [ 24/Jan/19 ] |
|
> That sounds like what you're doing, no? Yes, that is the intention here. I also wanted to make it clear in this ticket that ReplSetTest.stopSet should take advantage of such a feature. |
| Comment by Max Hirschhorn [ 24/Jan/19 ] |
|
I had interpreted
The title is misguided because never calling WaitForSingleObject() on Windows would lead to errors such as "The process cannot access the file because it is being used by another process." Even after the process exits, the OS may not yet have released all of its handle objects by the time something attempts to use the same dbpath again (for example). |
| Comment by William Schultz (Inactive) [ 24/Jan/19 ] |
|
Based on the title of |
| Comment by Max Hirschhorn [ 24/Jan/19 ] |
|
william.schultz, I think one of |
| Comment by William Schultz (Inactive) [ 24/Jan/19 ] |
|
By running a basic ReplSetTest shutdown test(replset_shutdown.js |