[SERVER-61117] Startup error results in a hang on shutdown Created: 29/Oct/21 Updated: 29/Oct/23 Resolved: 29/Mar/22 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | 6.0.0-rc0 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Eric Milkie | Assignee: | Vesselina Ratcheva (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Steps To Reproduce: | Start a 4.9 FCV replica set member with a 5.1 binary. |
| Sprint: | Replication 2021-11-29, Replication 2021-12-13, Replication 2021-12-27, Replication 2022-01-10, Replication 2022-01-24, Replication 2022-02-07, Repl 2022-02-21, Repl 2022-03-07, Repl 2022-03-21, Repl 2022-04-04 |
| Participants: |
| Description |
|
When startup fails with an error (see Steps To Reproduce), the shutdown process hangs forever, waiting for replication to finish starting up.
The hang seems to happen when the main thread subsequently calls _waitForStartupComplete() on the repl coord.
In general, this type of hang is a potential issue for all exceptions that can occur in initAndListen. |
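The hang pattern described above can be illustrated with a minimal sketch. This is not the server's code: `ReplCoordSketch`, `start_buggy`, `start`, `failing_load`, and `shutdown_returns` are hypothetical names, and a plain `threading.Event` stands in for whatever state `_waitForStartupComplete()` actually waits on. The sketch only shows why a waiter blocks when an exception skips the "startup complete" signal, and how signalling on the failure path releases it.

```python
import threading


class ReplCoordSketch:
    """Hypothetical stand-in for a coordinator with a startup-complete latch."""

    def __init__(self):
        self._startup_complete = threading.Event()

    def start_buggy(self, load_config):
        # If load_config() raises, the latch below is never set,
        # so any later waiter blocks forever.
        load_config()
        self._startup_complete.set()

    def start(self, load_config):
        # Signal completion on both the success and the failure path.
        try:
            load_config()
        finally:
            self._startup_complete.set()

    def wait_for_startup_complete(self, timeout=None):
        # Returns True once startup was signalled, False if the wait timed out.
        return self._startup_complete.wait(timeout)


def failing_load():
    # Stands in for a startup step that throws.
    raise RuntimeError("startup error")


def shutdown_returns(start_method_name, timeout=0.2):
    """Simulate 'startup throws, then shutdown waits for startup to complete'."""
    coord = ReplCoordSketch()
    try:
        getattr(coord, start_method_name)(failing_load)
    except RuntimeError:
        pass  # the main thread catches the error and proceeds to shut down
    return coord.wait_for_startup_complete(timeout)
```

With the short timeout used here, `shutdown_returns("start_buggy")` times out (the analogue of the hang), while `shutdown_returns("start")` returns immediately. Whether the real fix signals the latch, skips the wait, or exits differently is not stated in this ticket.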
| Comments |
| Comment by Githook User [ 29/Mar/22 ] |
|
Author: Vesselina Ratcheva <vesselina.ratcheva@10gen.com> (vessy-mongodb) Message: This reverts commit c616ce771a282833d3f515ea02a87d89f5c42089. |
| Comment by Githook User [ 29/Mar/22 ] |
|
Author: Vesselina Ratcheva <vesselina.ratcheva@10gen.com> (vessy-mongodb) Message: Revert " This reverts commit 4f57c205480557f133535c65f743b88414d32280. |
| Comment by Githook User [ 28/Mar/22 ] |
|
Author: Vesselina Ratcheva <vesselina.ratcheva@10gen.com> (vessy-mongodb) Message: |
| Comment by Eric Milkie [ 04/Nov/21 ] |
|
This particular exception is being generated by a call to tenant_migration_access_blocker::recoverTenantMigrationAccessBlockers(opCtx) as part of ReplicationCoordinatorImpl::_startLoadLocalConfig() in version 5.1.0. |
| Comment by Eric Milkie [ 29/Oct/21 ] |
|
I wonder if the actual fix here is to make DBExceptions in initAndListen use quick exit rather than exitCleanly in general, since there is too much potential for getting stuck. |
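The trade-off raised in this comment can be sketched by analogy in Python; this is not the server's `quickExit` or `exitCleanly`. In the child script below, `sys.exit` plays the role of a clean exit that waits for outstanding work (Python joins non-daemon threads at interpreter shutdown), and `os._exit` plays the role of a quick exit that terminates immediately. The helper `run_child`, the mode strings, and the exit code 14 are all arbitrary choices for illustration.

```python
import subprocess
import sys

# Child script: a non-daemon thread is stuck forever, then startup "fails".
CHILD = """
import os, sys, threading
stuck = threading.Event()
threading.Thread(target=stuck.wait).start()  # never finishes
try:
    raise RuntimeError("startup error")
except RuntimeError:
    if sys.argv[1] == "quick":
        os._exit(14)   # terminate immediately, skipping thread joins
    sys.exit(14)       # clean exit: interpreter waits for the stuck thread
"""


def run_child(mode, timeout=3.0):
    """Return the child's exit code, or None if it had to be killed on timeout."""
    try:
        proc = subprocess.run([sys.executable, "-c", CHILD, mode], timeout=timeout)
        return proc.returncode
    except subprocess.TimeoutExpired:
        return None
```

Here `run_child("quick")` returns the exit code promptly, while `run_child("clean")` only returns once the timeout kills the hung child. The cost of the quick path is that cleanup handlers do not run, which is the usual reason to weigh it carefully.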