[SERVER-63110] Umbrella ticket for Faulty Mongos project v5.0 backport Created: 28/Jan/22  Updated: 29/Oct/23  Resolved: 03/Feb/22

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 5.3.0

Type: Improvement Priority: Major - P3
Reporter: Andrew Shuvalov (Inactive) Assignee: Andrew Shuvalov (Inactive)
Resolution: Fixed Votes: 0
Labels: sharding-nyc-subteam2
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Backwards Compatibility: Fully Compatible
Participants:

 Description   
  1. Manually examine all test logs on the Evergreen Waterfall, especially the LDAP tests
  2. Manage all backport tickets

Ticket list for the mainline:

SERVER-58152 Create Feature flag for Remove Faulty Mongos From Cluste…
SERVER-59356 Initial scaffolding of the FaultManager
SERVER-59357 Initial scaffolding of the Fault class
SERVER-59358 FaultFacet class initial scaffolding and unit test
SERVER-59362 Setup Fault Manager State Machine
SERVER-59522 HealthCheckStatus should track fault status and lifetime
SERVER-59360 Health observer registration and basic mock class
SERVER-59496 Fault class is made to be a container of fault facets
SERVER-59567 Make the HealthManager to instantiate HealthObservers
SERVER-59567 Health observers are invoked periodically and create a F…
SERVER-59608 remove const from return type
SERVER-59367 state machine transition when entering the transient fau…
SERVER-59912 changes in the base health package for the Ldap health o…
SERVER-59361: Implement periodic health check thread pool
SERVER-60316 FaultManager should start with periodic checks disabled
SERVER-59370: unify transitionState test code path and production code
SERVER-60079 Make health checkers asynchronous, block check until pre…
SERVER-59364 Should move to the OK state after performing a successfu…
SERVER-60587 Implement FaultFacet and make necessary changes in Healt…
SERVER-59370: Should Transition to ActiveFault state when in the Tran…
SERVER-59396 Adds server parameter healthMonitoring
SERVER-59366 Progress monitor for periodic health check
SERVER-61071 Removes all instances of HealthObserverIntensity
SERVER-61073 fix getParameter on healthMonitoring
SERVER-61368 SERVER-61315 Ldap health check executor should support a…
SERVER-59373 adds new server status section
SERVER-61368 fix link error on RHEL 8.0 Shared Library (No SSL)
SERVER-59365 new state machine implementation for FaultManager
SERVER-59365 fix ASAN link error
SERVER-61438 fix race in FaultManagerTest
SERVER-61872 Fixed thread pool starvation in FaultManager
SERVER-61871 use tassert for state machine programmer errors
SERVER-61921 fix link error in noSSL mode
SERVER-59365: Use the new state machine.
SERVER-59397 Add jitter when scheduling next health check
SERVER-61956 fix data race when accessing the state machine's state
SERVER-61914: add fault facet details to FaultImpl::toBSON
SERVER-59382: Enforce non-critical facets not entering ActiveFault state
SERVER-61873 add configurable health observer parameters
SERVER-61220 Integration test for progress monitor
SERVER-62096 test should not rely on /smaps, reduce verbosity
SERVER-62084 unify FaultFacetType serialization implementations
SERVER-62098: Guard access to healthCheckContexts with a mutex
SERVER-61930: Individual health observers should return an error if a
SERVER-59368 runtime change of intensities values
SERVER-60944 Simplify Fault class hierarchy and interface for updatin…
SERVER-62188 fix memory corruption in the DeadlineFuture
SERVER-62197: Get rid of potential deadlock.
SERVER-62203: rename thread name
SERVER-62202 add observer type as string to log 5936504
SERVER-62204 do not schedule health check if observer is not enabled
SERVER-62174 Refactored health check intervals
SERVER-58153 Enable Feature flag for Remove Faulty Mongos From Cluste…
SERVER-60846 replace double severity with enum type
SERVER-62357 Increase the default health check progress monitor interval
SERVER-62378 Remove improperly merged lines from unit test
SERVER-62404: Simplify mutex locking in fault_manager.cpp
SERVER-62321: Increase kActiveFaultDuration for OneFacetIsResolved test.
SERVER-59375 SERVER-62373 additional serverStatus sections for health…
SERVER-62465: After intensities are updated, the resulting health che…
SERVER-63110 manually fixed incompatibilities caused by backport from…
SERVER-59391 fault if LDAP facets are enabled but misconfigured
SERVER-62312 health monitoring documentation
SERVER-62904: Fault Manager progress checker should not fault unless …

Ticket list for the Enterprise module:

SERVER-59912 Initial scaffolding and self registration of the Ldap health checker
SERVER-60084 Fix clang related compile failure in Enterprise Ldap
SERVER-60079 Make health checkers asynchronous, block check until previous is done
SERVER-59366 Progress monitor for periodic health check
SERVER-59386 Ldap health checker
SERVER-61368 SERVER-61315 Ldap health check executor should support aborted tasks; test refactorings
SERVER-61220 Ldap health checker linked into mongos
SERVER-60846 replace double severity with enum type
SERVER-59391 fault if LDAP facets are enabled but misconfigured
SERVER-63110 reconciled differences from head during 5.0 backport



 Comments   
Comment by Githook User [ 01/Feb/22 ]

Author: Andrew Shuvalov (andrew.shuvalov@mongodb.com, username: shuvalov-mdb)

Message: SERVER-63110 manually reconcile minor test discrepancies made during backport
Branch: v5.0
https://github.com/mongodb/mongo/commit/fac801f0e5e27389b9d2acb4a106dafa0b0101ea

Comment by Githook User [ 01/Feb/22 ]

Author: Andrew Shuvalov (andrew.shuvalov@mongodb.com, username: shuvalov-mdb)

Message: SERVER-63110 reconciled differences from head during 5.0 backport
Branch: v5.0
https://github.com/10gen/mongo-enterprise-modules/commit/57e9cc74b8dc43b53aa708a2ed709c0ef0acf2a0

Comment by Githook User [ 29/Jan/22 ]

Author: Kshitij Gupta (kshitij.gupta@mongodb.com, username: kshitijng)

Message: SERVER-63110 health monitoring backport to 5.0 branch 1
Branch: v5.0
https://github.com/mongodb/mongo/commit/103d8d2ca024c684a2b0dadda16ac7647899bb39

Comment by Andrew Shuvalov (Inactive) [ 28/Jan/22 ]

max.hirschhorn, yes. The 4.4 backport was done by squashing about 6-7 commits into each batch. The batch boundaries were not placed around bugfixes; instead, a batch ended wherever the next commit would have caused many merge conflicts, so it was easier to commit something clean and then edit the code manually on top of it. This time there won't be any merge conflicts, so it will be easier to squash much larger blocks and place the cutoffs at bugfixes.

Comment by Max Hirschhorn [ 28/Jan/22 ]

Do we plan to squash some of the project's commits together to avoid introducing red boxes into Evergreen? The backport to the 4.4 branch led to a number of BFs being opened only to be closed, because the fix was another commit that was backported later.

Generated at Thu Feb 08 05:56:55 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.