Type: Bug
Resolution: Fixed
Priority: Major - P3
Fix Version/s: 7.1.0-rc0
Affects Version/s: 4.4.0, 5.0.0, 6.0.0, 6.3.0-rc0
Component/s: Sharding
Labels: sharding-nyc-subteam2
Assigned Teams: Sharding NYC
Backwards Compatibility: Minor Change
Operating System: ALL
Linked BF Score: 5
Story Points: 3
Confidence Status: None
Work Order: 3
CAR Domain/s: None
Aha! Reference: None
Tracking Level: None
Risk Status: None
Exec Notes: None
Goal Name(s): None
Goal Link: None

Before a node can finish stepping up as primary, all TransactionCoordinators from the previous term in which the node was primary must have exited. The mechanisms for interrupting TransactionCoordinators involve interrupting active OperationContexts and shutting down the txn::AsyncWorkScheduler's TaskExecutor. However, the TransactionCoordinator also waits through the WaitForMajorityService, and that wait isn't guaranteed to be interrupted. As a result, the node completes its member state transition to PRIMARY but is unable to exit "drain mode", so it cannot begin accepting writes as primary.
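The failure mode described above can be illustrated with a toy model. This is only a sketch in Python, not MongoDB code: `ToyCoordinator`, `step_up_can_finish`, and the `honor_cancel_in_majority_wait` flag are hypothetical names invented for illustration. It assumes the coordinator blocks on a majority wait and that step-up waits for the coordinator to exit; when the majority wait ignores the interruption signal, the coordinator never exits and step-up cannot finish.

```python
import threading


class ToyCoordinator:
    """Toy stand-in for a coordinator from the previous term (hypothetical)."""

    def __init__(self, honor_cancel_in_majority_wait):
        self.cancelled = threading.Event()         # set when the old term is interrupted
        self.majority_reached = threading.Event()  # set when the majority wait is satisfied
        self.exited = threading.Event()            # set when the coordinator has exited
        self._honor_cancel = honor_cancel_in_majority_wait

    def run(self):
        # Block until the majority wait is satisfied. The cancellation signal is
        # only observed if the wait was wired up to honor it.
        while not self.majority_reached.is_set():
            if self._honor_cancel and self.cancelled.is_set():
                break
            self.majority_reached.wait(timeout=0.01)
        self.exited.set()


def step_up_can_finish(honor_cancel, wait_seconds=0.2):
    """Return True if the old coordinator exits within wait_seconds of interruption."""
    coord = ToyCoordinator(honor_cancel)
    worker = threading.Thread(target=coord.run, daemon=True)
    worker.start()
    coord.cancelled.set()                          # interrupt the previous term's work
    finished = coord.exited.wait(timeout=wait_seconds)
    coord.majority_reached.set()                   # release the worker so the thread ends
    worker.join()
    return finished


print(step_up_can_finish(honor_cancel=True))
print(step_up_can_finish(honor_cancel=False))
```

In this model the second call corresponds to the reported behavior: the interruption is delivered, but the wait never observes it, so the caller that needs the coordinator to exit keeps waiting.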

One visible symptom of this behavior is that the following message is logged every 5 seconds:

[js_test:txn_two_phase_commit_basic] d20040| {"t":{"$date":"2023-02-10T22:40:06.714+00:00"},"s":"I",  "c":"TXN",      "id":22442,   "ctx":"OplogApplier-0","msg":"After 5 seconds of wait there are still sessions left with active coordinators which have not yet completed","attr":{"numSessionsLeft":1}}
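To check test or server logs for this symptom, a small script can look for the log id shown in the message above. The following is a minimal sketch: the helper name `sessions_left` is hypothetical, and it assumes each line is an optional text prefix followed by a single JSON object, as in the sample.

```python
import json

STUCK_COORDINATOR_LOG_ID = 22442  # log id taken from the message quoted above


def sessions_left(line):
    """Return numSessionsLeft if the line is the drain-wait message, else None."""
    start = line.find("{")  # skip any prefix such as "[js_test:...] d20040| "
    if start == -1:
        return None
    try:
        entry = json.loads(line[start:])
    except json.JSONDecodeError:
        return None
    if entry.get("id") != STUCK_COORDINATOR_LOG_ID:
        return None
    return entry.get("attr", {}).get("numSessionsLeft")


sample = (
    '[js_test:txn_two_phase_commit_basic] d20040| '
    '{"t":{"$date":"2023-02-10T22:40:06.714+00:00"},"s":"I",  "c":"TXN",      '
    '"id":22442,   "ctx":"OplogApplier-0","msg":"After 5 seconds of wait there are '
    'still sessions left with active coordinators which have not yet completed",'
    '"attr":{"numSessionsLeft":1}}'
)
print(sessions_left(sample))  # prints 1
```

A value that stays above zero across repeated occurrences of this message would be consistent with the stuck drain mode described in this ticket.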

causes: SERVER-103841 Memory leak in TransactionCoordinator associated to long-lived cancellation source (Closed)

Assignee: David Chen
Reporter: Max Hirschhorn
Participants: David Chen, Githook User, Max Hirschhorn
Votes: 0
Watchers: 6

Created: Feb 11 2023 12:04:52 AM UTC
Updated: Jul 17 2025 09:22:21 PM UTC
Resolved: Aug 02 2023 07:22:39 PM UTC
Confidence Status Last Update: 14/Jun/23 2:01 PM
