Type: Bug
Resolution: Declined
Priority: Major - P3
Fix Version/s: None
Affects Version/s: None
Component/s: None
Labels:
- node-internal

Story Points:
1
Compass/DevTools Changes:
Not Needed
Confidence Status:
None

Documentation Changes Summary:

1. What would you like to communicate to the user about this feature?
2. Would you like the user to see examples of the syntax and/or executable code and its output?
3. Which versions of the driver/connector does this apply to?


Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Link:
None
Goal Name(s):
None

Use Case

As a... user running retryable operations on sharded clusters
I want... server selection timeouts to be properly cleaned up after retries
So that... my application does not crash

User Experience

When a retryable write (or read) fails with a transient error inside a transaction on a sharded cluster, the driver correctly retries and the operation succeeds. However, about 30 seconds later, an unhandled promise rejection (TimeoutError: Expired after 30000ms) fires and crashes the process.

This affects any user on a sharded cluster using explicit transactions who encounters a transient retryable error. The error itself is handled correctly (retry succeeds), but the leaked timeout causes a delayed crash.

This was discovered during the Client Backpressure implementation (NODE-7142) but is independent of it. Client Backpressure surfaced it only because of its 5 retries and longer timeouts (backoff + jitter); the problem exists on main as well.

Dependencies

none

Risks/Unknowns

none

Acceptance Criteria

Implementation Requirements

change

if (options.timeoutContext?.clearServerSelectionTimeout) timeout?.clear();

to:

if (!options.timeoutContext || options.timeoutContext.clearServerSelectionTimeout) {
  timeout?.clear();
}

The logic: clear the timeout if (a) no timeoutContext was provided (we created a local timeout), or (b) the timeoutContext explicitly says to clear it (Legacy path). Do not clear when CSOT owns the timeout (clearServerSelectionTimeout === false).
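
The three cases above can be checked with a small standalone model of the condition. This is only a sketch: TimeoutContextLike, oldShouldClear and newShouldClear are hypothetical names introduced for illustration and are not the driver's real types or functions. Only the two boolean expressions are taken from the change itself.

```typescript
// Hypothetical stand-in for the driver's timeout context (assumption:
// only the clearServerSelectionTimeout flag matters for this check).
interface TimeoutContextLike {
  clearServerSelectionTimeout: boolean;
}

// Condition before the change: clears only when a context exists AND asks for it.
function oldShouldClear(timeoutContext?: TimeoutContextLike): boolean {
  return Boolean(timeoutContext?.clearServerSelectionTimeout);
}

// Condition after the change: also clears when no context was provided,
// i.e. when the timeout was created locally.
function newShouldClear(timeoutContext?: TimeoutContextLike): boolean {
  return !timeoutContext || timeoutContext.clearServerSelectionTimeout;
}

// Compare both conditions across the three cases described in the ticket.
const cases: Array<[string, TimeoutContextLike | undefined]> = [
  ['no timeoutContext (local timeout)', undefined],
  ['context asks to clear (Legacy path)', { clearServerSelectionTimeout: true }],
  ['CSOT owns the timeout', { clearServerSelectionTimeout: false }],
];

for (const [label, ctx] of cases) {
  console.log(`${label}: old=${oldShouldClear(ctx)} new=${newShouldClear(ctx)}`);
}
```

In this model the two conditions differ only when no timeoutContext is provided, which is the case the old condition left uncleared.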

Testing Requirements

Add a regression test that exercises the retry path on a sharded cluster with a pinned transaction session, verifying no unhandled rejections occur after the operation completes. This can be done by:

1. Starting a transaction on a sharded cluster
2. Using failCommand to trigger a retryable error on a write inside the transaction
3. Verifying the operation succeeds (retry works)
4. Waiting >30 seconds (or using a shorter serverSelectionTimeoutMS) and asserting no unhandled rejection fires
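
The final assertion (no unhandled rejection after the operation completes) could be built on a helper like the one below. This is a minimal sketch that does not touch the driver or a cluster: countUnhandledRejections is a hypothetical helper name, and the leaked timer is simulated with a plain setTimeout. It assumes a Node.js runtime, where the process object emits the unhandledRejection event.

```typescript
// Access the Node.js process object via globalThis so the sketch does not
// depend on @types/node being installed (assumption: running under Node.js).
const nodeProcess = (globalThis as any).process;

// Hypothetical helper: runs `operation`, keeps listening for `waitMs`
// afterwards, and returns how many unhandled rejections were observed.
async function countUnhandledRejections(
  operation: () => Promise<void>,
  waitMs: number
): Promise<number> {
  let count = 0;
  const listener = (): void => {
    count += 1;
  };
  nodeProcess.on('unhandledRejection', listener);
  try {
    await operation();
    // Keep listening long enough for any leaked timer to fire.
    await new Promise<void>(resolve => setTimeout(resolve, waitMs));
  } finally {
    nodeProcess.off('unhandledRejection', listener);
  }
  return count;
}

// Simulated leak: the operation "succeeds", but a timer it left behind
// later rejects a promise that nobody handles.
async function leakyOperation(): Promise<void> {
  setTimeout(() => {
    void Promise.reject(new Error('Expired after 5ms'));
  }, 5);
}

// Simulated fix: the timer is cleared before the operation returns.
async function cleanOperation(): Promise<void> {
  const timer = setTimeout(() => {
    void Promise.reject(new Error('Expired after 5ms'));
  }, 5);
  clearTimeout(timer);
}
```

In a real regression test, the operation would be the transaction write that hits the failCommand fail point, and waitMs would need to exceed the configured serverSelectionTimeoutMS.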

Documentation Requirements

DOCSP ticket, API docs, etc

Follow Up Requirements

additional tickets to file, required releases, etc
if node behavior differs/will differ from other drivers, confirm with dbx devs what standard to aim for and what plan, if any, exists to reconcile the diverging behavior moving forward

blocks

NODE-7142 Exponential backoff and jitter in retry loops

Closed

Assignee: Sergey Zelenov
Reporter: Sergey Zelenov
Reviewers: None
Votes: 0
Watchers: 1

Created: Feb 20 2026 09:12:58 AM UTC
Updated: Feb 20 2026 10:14:49 AM UTC
Resolved: Feb 20 2026 10:14:50 AM UTC
