Loading...

XML

Word

Printable

JSON

Type: Task
Resolution: Unresolved
Priority: Unknown
Fix Version/s: None
Component/s: Retryability
Labels:
- spec-fest

Driver Changes:
Needed

Investigate AI-agent analysis report on the retryable-writes specification. This ticket tracks findings identified during a systematic review of the spec: missing test coverage, ambiguous normative language, and spec/test inconsistencies.

Missing Tests

Error code 20 with errmsg starting with "Transaction numbers" MUST produce an actionable error message — prose
test only, no YAML
PoolClearedError during connection checkout MUST get RetryableWriteError label — prose test 2 only, no unified
test (PR #1804 /
DRIVERS-1815 implementing)
WriteConcernError with RetryableWriteError label + NoWritesPerformed on retry: MUST return original error —
prose test 3 only
Sharded cluster: retry MUST target a different mongos (deprioritization) — prose tests 4 & 5 only; not expressible
in YAML
Write commands inside transactions MUST NOT be retried and MUST NOT get RetryableWriteError labels
Network error during initial connection handshake MUST get RetryableWriteError label — handshakeError.yml
exists but completeness unclear
CSOT-enabled retries: multiple attempts MUST be allowed (vs. single attempt without CSOT) — no CSOT-specific retry
count tests (DRIVERS-3247 ready for work)

Ambiguities

Server selection deprioritization: "Failed server MUST be passed as deprioritized" — does this exclude it, rank it
lower, or merely mark it? Prose test 4 acknowledges this cannot be reliably tested without external tools.
bulkWrite eligibility granularity: Eligibility evaluated "after order and batch splitting individually" — if
UpdateMany and InsertOne are in different batches, is each batch evaluated independently?
mongod vs. mongos error detection: writeConcernError codes are only valid on mongod responses — how do drivers
distinguish in a sharded cluster where mongos forwards shard errors?

Inconsistencies

updateMany.yml and deleteMany.yml: Both say "ignores retryWrites" and verify no txnNumber. The spec language
("MUST NOT add a transaction ID") is a firm enforcement requirement, not a behavioral choice — naming is misleading.
PoolClearedError label requirement vs. test coverage: Spec says MUST add RetryableWriteError to pool errors,
but this is driver-level (from CMAP), not server-level — the distinction in when to apply the label is not
consistently tested.
operationId guidance: Commands in a retry SHOULD share operationId but drivers SHOULD NOT use operationId to
relay transaction ID info — somewhat contradictory for multi-command bulk writes.

Notes

35 unified test files with good coverage of single/multi-statement operations, error labels, and server errors.
5 prose tests (PoolClearedError, WriteConcernError, sharded cluster deprioritization) — ~12% of critical coverage is
manual.
CSOT behavioral change (2022-10-18: "multiple retries allowed when CSOT enabled") has no corresponding test coverage.
DRIVERS-3296 (Backlog): clarify expected behavior for command logging,
retryable writes and write concern errors.
DRIVERS-3352 (Backlog): add RetryableError labels to retryability
eligibility.

is related to

DRIVERS-3296 Clarify expected behavior for command logging, retryable writes and write concern errors

Backlog

DRIVERS-3352 Add RetryableError labels to retryability eligibility for retryable reads/writes

Backlog

DRIVERS-3247 Improve TimeOutException handling in retryable reads/writes

Ready for Work

related to

DRIVERS-3484 Spec gap analysis: missing tests, ambiguities, and inconsistencies across all 42 components

Closed

Assignee:: Unassigned
Reporter:: Jérôme Tamarelle
Votes:: 0 Vote for this issue
Watchers:: 1 Start watching this issue

Created:: May 19 2026 12:35:31 PM UTC
Updated:: May 19 2026 03:07:49 PM UTC

Details

Description

Missing Tests

Ambiguities

Inconsistencies

Notes

Attachments

Issue Links

Activity

People

Dates