-
Type:
Bug
-
Resolution: Unresolved
-
Priority:
Major - P3
-
None
-
Affects Version/s: None
-
Component/s: None
-
None
-
Server Security
-
ALL
-
Server Security 2026-03-27, Server Security 2026-04-24
-
None
-
None
-
None
-
None
-
None
-
None
-
None
Context
SERVER-121197 wrapped FLE transaction tests with the withAbortAndRetryOnTransientTxnError retry helper to make them robust against transient transaction errors in sharded passthrough suites.
However, 4 FLE tests fail in the bulk_write_sharded_collections_jscore_passthrough suite because the bulkWrite command does not attach
errorLabels: ["TransientTransactionError"]
on responses when FLE internal transaction processing fails with a transient error (e.g., StaleConfig).
These tests cannot be fixed by the retry helper alone because errorLabels are missing from the response:
- src/mongo/db/modules/enterprise/jstests/fle2/query/find_getMore_txn.js
- src/mongo/db/modules/enterprise/jstests/fle2/query/find_txn.js
- src/mongo/db/modules/enterprise/jstests/fle2/txn_insert.js
- src/mongo/db/modules/enterprise/jstests/fle2/txn_insert_range.js
Root Cause
When FLE processing for a bulkWrite command throws an exception (e.g., StaleConfig from an internal transaction), the attemptExecuteFLE function in fle.cpp catches the exception and converts it into a write error reply item returned in the cursor.
{ "ok": 1, "cursor": { "firstBatch": [{ "idx": 0, "ok": 0, "code": 13388, ... }] } }
The server's command framework only adds errorLabels for top-level command errors. Since the bulkWrite command itself succeeds, the framework skips adding labels.
Reproduce
python buildscripts/resmoke.py run \ --storageEngine=wiredTiger \ --storageEngineCacheSizeGB=0.5 \ --mongodSetParameters='{logComponentVerbosity: {verbosity: 0}}' \ --mongosSetParameters='{logComponentVerbosity: {verbosity: 0}}' \ --jobs=1 \ --installDir bazel-bin/install-dist-test/bin \ --suite=bulk_write_sharded_collections_jscore_passthrough \ src/mongo/db/modules/enterprise/jstests/fle2/query/find_getMore_txn.js \ --repeatTests 10
You can notice the error will report StaleConfig but no errorLabels = ["TransientTransactionError”] so even retrying will not work, not even by just retrying via withAbortAndRetryOnTransientTxnError
Suggested Fix
Given service_entry_point_shard_role.cpp already handles this case, transient transaction error could just be forwarded upstream back to the entry point where we have already the logic to handle them.
Important: Note that secondary client-side fix to bulk_api.js is also needed: mergeBatchResults, WriteResult, and BulkWriteResult should propagate errorLabels as well, which they currently don’t.
Address the TODOs associated with this ticket
- is related to
-
SERVER-121197 FLE transaction tests should use transaction retry helper (part 2)
-
- Closed
-