FLE BulkWrites in transaction do not propagate the TransientTransactionError label

XMLWordPrintableJSON

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major - P3
    • None
    • Affects Version/s: None
    • Component/s: None
    • None
    • Server Security
    • ALL
    • Server Security 2026-03-27, Server Security 2026-04-24
    • None
    • None
    • None
    • None
    • None
    • None
    • None

      Context
      SERVER-121197 wrapped FLE transaction tests with the withAbortAndRetryOnTransientTxnError retry helper to make them robust against transient transaction errors in sharded passthrough suites.
      However, 4 FLE tests fail in the bulk_write_sharded_collections_jscore_passthrough suite because the bulkWrite command does not attach

       errorLabels: ["TransientTransactionError"]

       on responses when FLE internal transaction processing fails with a transient error (e.g., StaleConfig).

       

      These tests cannot be fixed by the retry helper alone because errorLabels are missing from the response:

      • src/mongo/db/modules/enterprise/jstests/fle2/query/find_getMore_txn.js
      • src/mongo/db/modules/enterprise/jstests/fle2/query/find_txn.js
      • src/mongo/db/modules/enterprise/jstests/fle2/txn_insert.js
      • src/mongo/db/modules/enterprise/jstests/fle2/txn_insert_range.js

      Root Cause

      When FLE processing for a bulkWrite command throws an exception (e.g., StaleConfig from an internal transaction), the attemptExecuteFLE function in fle.cpp catches the exception and converts it into a write error reply item returned in the cursor.

      { "ok": 1, "cursor": { "firstBatch": [{ "idx": 0, "ok": 0, "code": 13388, ... }] } } 

      The server's command framework only adds errorLabels for top-level command errors. Since the bulkWrite command itself succeeds, the framework skips adding labels.

       

      Reproduce

       

      python buildscripts/resmoke.py run \
        --storageEngine=wiredTiger \
        --storageEngineCacheSizeGB=0.5 \
        --mongodSetParameters='{logComponentVerbosity: {verbosity: 0}}' \
        --mongosSetParameters='{logComponentVerbosity: {verbosity: 0}}' \
        --jobs=1 \
        --installDir bazel-bin/install-dist-test/bin \
        --suite=bulk_write_sharded_collections_jscore_passthrough \
        src/mongo/db/modules/enterprise/jstests/fle2/query/find_getMore_txn.js \
        --repeatTests 10
      

       

      You can notice the error will report StaleConfig but no errorLabels = ["TransientTransactionError”] so even retrying will not work, not even by just retrying via withAbortAndRetryOnTransientTxnError

      Suggested Fix

      Given service_entry_point_shard_role.cpp  already handles this case, transient transaction error could just be forwarded upstream back to the entry point where we have already the logic to handle them.

      Important: Note that secondary client-side fix to bulk_api.js is also needed: mergeBatchResults, WriteResult, and BulkWriteResult should propagate errorLabels as well, which they currently don’t.

      Address the TODOs associated with this ticket

            Assignee:
            Erwin Pe
            Reporter:
            Enrico Golfieri
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

              Created:
              Updated: