Type: Bug
Resolution: Fixed
Priority: Major - P3
Fix Version/s: 4.7.0, 4.4.2
Affects Version/s: 4.4.0-rc10
Component/s: Sharding
Labels:
- sharding-wfbf-day

Backwards Compatibility:
Fully Compatible
Operating System:
ALL
Backport Requested:
v4.4
Sprint:
Sharding 2020-09-07
Linked BF Score:
10
Confidence Status:
None
Work Order:
3
CAR Domain/s:
None

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name(s):
None
Goal Link:
None

This invariant

Sequence:

1. refineShardKey has finished all of its updates and is about to run commitTransaction.
2. A stepDown occurs, and the stepDown thread enqueues a request for the RSTL in MODE_X.
3. commitTransaction tries to acquire the RSTL inside beginOrContinue and throws a LockTimeout error.
4. The commitTransaction thread calls waitForWriteConcern inside the catch block and encounters a NotMaster error.
5. The server tries to build the response object and hits the invariant in getErrorLabels, because LockTimeout is a transient transaction error and the NotMaster writeConcern error is retryable.
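
The conflict in step 5 can be illustrated with a small toy model. This is only a sketch and not the server's C++ code: it assumes the invariant rejects a response to which both the TransientTransactionError and RetryableWriteError labels would apply. The function `get_error_labels`, the `InvariantFailure` exception, and the two tiny code sets are hypothetical stand-ins.

```python
from typing import List, Optional

# MongoDB error label names; the code sets below are a deliberately tiny,
# illustrative subset and NOT the server's actual error classification.
TRANSIENT_TXN_LABEL = "TransientTransactionError"
RETRYABLE_WRITE_LABEL = "RetryableWriteError"
TRANSIENT_CODES = {"LockTimeout"}
RETRYABLE_CODES = {"NotMaster"}


class InvariantFailure(Exception):
    """Hypothetical stand-in for a server invariant firing."""


def get_error_labels(command_error: Optional[str],
                     write_concern_error: Optional[str]) -> List[str]:
    """Toy label selection for a single command response."""
    is_transient = command_error in TRANSIENT_CODES
    is_retryable = write_concern_error in RETRYABLE_CODES
    # Modeled invariant (assumption): one response never carries both labels.
    if is_transient and is_retryable:
        raise InvariantFailure("both error labels apply to one response")
    labels: List[str] = []
    if is_transient:
        labels.append(TRANSIENT_TXN_LABEL)
    if is_retryable:
        labels.append(RETRYABLE_WRITE_LABEL)
    return labels
```

In this model each error alone yields a single label, while the combination from steps 3 and 4 (a LockTimeout command error plus a NotMaster writeConcern error) raises `InvariantFailure`.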

Note: this issue does not affect a normal transaction, because in a normal commitTransaction the opCtx is fresh and the lock timeout has not been set yet (it is only set once the TxnResource is unstashed, which happens after beginOrContinue). The refineShardKey command uses the same AlternativeClientRegion opCtx for all of the transaction's earlier writes and for the commit command, so the opCtx still has the 5ms timeout set by those earlier writes.
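
The difference between the two paths can be sketched as a toy model of the operation context. This is a simplified illustration, not the server implementation: `OpCtx`, `run_txn_command`, and the two scenario functions are hypothetical names, and the only behavior modeled is that unstashing sets the 5ms lock timeout after beginOrContinue has already run.

```python
from typing import Optional

TXN_LOCK_TIMEOUT_MS = 5  # the 5ms value mentioned in the note


class OpCtx:
    """Hypothetical operation context; None means no lock timeout is set."""

    def __init__(self) -> None:
        self.lock_timeout_ms: Optional[int] = None


def run_txn_command(op_ctx: OpCtx) -> Optional[int]:
    """Return the timeout in effect during beginOrContinue, then unstash."""
    seen_in_begin_or_continue = op_ctx.lock_timeout_ms
    # Unstashing the transaction resources happens afterwards and sets
    # the timeout on this opCtx.
    op_ctx.lock_timeout_ms = TXN_LOCK_TIMEOUT_MS
    return seen_in_begin_or_continue


def normal_commit() -> Optional[int]:
    run_txn_command(OpCtx())         # earlier write on its own opCtx
    return run_txn_command(OpCtx())  # commit runs on a fresh opCtx


def refine_shard_key_commit() -> Optional[int]:
    shared = OpCtx()                 # one opCtx reused for everything
    run_txn_command(shared)          # earlier write leaves the timeout set
    return run_txn_command(shared)   # commit inherits the 5ms timeout
```

Under these assumptions the commit in `normal_commit` sees no timeout during beginOrContinue, whereas the commit in `refine_shard_key_commit` sees the 5ms timeout left behind by the earlier write, which is what makes the LockTimeout in step 3 possible.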

Assignee: Janna Golden
Reporter: Randolph Tan
Participants: Githook User, Janna Golden, Randolph Tan
Votes: 0
Watchers: 3

Created: Jun 22 2020 06:47:27 PM UTC
Updated: Oct 29 2023 10:06:40 PM UTC
Resolved: Aug 24 2020 02:43:15 PM UTC
Confidence Status Last Update: 12/Aug/20 3:10 PM
