[SERVER-49019] refineShardKey can hit getLastError invariant during stepdown Created: 22/Jun/20 Updated: 29/Oct/23 Resolved: 24/Aug/20 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | 4.4.0-rc10 |
| Fix Version/s: | 4.7.0, 4.4.2 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Randolph Tan | Assignee: | Janna Golden |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | sharding-wfbf-day | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
| Backwards Compatibility: | Fully Compatible | ||||||||
| Operating System: | ALL | ||||||||
| Backport Requested: | v4.4 |
| Sprint: | Sharding 2020-09-07 | ||||||||
| Participants: | |||||||||
| Linked BF Score: | 10 | ||||||||
| Description |
|
Sequence:
1. refineShardKey has finished all of its updates and is about to run commitTransaction.

Note: this issue does not affect normal transactions. In a normal commitTransaction, the opCtx is fresh and the lock timeout has not been set yet; it is only set once the TxnResource is unstashed, which happens after beginOrContinue. The refineShardKey command, by contrast, uses the same AlternativeClientRegion opCtx for all of the earlier writes in the transaction and for the commit command, so the opCtx still carries the 5ms lock timeout set by those earlier writes. |
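The difference between a fresh opCtx and a reused one can be illustrated with a small model. This is a minimal sketch, not server code: `OpCtxSketch`, `acquire_for_commit`, and `run_scenario` are hypothetical names, a plain `threading.Lock` stands in for whatever lock the stepdown holds, and the sketch only models the stale lock timeout, not getLastError or the invariant itself.

```python
import threading
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class OpCtxSketch:
    """Hypothetical stand-in for an operation context.

    lock_timeout_ms is None for a fresh context (wait indefinitely),
    or a number of milliseconds once an earlier write has set it.
    """
    lock_timeout_ms: Optional[float] = None


def acquire_for_commit(lock: threading.Lock, op_ctx: OpCtxSketch) -> bool:
    """Take `lock`, honoring whatever timeout is still set on op_ctx."""
    if op_ctx.lock_timeout_ms is None:
        return lock.acquire()  # blocks until the holder releases it
    return lock.acquire(timeout=op_ctx.lock_timeout_ms / 1000.0)


def run_scenario(op_ctx: OpCtxSketch, hold_ms: float = 200.0) -> bool:
    """Simulate a stepdown holding a lock while the commit tries to take it."""
    lock = threading.Lock()
    held = threading.Event()

    def stepdown() -> None:
        with lock:
            held.set()  # signal that the lock is now held
            time.sleep(hold_ms / 1000.0)

    t = threading.Thread(target=stepdown)
    t.start()
    held.wait()  # make sure the "stepdown" owns the lock first
    acquired = acquire_for_commit(lock, op_ctx)
    if acquired:
        lock.release()
    t.join()
    return acquired


# Reused opCtx: still carries the 5 ms timeout from the earlier writes.
print(run_scenario(OpCtxSketch(lock_timeout_ms=5)))  # False: gives up after 5 ms
# Fresh opCtx: no timeout set yet, so it waits for the stepdown to finish.
print(run_scenario(OpCtxSketch()))  # True
```

In this model, setting `lock_timeout_ms` back to `None` before the commit step makes the reused context behave like the fresh one; whether the committed fix took that approach is not stated in this issue.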
| Comments |
| Comment by Githook User [ 08/Sep/20 ] |
|
Author: {'name': 'jannaerin', 'email': 'golden.janna@gmail.com', 'username': 'jannaerin'} Message: (cherry picked from commit 7c8935c12a4d6b6ae6af9a570870450475f8c3e9) |
| Comment by Githook User [ 21/Aug/20 ] |
|
Author: {'name': 'jannaerin', 'email': 'golden.janna@gmail.com', 'username': 'jannaerin'} Message: |
| Comment by Randolph Tan [ 22/Jun/20 ] |
|
Attached a simple test and a diff showing where to add sleeps so this issue reproduces more consistently. |