[SERVER-62175] Mongos fails to attach RetryableWrite Error Label For Command Interrupted In _parseCommand
Created: 17/Dec/21 | Updated: 29/Oct/23 | Resolved: 10/May/22
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | 5.3.0 |
| Fix Version/s: | 5.0.9, 4.4.15, 6.0.0-rc6, 6.1.0-rc0 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Luis Osta (Inactive) | Assignee: | Rachita Dhawan |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | sharding-nyc-subteam2 |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Attachments: |
| Issue Links: |
| Backwards Compatibility: | Minor Change |
| Operating System: | ALL |
| Backport Requested: | v6.0, v5.3, v5.0, v4.4 |
| Steps To Reproduce: | The following resmoke invocation will execute the reproducing test once the git patch has been applied.
The patch can be applied by executing the following command in the root of the repository.
|
| Sprint: | Sharding NYC 2022-04-04, Sharding NYC 2022-04-18, Sharding 2022-05-02, Sharding NYC 2022-05-16 |
| Participants: |
| Story Points: | 3 |
| Description |
|
This issue was originally discovered in the linked HELP ticket, where it surfaced during the shutdown that is required as part of version upgrades in Atlas. The root cause lies in how the ServiceEntryPoint logic in MongoS works. When a command fails, it calls getErrorLabels to attach the appropriate error labels to the response. Relevant here is that it uses the session information in _osi to determine whether to attach the kRetryableWrite label. However, _osi is emplaced only after a call to checkForInterrupt. If that call throws, _osi is still empty, so the retryable write label is not attached to the response even though it should be. |
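The ordering problem described above can be illustrated with a small standalone sketch. This is not the mongos source: SessionInfo, Interrupted, runCommand, and the simplified getErrorLabels below are hypothetical stand-ins, and the real label logic checks more conditions than a transaction number. The sketch only shows that an optional emplaced after the interrupt check is still empty when the error path asks for labels.

```cpp
#include <optional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for the session information parsed from a command.
struct SessionInfo {
    bool hasTxnNumber;  // true when the command is a retryable write
};

// Hypothetical stand-in for the error thrown by an interrupt check.
struct Interrupted : std::runtime_error {
    Interrupted() : std::runtime_error("interrupted at shutdown") {}
};

// Simplified assumption: the label is attached only when session info is present.
std::vector<std::string> getErrorLabels(const std::optional<SessionInfo>& osi) {
    std::vector<std::string> labels;
    if (osi && osi->hasTxnNumber) {
        labels.push_back("RetryableWriteError");
    }
    return labels;
}

// Returns the labels attached to a failed response (empty on success).
// emplaceFirst == false models the reported ordering: interrupt check first.
std::vector<std::string> runCommand(bool interrupted, bool emplaceFirst) {
    std::optional<SessionInfo> osi;
    try {
        if (emplaceFirst) osi.emplace(SessionInfo{true});
        if (interrupted) throw Interrupted{};  // models checkForInterrupt
        if (!emplaceFirst) osi.emplace(SessionInfo{true});
        return {};
    } catch (const Interrupted&) {
        return getErrorLabels(osi);  // osi may still be empty here
    }
}
```

With these stand-ins, runCommand(true, false) returns no labels, while runCommand(true, true) returns the retryable write label, which is the behavior the description says is expected.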
| Comments |
| Comment by Githook User [ 17/May/22 ] |
|
Author: {'name': 'Rachita Dhawan', 'email': 'rachita.dhawan@gmail.com', 'username': 'racdhawan'} Message: (cherry picked from commit 38e4a5516d614a2bd1f3c3afde97c19068fd1441) |
| Comment by Githook User [ 11/May/22 ] |
|
Author: {'name': 'Rachita Dhawan', 'email': 'rachita.dhawan@gmail.com', 'username': 'racdhawan'}Message: |
| Comment by Githook User [ 10/May/22 ] |
|
Author: {'name': 'Rachita Dhawan', 'email': 'rachita.dhawan@gmail.com', 'username': 'racdhawan'} Message: backport from commit 38e4a5516d614a2bd1f3c3afde97c19068fd1441 |
| Comment by Githook User [ 04/May/22 ] |
|
Author: {'name': 'Rachita Dhawan', 'email': 'rachita.dhawan@gmail.com', 'username': 'racdhawan'}Message: |
| Comment by Jennifer Huang (Inactive) [ 12/Apr/22 ] |
|
Thanks Lamont, I'll pass that on to the customer. |
| Comment by Lamont Nelson [ 11/Apr/22 ] |
|
Hi jennifer.huang@mongodb.com, we are currently starting work on this. We plan to backport the fix to 5.0 and 4.4. We should be able to fix this in master over the coming iteration, and the backports will take a couple of days after that. Then the fix will have to be tested through our release candidate process for each version. Does this help? |
| Comment by Blake Oler [ 01/Mar/22 ] |
|
Hey, after looking at this ticket, we've concluded that Service Architecture doesn't own the underlying logic here. Passing back to Sharding. |
| Comment by Lingzhi Deng [ 01/Mar/22 ] |
|
I think the expected behavior is that mongos should always be able to return the right error labels regardless of the stage at which a command fails, since it should have all the information it needs from the command to determine what the client request is and which error labels to attach. The fact that getErrorLabels is not called in certain exit paths seems like a bug to me. And when getErrorLabels does get called, ideally mongos needs to be able to reliably evaluate these if clauses here and figure out the right error label to return. |
| Comment by Vojislav Stojkovic [ 01/Mar/22 ] |
|
I looked at the code, and it doesn't look like the getErrorLabels call that you linked to would be reached if checkForInterrupt inside _parseCommand throws. As far as I can tell, if _parseCommand throws, the execution flow wouldn't go through the ParseAndRunCommand::RunInvocation::_tapOnError function at all. Can you clarify what the expected behavior is? |