[SERVER-34770] Retry on JavaScript execution interruptions in stepdown suites Created: 01/May/18  Updated: 29/Oct/23  Resolved: 14/Sep/18

Status: Closed
Project: Core Server
Component/s: Sharding, Testing Infrastructure
Affects Version/s: None
Fix Version/s: 3.6.10, 4.0.5, 4.1.1

Type: Bug
Priority: Major - P3
Reporter: Jack Mulrow
Assignee: Jack Mulrow
Resolution: Fixed
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Related
is related to SERVER-32565 Stepdown suites should tolerate js en... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v4.0, v3.6
Sprint: Sharding 2018-10-08
Participants:
Linked BF Score: 20

 Description   

When a primary steps down, it interrupts all ongoing user operations. If a command is currently executing JavaScript (like an eval or find with $where), its MozJS scope will be marked as killed and return ErrorCodes::Interrupted. This is expected / desired behavior for a stepdown, so the stepdown passthroughs should be resilient to this error code / message combination. The auto_retry_on_network_error.js override should be updated to retry on this response.

This is separate from SERVER-32565, which tracks adding retries on JS engine internal errors, which may not be desired behavior.
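
The retry behavior described above can be sketched roughly as follows. This is a minimal illustration, not the contents of auto_retry_on_network_error.js: the helper names `isRetryableStepdownError` and `runWithStepdownRetry`, the response shape, and the attempt limit are all assumptions made for the example, and the numeric error codes should be checked against the server's error code list.

```javascript
// Numeric values for a few server error codes (assumed here; verify against
// the server's error code definitions before relying on them).
const ErrorCodes = {
    NotMaster: 10107,
    Interrupted: 11601,
    InterruptedDueToReplStateChange: 11602,
};

// Interrupted is deliberately absent: it normally means a user killed the
// operation, so retrying it blindly could hide real failures.
const kRetryableCodes = new Set([
    ErrorCodes.NotMaster,
    ErrorCodes.InterruptedDueToReplStateChange,
]);

// Hypothetical helper: true if a command response looks like a stepdown error.
function isRetryableStepdownError(res) {
    return res.ok === 0 && kRetryableCodes.has(res.code);
}

// Hypothetical helper: calls `runCommand` (a function returning a response
// object) up to `maxAttempts` times, retrying only on stepdown-style errors.
// Returns the last response seen.
function runWithStepdownRetry(runCommand, maxAttempts = 3) {
    let res;
    for (let attempt = 1; attempt <= maxAttempts; attempt++) {
        res = runCommand();
        if (!isRetryableStepdownError(res)) {
            return res;
        }
    }
    return res;
}
```

With a stub command that fails once with InterruptedDueToReplStateChange and then succeeds, the wrapper returns the successful response after two calls. A stub that fails with plain Interrupted is returned to the caller after a single call.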



 Comments   
Comment by Githook User [ 20/Nov/18 ]

Author: Ben Caimano <ben.caimano@10gen.com>

Message: SERVER-34770 Retry on JavaScript execution interruptions in stepdown suites

(cherry picked from commit bb2de3700ee5b8eec9aa51cdbd2ecec937480c6c)
Branch: v3.6
https://github.com/mongodb/mongo/commit/ae4011cf301665046c435c18fd6ef088f5881e04

Comment by Githook User [ 12/Nov/18 ]

Author: Ben Caimano <ben.caimano@10gen.com>

Message: SERVER-34770 Retry on JavaScript execution interruptions in stepdown suites

(cherry picked from commit bb2de3700ee5b8eec9aa51cdbd2ecec937480c6c)
Branch: v4.0
https://github.com/mongodb/mongo/commit/4e50d2df115b46f9cba86104b2865a5827035769

Comment by Jack Mulrow [ 02/Nov/18 ]

The BF this ticket fixed can occur on 3.6 and 4.0 as well, so requesting a backport to those branches.

Comment by Jack Mulrow [ 14/Sep/18 ]

Now that the JS scope returns the error code it was interrupted with, if a stepdown interrupts JS execution, the shell should choose to retry using the existing logic that retries on stepdown errors. So I think this ticket can be resolved.

Comment by Benjamin Caimano (Inactive) [ 17/Aug/18 ]

Handing over to sharding.

Comment by Githook User [ 09/Jun/18 ]

Author: Ben Caimano <ben.caimano@10gen.com>

Message: SERVER-34770 Retry on JavaScript execution interruptions in stepdown suites
Branch: master
https://github.com/mongodb/mongo/commit/bb2de3700ee5b8eec9aa51cdbd2ecec937480c6c

Comment by Benjamin Caimano (Inactive) [ 24/May/18 ]

At first glance, I believe the repl coordinator does mark the operation as ErrorCodes::InterruptedDueToReplStateChange. However, there is a (somewhat reasonable) short circuit in the MozJS scope that returns ErrorCodes::Interrupted instead of the true code for every kind of kill. jack.mulrow, did you have a favorite repro strategy for the BF?
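
To make the short circuit concrete, here is a toy model of the behavior described in this comment. It is not the MozJS scope implementation: `ToyScope`, its `preserveKillCode` option, and its methods are hypothetical names invented for illustration, and the numeric codes are assumed values for the two error codes mentioned.

```javascript
// Assumed numeric values for the two error codes discussed in this ticket.
const Interrupted = 11601;
const InterruptedDueToReplStateChange = 11602;

// Hypothetical stand-in for a JS scope that can be killed while running.
class ToyScope {
    constructor({preserveKillCode}) {
        // When false, every kill is reported as Interrupted (the short circuit).
        this.preserveKillCode = preserveKillCode;
        this.killCode = null;
    }

    // Records why the operation was killed.
    kill(code) {
        this.killCode = code;
    }

    // Returns the error code reported to the caller, or null if not killed.
    checkForInterrupt() {
        if (this.killCode === null) {
            return null;
        }
        return this.preserveKillCode ? this.killCode : Interrupted;
    }
}

// A stepdown kills both scopes with the same underlying reason.
const shortCircuited = new ToyScope({preserveKillCode: false});
const preserving = new ToyScope({preserveKillCode: true});
shortCircuited.kill(InterruptedDueToReplStateChange);
preserving.kill(InterruptedDueToReplStateChange);
```

In this model the short-circuited scope reports Interrupted, which a caller cannot tell apart from a user kill, while the preserving scope reports the stepdown code that existing retry logic can recognize.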

Comment by Kaloian Manassiev [ 11/May/18 ]

We intentionally do not retry on ErrorCodes::Interrupted, because it is supposed to mean that a user killed the task. It seems to me that on stepdown interruption, the MozJS scope should be marked as InterruptedDueToReplSetStepdown (which will be retried).

Unless I am mistaken - we should pass this on to the Platforms team. Not sure whether they should do that or Repl.

Generated at Thu Feb 08 04:37:48 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.