Type: Bug
Resolution: Fixed
Priority: Major - P3
Fix Version/s: 4.4.0-rc0, 4.7.0
Affects Version/s: None
Component/s: Sharding
Labels: None
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested: v4.4
Sprint: Sharding 2020-04-06
Linked BF Score: 14
Confidence Status: None
Work Order: 3
CAR Domain/s: None
Aha! Reference: None
Tracking Level: None
Risk Status: None
Exec Notes: None
Goal Name(s): None
Goal Link: None

Because of a race in the AsyncRequestSender (ARS), there is no guarantee that the command will always fail with MaxTimeMSExpired. When the opCtx is killed because the maxTimeMS deadline has passed, the ARS stores the status and marks any outstanding remote responses as failed with MaxTimeMSExpired. However, if the shard has already responded by then, the ARS still returns the shard's response. In that case, the StaleConfig error from the shard is handled in the mongos retry loop: the mongos marks the shard as stale and sends setShardVersion to the shard, which then fails with MaxTimeMSExpired. The command therefore keeps failing on every subsequent retry.
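The ordering dependence described above can be illustrated with a toy model. This is only a sketch and not the actual AsyncRequestSender code: the class `ToyRequestSender`, its methods, and the `run` helper are hypothetical names invented for illustration, and the model tracks a single outstanding request.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

MAX_TIME_MS_EXPIRED = "MaxTimeMSExpired"
STALE_CONFIG = "StaleConfig"


@dataclass
class ToyRequestSender:
    """Hypothetical stand-in for the ARS with one outstanding remote request."""
    response: Optional[str] = None     # status recorded for the remote request
    kill_status: Optional[str] = None  # status stored when the opCtx is killed

    def on_shard_response(self, status: str) -> None:
        # A shard response is recorded only if the request is still outstanding.
        if self.response is None:
            self.response = status

    def on_op_ctx_killed(self, status: str) -> None:
        # Store the kill status, then fail the request only if it is unanswered.
        self.kill_status = status
        if self.response is None:
            self.response = status

    def next(self) -> Optional[str]:
        # A response that arrived before the kill is returned unchanged.
        return self.response


def run(events: List[Tuple[str, str]]) -> Optional[str]:
    """Replay a sequence of ("kill" | "response", status) events."""
    ars = ToyRequestSender()
    for kind, status in events:
        if kind == "response":
            ars.on_shard_response(status)
        else:
            ars.on_op_ctx_killed(status)
    return ars.next()


# Deadline fires first: the caller sees MaxTimeMSExpired.
deadline_first = run([("kill", MAX_TIME_MS_EXPIRED), ("response", STALE_CONFIG)])
# Shard answers first: the caller sees StaleConfig and would enter the retry loop.
response_first = run([("response", STALE_CONFIG), ("kill", MAX_TIME_MS_EXPIRED)])
print(deadline_first, response_first)
```

The same two events in a different order produce different first errors, which is why a test asserting on the first error returned to the client can be flaky.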

The test should instead use checkLog to check for the log line "Failed to refresh metadata for collection.*MaxTimeMSExpired" on the shard.
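The idea behind a log-based check can be sketched as a polling loop that searches log lines for the pattern quoted above. This is not the mongo shell's checkLog helper: `wait_for_log_line` is a hypothetical function, and the sample log lines are made up for illustration.

```python
import re
import time
from typing import Callable, Iterable


def wait_for_log_line(get_log_lines: Callable[[], Iterable[str]],
                      pattern: str,
                      timeout_s: float = 5.0,
                      interval_s: float = 0.1) -> bool:
    """Poll get_log_lines() until a line matches pattern or the timeout passes."""
    regex = re.compile(pattern)
    deadline = time.monotonic() + timeout_s
    while True:
        if any(regex.search(line) for line in get_log_lines()):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


PATTERN = r"Failed to refresh metadata for collection.*MaxTimeMSExpired"

# Made-up log content; a real shard log would differ in format and detail.
sample_log = [
    "hypothetical unrelated log line",
    "Failed to refresh metadata for collection test.foo :: MaxTimeMSExpired",
]

found = wait_for_log_line(lambda: sample_log, PATTERN, timeout_s=1.0)
print(found)
```

Checking the shard's log asserts on the event the test cares about (the refresh failing on the deadline) instead of on whichever error happens to reach the client first.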

Assignee: Cheahuychou Mao
Reporter: Cheahuychou Mao
Participants: Cheahuychou Mao, Githook User
Votes: 0
Watchers: 2

Created: Mar 24 2020 07:27:12 PM UTC
Updated: Oct 29 2023 10:10:22 PM UTC
Resolved: Mar 24 2020 08:38:47 PM UTC
Confidence Status Last Update: 24/Mar/20 7:29 PM
