[SERVER-60949] Config Server step down could lead to a failure while sharding a system collection Created: 22/Oct/21  Updated: 06/Dec/22  Resolved: 07/Jan/22

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 4.0 Required
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Marcos José Grillo Ramirez Assignee: [DO NOT USE] Backlog - Sharding EMEA
Resolution: Won't Do Votes: 0
Labels: sharding-wfbf-day
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Assigned Teams:
Sharding EMEA
Operating System: ALL
Participants:
Linked BF Score: 8

 Description   

In 4.0, when sharding a collection in the config database, we check whether the collection is empty. However, this check only looks at the status of the runCommand call itself; it does not check the status carried in the response object (as something like getEffectiveStatus does), so the following scenario might happen:

1. The config server sends the count command to itself, using the runCommand function.
2. The primary steps down.
3. The command and its retries fail, storing the result in the commandStatus field.
4. After runCommand returns, the status check succeeds, and the code tries to access a nonexistent field.

We could add a check similar to the one done when creating the indexes, or simply change the code to use getEffectiveStatus.
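
To illustrate the difference between the two checks, here is a minimal, self-contained sketch. It does not use the real server classes: Status, CommandResponse, RunCommandResult, effectiveStatus, buggyCheckPasses and checkCollectionIsEmpty are all hypothetical stand-ins written for this example, and the count result field is assumed to be called n. The idea is only that a call can "succeed" at the outer level while the command inside it failed, and that a combined status catches this before the result field is read.

```cpp
#include <optional>
#include <string>
#include <utility>

// Hypothetical, simplified stand-ins. These are NOT the real MongoDB classes.
struct Status {
    bool ok;
    std::string reason;
    static Status OK() { return {true, ""}; }
    static Status Error(std::string why) { return {false, std::move(why)}; }
};

struct CommandResponse {
    Status commandStatus;        // outcome reported for the command itself
    std::optional<long long> n;  // count result; absent when the command failed
};

struct RunCommandResult {
    Status status;               // did runCommand hand back a response at all?
    CommandResponse response;
};

// Simplified analogue of an "effective status": the first failure wins.
Status effectiveStatus(const RunCommandResult& r) {
    if (!r.status.ok) return r.status;
    return r.response.commandStatus;
}

// The pattern described in this ticket: only the outer status is consulted.
bool buggyCheckPasses(const RunCommandResult& r) {
    return r.status.ok;
}

struct EmptyCheck {
    Status status;
    bool isEmpty;
};

// Safer pattern: check the combined status, then check the field exists.
EmptyCheck checkCollectionIsEmpty(const RunCommandResult& r) {
    Status s = effectiveStatus(r);
    if (!s.ok) return {s, false};
    if (!r.response.n) {
        return {Status::Error("count response has no 'n' field"), false};
    }
    return {Status::OK(), *r.response.n == 0};
}
```

With a result such as RunCommandResult{Status::OK(), {Status::Error("primary stepped down"), std::nullopt}}, buggyCheckPasses returns true even though there is no count to read, while checkCollectionIsEmpty returns the step-down error instead of touching the missing field.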



 Comments   
Comment by Connie Chen [ 07/Jan/22 ]

The code has been rewritten in 5.0 and later. Also, this is on 4.0, which is a dying branch.

Generated at Thu Feb 08 05:51:09 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.