[SERVER-61108] ReshardingCoordinatorService, Config Collection Deletes Can Time Out Waiting for Replication on Coordinator Doc, Leading to Fatal Assertion
Created: 29/Oct/21  Updated: 29/Oct/23  Resolved: 29/Oct/21

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 5.0.0, 5.1.0-rc2
Fix Version/s: 5.2.0, 5.0.4, 5.1.0-rc3

Type: Bug Priority: Major - P3
Reporter: Luis Osta (Inactive) Assignee: Luis Osta (Inactive)
Resolution: Fixed Votes: 0
Labels: sharding-nyc-subteam1

Issue Links:
Backports
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v5.1, v5.0
Sprint: Sharding 2021-11-01
Story Points: 1

 Description   

As with SERVER-61052, we need to create a kWriteConcernMajority constant (without a wtimeout) for the ReshardingCoordinatorService.



 Comments   
Comment by Githook User [ 01/Nov/21 ]

Author:

{'name': 'Luis Osta', 'email': 'luis.osta@mongodb.com', 'username': 'LuisOsta'}

Message: SERVER-61108 Remove wtimeout from resharding coordinator write concern

(cherry picked from commit b2531ed72eb81c7a9e4951e4aab93c7d190d3023)
Branch: v5.1
https://github.com/mongodb/mongo/commit/1ea5111a46f570d9b805a504d824c6dddff2462c

Comment by Githook User [ 01/Nov/21 ]

Author:

{'name': 'Luis Osta', 'email': 'luis.osta@mongodb.com', 'username': 'LuisOsta'}

Message: SERVER-61108 Remove wtimeout from resharding coordinator write concern

(cherry picked from commit b2531ed72eb81c7a9e4951e4aab93c7d190d3023)
Branch: v5.0
https://github.com/mongodb/mongo/commit/91e154e718db6f1f4031e3a279a007dfaf063429

Comment by Githook User [ 29/Oct/21 ]

Author:

{'name': 'Luis Osta', 'email': 'luis.osta@mongodb.com', 'username': 'LuisOsta'}

Message: SERVER-61108 Remove wtimeout from resharding coordinator write concern
Branch: master
https://github.com/mongodb/mongo/commit/b2531ed72eb81c7a9e4951e4aab93c7d190d3023

Comment by Max Hirschhorn [ 29/Oct/21 ]

Also, it may be worth noting that it isn't required (nor really encouraged) for the ReshardingCoordinator to use "majority" write concern for its local writes. PrimaryOnlyService guarantees that all of the writes done by a particular Instance occur within a single term. Using {w: 1} and relying on writes-follow-writes ordering would be sufficient, so long as the ReshardingCoordinator waits for the changes to be majority-committed before attempting to contact the remote shards (e.g. when communicating its decision).

We don't allow a cluster to run multiple resharding operations concurrently, so blocking the thread synchronously isn't a big deal. I'd therefore rather leave the structure of things in the ReshardingCoordinator as-is. We can look to shift the ReshardingCoordinator over to using kNoWaitWriteConcern and WaitForMajorityService::waitUntilMajority() as part of some later refactoring. The risk-to-value tradeoff isn't good enough to warrant doing it now.

Comment by Max Hirschhorn [ 29/Oct/21 ]

We'll also want to change makeFlushRoutingTableCacheUpdatesCmd() because CommandHelpers::appendMajorityWriteConcern() adds a wtimeout of 60 seconds by default.

Generated at Thu Feb 08 05:51:33 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.