[SERVER-63855] Make dbCheck work with resharding Created: 18/Feb/22  Updated: 24/May/23

Status: Backlog
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Major - P3
Reporter: Moustafa Maher Assignee: Backlog - Storage Execution Team
Resolution: Unresolved Votes: 0
Labels: pm-855-quick-win
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File repro.js    
Issue Links:
Related
related to SERVER-66046 Resharding coordinator won't automati... Closed
related to SERVER-62578 Add fsm workload for collectionUUID c... Closed
related to SERVER-66011 Enable internal_transactions_reshardi... Closed
related to SERVER-30846 Run dbCheck as background workload in... Closed
Assigned Teams:
Storage Execution
Participants:

 Description   

We need to resolve this TODO.

Example of running collection_uuid_sharded.js

For local repro please run this test: repro.js

In addition to making dbCheck work with resharding, we should re-enable dbCheck as a background workload in FSM tests.



 Comments   
Comment by Max Hirschhorn [ 28/Apr/22 ]

m.maher@mongodb.com, josef.ahmad@mongodb.com, louis.williams@mongodb.com, I'm a little confused by the state the RunDBCheckInBackground hook was left in for sharded clusters. The hook was left enabled in the concurrency_sharded_multi_stmt_txn_with_balancer.yml test suite because it happened to pass. I suspect this is for two reasons: the resharding operations in the reshard_collection_crud_ops.js FSM workload complete too quickly to overlap with a dbCheck command being run by the background thread, and collection_uuid_sharded.js cannot run in the concurrency_sharded_multi_stmt_txn_with_balancer.yml test suite.

  • I think the fix for supporting dbCheck on the source collection while resharding is running is straightforward. OplogEntry::CommandType::kDbCheck should be added to the list of allowable commands during a resharding operation in ReshardingOplogBatchPreparer::throwIfUnsupportedCommandOp(). There is no work for the recipient shards to do upon seeing a dbCheck oplog entry because dbCheck doesn't modify the user collection data.
  • Attempting to add a new FSM workload which runs a longer-running resharding operation in the concurrency_sharded_multi_stmt_txn_with_balancer.yml test suite now fails (e.g. SERVER-66011). Needing to disable an FSM workload for testing a released feature due to dbCheck seems like an inversion of importance.
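
To illustrate the first bullet, here is a minimal standalone sketch of what allowing dbCheck in the command check could look like. It is not the server's actual code: the CommandType enum below is a hypothetical, trimmed-down stand-in for OplogEntry::CommandType, the helper isAllowedDuringResharding() is invented for illustration, and every allow-list member other than kDbCheck is an assumption. Only the function name throwIfUnsupportedCommandOp and the idea of adding kDbCheck to the allowable commands come from the comment above.

```cpp
#include <stdexcept>

// Hypothetical, simplified stand-in for OplogEntry::CommandType.
// The real enum lives in the server and has more members.
enum class CommandType {
    kApplyOps,
    kCommitTransaction,
    kAbortTransaction,
    kDbCheck,
    kDrop,
    kRenameCollection,
};

// Illustrative helper (not a real server function): returns true when a
// recipient shard may see this command oplog entry during resharding.
// All entries except kDbCheck are assumed for the sake of the example.
bool isAllowedDuringResharding(CommandType type) {
    switch (type) {
        case CommandType::kApplyOps:
        case CommandType::kCommitTransaction:
        case CommandType::kAbortTransaction:
        // Proposed addition: dbCheck doesn't modify user collection data,
        // so the recipient has nothing to apply and can skip the entry.
        case CommandType::kDbCheck:
            return true;
        default:
            return false;
    }
}

// Simplified model of the check described in the ticket: throw for any
// command that is not on the allow-list.
void throwIfUnsupportedCommandOp(CommandType type) {
    if (!isAllowedDuringResharding(type)) {
        throw std::runtime_error("unsupported command oplog entry during resharding");
    }
}
```

With this shape, a dbCheck entry passes the check without error, while a command that is not on the list (kRenameCollection in this sketch) still throws.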

I'd either like to (a) have dbCheck be supported during the resharding operation or (b) disable the RunDBCheckInBackground hook in the concurrency_sharded_multi_stmt_txn_with_balancer.yml test suite as well.

Side note: The Evergreen task times out because of a separate bug where the resharding coordinator doesn't realize one of the recipient shards had errored. I filed SERVER-66046 to address that issue. It won't affect SERVER-63855 because the recipient shard errors during its applying phase only because it sees a dbCheck oplog entry.

Generated at Thu Feb 08 05:58:51 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.