[SERVER-58298] Reconcile resharding collection drop behavior when running under storage engines that don't support readConcern majority Created: 06/Jul/21  Updated: 29/Jul/21  Resolved: 29/Jul/21

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Major - P3
Reporter: Blake Oler Assignee: Randolph Tan
Resolution: Duplicate Votes: 0
Labels: PM-234-M3, PM-234-T-autocommits
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Duplicate
duplicates SERVER-58603 ensureTempReshardingCollectionExistsW... Closed
Participants:
Linked BF Score: 26
Story Points: 1

 Description   

Resharding is not supported with storage engines that don't support readConcern majority. In this ticket, that limitation shows up in dropCollection behavior: on such storage engines, the two phases of a two-phase dropCollection run in separate storage transactions, so readers can observe resharding collections in a quasi-renamed-but-not-yet-dropped state.
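
The race described above can be sketched with a toy model. This is not server code: `ToyCatalog`, `two_phase_drop`, the `observe` callback, and the `system.drop.` prefix used for the intermediate name are all hypothetical names chosen for illustration. The sketch only assumes what the description states, namely that the drop has a rename-like first phase and a removal second phase, and that a reader can look at the catalog after each committed storage transaction.

```python
from dataclasses import dataclass, field

# Illustrative prefix for the intermediate (drop-pending) name; an assumption.
DROP_PENDING_PREFIX = "system.drop."


@dataclass
class ToyCatalog:
    """Hypothetical in-memory catalog; `committed` is what readers can see."""
    committed: set = field(default_factory=set)

    def visible_namespaces(self):
        return sorted(self.committed)


def two_phase_drop(catalog, name, single_transaction, observe):
    """Drop `name` in two phases and record what a reader sees after each commit."""
    pending = DROP_PENDING_PREFIX + name
    snapshots = []

    # Phase 1: move the collection to its drop-pending name.
    staged = (catalog.committed - {name}) | {pending}
    if not single_transaction:
        # Separate transactions: phase 1 commits on its own, so a reader
        # can run between the two phases.
        catalog.committed = staged
        snapshots.append(observe(catalog))

    # Phase 2: remove the drop-pending collection and commit.
    staged = staged - {pending}
    catalog.committed = staged
    snapshots.append(observe(catalog))
    return snapshots


def read(catalog):
    return catalog.visible_namespaces()


separate = two_phase_drop(ToyCatalog({"db.temp"}), "db.temp", False, read)
atomic = two_phase_drop(ToyCatalog({"db.temp"}), "db.temp", True, read)
```

In this model, `separate` contains one snapshot in which the collection is visible only under its intermediate name, which corresponds to the quasi-renamed-but-not-yet-dropped state. `atomic` contains no such snapshot, because both phases commit together.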

These are potential solutions:

  • Fix the behavior in the resharding server code by tightening constraints around dropCollection. This should cause no behavior change on storage engines that support the post-4.0 drop behavior (i.e., WiredTiger).
  • Add constraints around dropCollection only in the resharding unit tests to prevent this race from happening.
  • Remove the resharding unit tests that depend on dropCollection behavior and rely solely on integration tests to cover it.


 Comments   
Comment by Max Hirschhorn [ 21/Jul/21 ]

The linked failure appears to result from the sequence of operations described in SERVER-58603. I'm hoping the Evergreen link from haley.connelly can help clarify under what circumstances we actually saw the drop-pending behavior that led to the server crash.

Edit: Haley linked me to https://spruce.mongodb.com/task/mongodb_mongo_master_enterprise_rhel_80_64_bit_dynamic_required_run_unittests_with_recording_patch_12c7f0f5198a03815ad152fd2f693eb08515cb2b_60f091049ccd4e37b806d771_21_07_15_19_48_47/tests?execution=0&sortBy=STATUS&sortDir=ASC so this is actually a C++ test-only issue.

Comment by Blake Oler [ 06/Jul/21 ]

Fixed the description/problem scope.

Comment by Max Hirschhorn [ 06/Jul/21 ]

Is this ticket proposing to run resharding's C++ tests with the WiredTiger storage engine rather than the EphemeralForTest storage engine? Note that the C++ unit tests do not inherit the storage engine mentioned in the build variant name.
