[SERVER-56813] Have Resharding be [RRFaM] aware Created: 10/May/21 Updated: 29/Oct/23 Resolved: 13/Sep/21 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | 5.0.4, 5.1.0-rc0 |
| Type: | Task | Priority: | Major - P3 |
| Reporter: | Daniel Gottlieb (Inactive) | Assignee: | Randolph Tan |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: | |
| Backwards Compatibility: | Fully Compatible |
| Backport Requested: | v5.0 |
| Sprint: | Sharding 2021-08-23, Sharding 2021-09-06, Sharding 2021-09-20 |
| Participants: | |
| Story Points: | 2 |
| Description |
|
Documents being resharded have associated config.transactions updates that are communicated via standard MDB oplog fetching queries. Without RRFaM (reduce retryable findAndModify), the pre/post images for retryable findAndModify commands are stored in the oplog itself and are captured by these queries. With RRFaM, those images are written elsewhere (to config.image_collection), a location resharding does not yet know about. Without any changes, resharding while RRFaM is enabled will lose the data needed to retry findAndModify commands. It would be safe and correct to document that, to reshard a collection, a user must first turn off RRFaM on every node in the sharded cluster before issuing the resharding command. For this ticket, we aim to do better: have resharding be aware of the config.image_collection data and copy over the necessary items as needed. |
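To make the failure mode concrete, here is a toy Python model, not MongoDB's implementation. It assumes that a legacy findAndModify oplog entry points at a separate no-op entry carrying its image, that an RRFaM-style entry carries only a marker, and that a dict keyed by session id stands in for config.image_collection; every field and function name is hypothetical. An oplog-only resolver finds the image in the first case and returns nothing in the second, while a resolver that also consults the side collection recovers it.

```python
"""Toy model: why an oplog-only fetcher loses findAndModify images under RRFaM.

Every field name below (ts, lsid, txnNumber, preImageOpTime, postImageOpTime,
needsRetryImage, imageKind, image) is an illustrative assumption, not
MongoDB's actual oplog or config.image_collection schema.
"""
from typing import Dict, List, Optional

Entry = Dict[str, object]


def legacy_stream() -> List[Entry]:
    # Without RRFaM: the pre-image is its own no-op oplog entry, and the
    # findAndModify entry refers to it by timestamp.
    return [
        {"op": "n", "ts": 1, "o": {"_id": 7, "qty": 5}},
        {"op": "u", "ts": 2, "lsid": "s1", "txnNumber": 3,
         "preImageOpTime": 1, "o": {"$set": {"qty": 4}}},
    ]


def rrfam_stream() -> List[Entry]:
    # With RRFaM: the entry only records that an image is needed; the image
    # itself lives in a side collection that oplog fetching never reads.
    return [
        {"op": "u", "ts": 2, "lsid": "s1", "txnNumber": 3,
         "needsRetryImage": "preImage", "o": {"$set": {"qty": 4}}},
    ]


def side_collection() -> Dict[str, Entry]:
    # Hypothetical stand-in for config.image_collection, keyed by session id.
    return {"s1": {"txnNumber": 3, "imageKind": "preImage",
                   "image": {"_id": 7, "qty": 5}}}


def image_from_oplog(stream: List[Entry], entry: Entry) -> Optional[object]:
    """Oplog-only resolver: can see only the entries that were fetched."""
    by_ts = {e["ts"]: e for e in stream}
    ref = entry.get("preImageOpTime", entry.get("postImageOpTime"))
    return by_ts[ref]["o"] if ref in by_ts else None


def image_aware(stream: List[Entry], entry: Entry,
                images: Dict[str, Entry]) -> Optional[object]:
    """RRFaM-aware resolver: consults the side collection when flagged."""
    kind = entry.get("needsRetryImage")
    if kind is None:
        return image_from_oplog(stream, entry)
    doc = images.get(entry["lsid"])
    # Only accept an image recorded for this exact txnNumber and image kind.
    if (doc is not None and doc["txnNumber"] == entry["txnNumber"]
            and doc["imageKind"] == kind):
        return doc["image"]
    return None


if __name__ == "__main__":
    legacy, rrfam = legacy_stream(), rrfam_stream()
    print(image_from_oplog(legacy, legacy[-1]))              # {'_id': 7, 'qty': 5}
    print(image_from_oplog(rrfam, rrfam[-1]))                # None: image lost
    print(image_aware(rrfam, rrfam[-1], side_collection()))  # {'_id': 7, 'qty': 5}
```

The transaction-number and image-kind check reflects a further assumption of this model: that the side collection keeps only the latest image per session, so an image left by a different write must not be returned.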
| Comments |
| Comment by Githook User [ 07/Oct/21 ] |
|
Author: {'name': 'Randolph Tan', 'email': 'randolph@10gen.com', 'username': 'renctan'} Message: (cherry picked from commit 2400568510487ee3b70c9f1a889e3815e4c47c8e) |
| Comment by Vivian Ge (Inactive) [ 06/Oct/21 ] |
|
Updating the fix version since branching activities occurred yesterday. This ticket will be in rc0 once it has been triggered. For more active release information, please keep an eye on #server-release. Thank you! |
| Comment by Githook User [ 13/Sep/21 ] |
|
Author: {'name': 'Randolph Tan', 'email': 'randolph@10gen.com', 'username': 'renctan'} Message: |
| Comment by Daniel Gottlieb (Inactive) [ 30/Jul/21 ] |
|
As per offline discussion with max.hirschhorn:
Turning RRFaM on in resharding tests before this ticket is completed would presumably result in test failures. The plan for
The minimum testing expected of this ticket would then be to turn that setParameter back on for the resharding scenarios. If this ticket lands before But if doing |
| Comment by Max Hirschhorn [ 30/Jul/21 ] |
|
daniel.gottlieb, would doing I like the idea of having the storeFindAndModifyImagesInSideCollection server parameter toggled on and off. I would probably hold off on building that test suite and instead start with a version that sets storeFindAndModifyImagesInSideCollection to on. It can be modeled on how the resharding_fuzzer_idempotency.yml test suite was set up to use TestData.setParameters. I just want to know the minimum net-new test coverage we need for the intersection of the resharding and reduce retryable findAndModify features to confirm that resharding's oplog fetching pipeline still works as expected as part of the changes from |
| Comment by Daniel Gottlieb (Inactive) [ 30/Jul/21 ] |
|
The only non-targeted tasks in 5.0 that run with RRFaM on are the ones that take advantage of the fuzz configuration. I don't believe that applies to anything resharding runs against. Now that we actually have a 5.1 FCV, I can continue with I'll leave max.hirschhorn to decide whether having two code paths for findAndModify warrants doubling the amount of resharding testing we do. It might actually be worthwhile to flip the server parameter while resharding is running. That seems like a good coverage-per-effort tradeoff, and it doesn't add any new boxes a resharding patch runner needs to check off. |
| Comment by Jason Chan [ 30/Jul/21 ] |
|
max.hirschhorn, we currently only have targeted testing for the reduce retryable findAndModify feature. I wonder if we should add some replica set and sharding retryable writes suites where this feature is enabled. |
| Comment by Max Hirschhorn [ 30/Jul/21 ] |
|
jason.chan, what testing for the reduce retryable findAndModify feature is there already in Evergreen? For example, as part of this ticket, would we need to add a version of the resharding_fuzzer.yml test suite which runs with the featureFlagRetryableFindAndModify and storeFindAndModifyImagesInSideCollection parameters enabled? Or does that configuration already exist on one of the build variants? |
| Comment by Jason Chan [ 27/Jul/21 ] |
|
After some investigation, the expected work for Resharding is to make use of the new aggregation stage added in |
| Comment by Daniel Gottlieb (Inactive) [ 25/May/21 ] |
|
The anticipated goal for this ticket is to have resharding be aware of new-style retryable writes. Resharding does not plan on maintaining retryable write history for operations that completed prior to resharding starting. Thus, this ticket can be simplified by assuming we only need to track retryable writes that show up in oplog fetching and there's no need to copy the config.image_collection. |
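Under the simplification above, the recipient only needs images for retryable findAndModify entries that actually appear in the fetched oplog. The sketch below uses the same kind of hypothetical dict model (field names and the one-image-per-session layout are assumptions, not MongoDB's schema or code) to select exactly those image documents and ignore everything else in the side collection.

```python
"""Sketch: select only the side-collection images that fetched oplog entries need.

Assumptions for illustration only: entries and image documents are dicts,
the side collection holds at most one image per session id, and all field
names are invented for this model rather than taken from MongoDB.
"""
from typing import Dict, List

Entry = Dict[str, object]


def images_to_copy(fetched: List[Entry],
                   image_collection: Dict[str, Entry]) -> Dict[str, Entry]:
    """Return the image documents referenced by the fetched oplog entries.

    Sessions that never appear in the fetched oplog (for example, retryable
    writes that completed before resharding started) are skipped, so the
    side collection is never copied wholesale.
    """
    needed: Dict[str, Entry] = {}
    for entry in fetched:
        kind = entry.get("needsRetryImage")
        if kind is None:
            continue  # not a side-collection-style findAndModify
        doc = image_collection.get(entry["lsid"])
        # One image per session: it must belong to this exact txnNumber/kind.
        if (doc is not None and doc["txnNumber"] == entry["txnNumber"]
                and doc["imageKind"] == kind):
            needed[entry["lsid"]] = doc
    return needed


if __name__ == "__main__":
    fetched = [
        {"op": "u", "lsid": "s1", "txnNumber": 3, "needsRetryImage": "postImage"},
        {"op": "i", "lsid": "s2", "txnNumber": 1},
    ]
    images = {
        "s1": {"txnNumber": 3, "imageKind": "postImage", "image": {"_id": 1}},
        # Written before resharding began; never referenced by fetched entries.
        "s9": {"txnNumber": 5, "imageKind": "preImage", "image": {"_id": 2}},
    }
    print(sorted(images_to_copy(fetched, images)))  # ['s1']
```

Driving the lookup from the fetched entries, rather than scanning the side collection, keeps the work proportional to the retryable writes resharding actually observes.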
| Comment by Daniel Gottlieb (Inactive) [ 10/May/21 ] |
|
Moving this into Investigating for the 05/31 sprint. Based on those conversations we'll ideally:
|