[SERVER-56813] Have Resharding be [RRFaM] aware Created: 10/May/21  Updated: 29/Oct/23  Resolved: 13/Sep/21

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 5.0.4, 5.1.0-rc0

Type: Task Priority: Major - P3
Reporter: Daniel Gottlieb (Inactive) Assignee: Randolph Tan
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
depends on SERVER-59676 DocumentSourceFindAndModifyImageLooku... Closed
depends on SERVER-58060 Add new aggregation stage to downconv... Closed
Backwards Compatibility: Fully Compatible
Backport Requested:
v5.0
Sprint: Sharding 2021-08-23, Sharding 2021-09-06, Sharding 2021-09-20
Participants:
Story Points: 2

 Description   

Documents being resharded have associated config.transactions updates communicated via standard MDB oplog fetching queries. Without RRFaM, the pre/post images for retryable findAndModify operations are stored in the oplog itself and captured by these queries.

With RRFaM, those images are written to a separate location that resharding does not yet know about. Without additional work, resharding while using RRFaM will lose the data needed for retryable findAndModifies.

It's safe and correct to document that to reshard a collection, a user must first turn off RRFaM on every node in the sharded cluster prior to issuing a resharding command.

For this ticket, we aim to do better and have resharding be aware of the config.image_collection data and copy over the necessary items as needed.
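
To make the failure mode concrete, here is a toy model in Python. It is a sketch only: the field names (`lsid`, `txnNumber`, `needsRetryImage`, `postImage`), the inline placement of the image in the old-style entry, and the dict standing in for config.image_collection are illustrative assumptions, not the server's actual schema or resharding's actual code.

```python
# Toy model only. Field names and document shapes are illustrative
# assumptions, not the real oplog or config.image_collection schema.

def fetch_oplog(oplog):
    """Stand-in for resharding's oplog fetching: it sees oplog entries only."""
    return [dict(entry) for entry in oplog]


def attach_images(fetched, image_collection):
    """Hypothetical fix-up: look up each missing image by (lsid, txnNumber)."""
    out = []
    for entry in fetched:
        entry = dict(entry)
        kind = entry.pop("needsRetryImage", None)
        if kind is not None:
            image = image_collection.get((entry["lsid"], entry["txnNumber"]))
            if image is not None:
                entry[kind] = image
        out.append(entry)
    return out


# Without RRFaM: the post-image travels with the oplog data.
old_style = [{"lsid": "s1", "txnNumber": 5, "op": "u", "postImage": {"_id": 1, "x": 2}}]

# With RRFaM: the oplog entry only notes that an image exists elsewhere.
new_style = [{"lsid": "s1", "txnNumber": 5, "op": "u", "needsRetryImage": "postImage"}]
side_collection = {("s1", 5): {"_id": 1, "x": 2}}

naive = fetch_oplog(new_style)  # image is missing
aware = attach_images(fetch_oplog(new_style), side_collection)  # image restored
```

In this model `naive` carries no image at all, while `aware` ends up equivalent to what the old-style fetch would have returned.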



 Comments   
Comment by Githook User [ 07/Oct/21 ]

Author:

{'name': 'Randolph Tan', 'email': 'randolph@10gen.com', 'username': 'renctan'}

Message: SERVER-56813 Have Resharding be [RRFAM] aware

(cherry picked from commit 2400568510487ee3b70c9f1a889e3815e4c47c8e)
Branch: v5.0
https://github.com/mongodb/mongo/commit/aed3fca9c2853b892ac1595cf3c1819d10ddd954

Comment by Vivian Ge (Inactive) [ 06/Oct/21 ]

Updating the fixversion since branching activities occurred yesterday. This ticket will be in rc0 when it’s been triggered. For more active release information, please keep an eye on #server-release. Thank you!

Comment by Githook User [ 13/Sep/21 ]

Author:

{'name': 'Randolph Tan', 'email': 'randolph@10gen.com', 'username': 'renctan'}

Message: SERVER-56813 Have Resharding be [RRFAM] aware
Branch: master
https://github.com/mongodb/mongo/commit/2400568510487ee3b70c9f1a889e3815e4c47c8e

Comment by Daniel Gottlieb (Inactive) [ 30/Jul/21 ]

As per offline discussion with max.hirschhorn:

Daniel Gottlieb, would doing SERVER-55415 without first doing SERVER-56813 break the resharding_fuzzer tasks on the master branch? Or what is SERVER-55415 planning to do with the storeFindAndModifyImagesInSideCollection server parameter?

Turning RRFAM on with resharding tests before this ticket is completed would presumably result in test failures. The plan for SERVER-55415, if it were to have landed before this ticket:

  • Turns on RRFAM + feature flag by default
  • Expect to see test failures for resharding
  • Have resharding tests/fixtures disable the RRFAM setParameter (but leave the feature flag enabled).

The minimum testing expected of this ticket would then be to turn that setParameter back on for the resharding scenarios.
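
As a rough illustration of the two fixture configurations described above, the sketch below renders parameter dicts as `--setParameter name=value` arguments. The helper name `set_parameter_args` is hypothetical, and pairing the "RRFAM setParameter" with storeFindAndModifyImagesInSideCollection and the feature flag with featureFlagRetryableFindAndModify is an assumption based on the parameter names mentioned elsewhere in this ticket.

```python
def set_parameter_args(params):
    """Render a dict as repeated `--setParameter name=value` arguments.

    Hypothetical helper for illustration; not part of any test fixture.
    """
    args = []
    for name, value in sorted(params.items()):
        if isinstance(value, bool):
            value = "true" if value else "false"
        args += ["--setParameter", f"{name}={value}"]
    return args


# Plan if SERVER-55415 lands first: feature flag stays enabled,
# the RRFAM setParameter is disabled for resharding tests.
resharding_before_fix = {
    "featureFlagRetryableFindAndModify": True,
    "storeFindAndModifyImagesInSideCollection": False,
}

# Minimum testing for this ticket: turn the setParameter back on.
resharding_after_fix = dict(
    resharding_before_fix,
    storeFindAndModifyImagesInSideCollection=True,
)
```

The only difference between the two dicts is the one parameter, which is the point: the feature flag is left alone in both cases.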

If this ticket lands before SERVER-55415, it would be nice to try running the resharding tests with the feature flag and setParameter enabled, at least once by hand if not explicitly enabling this for all of resharding. This would give us confidence that SERVER-55415 won't cause resharding tests to fail.

But if doing SERVER-55415 after this ticket did create resharding test failures, I'd probably opt for disabling the combination of (rrfam + resharding) and file another ticket to investigate.

Comment by Max Hirschhorn [ 30/Jul/21 ]

daniel.gottlieb, would doing SERVER-55415 without first doing SERVER-56813 break the resharding_fuzzer tasks on the master branch? Or what is SERVER-55415 planning to do with the storeFindAndModifyImagesInSideCollection server parameter?

I like the idea of having the storeFindAndModifyImagesInSideCollection server parameter toggled on and off. Would probably wait to build that test suite and would instead start with a version which sets storeFindAndModifyImagesInSideCollection to on. It can be modeled off of how the resharding_fuzzer_idempotency.yml test suite was set up to use TestData.setParameters.

I just want to see what minimum net new test coverage we need for the intersection of the resharding and reduce retryable findAndModify features to know that resharding's oplog fetching pipeline is still working as expected as part of the changes from SERVER-56813.

Comment by Daniel Gottlieb (Inactive) [ 30/Jul/21 ]

The only non-targeted tasks in 5.0 that run with rrfam on are ones that take advantage of the fuzz configuration. I don't believe that applies to anything resharding runs against.

Now that we actually have a 5.1 FCV, I can continue with SERVER-55415. RRFAM will be on by default in 5.1 testing. So if resharding added no test tuning of rrfam, 5.0 would exercise the old-style and 5.1 the new.

I'll leave max.hirschhorn to decide if having two code paths for findAndModify warrants doubling how much resharding testing we do. It might actually be worthwhile flipping the server parameter while resharding is running. That seems like a good coverage per effort tradeoff while also not adding any new boxes a resharding patch runner needs to check off.
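
A minimal sketch of what "flipping the server parameter while resharding is running" could look like, written in Python rather than against any real test harness. Everything here is an assumption for illustration: `flip_parameter_while` and `run_admin_command` are hypothetical names, and the sketch presumes the parameter can be changed at runtime with a `{"setParameter": 1, <name>: <value>}` admin command.

```python
import threading


def flip_parameter_while(work, run_admin_command, name, interval_secs=0.01):
    """Toggle a boolean server parameter in the background until `work` returns.

    `run_admin_command` is a caller-supplied callable (hypothetical); with a
    real driver it would send the command document to a node.
    """
    done = threading.Event()

    def flipper():
        value = True
        while not done.is_set():
            run_admin_command({"setParameter": 1, name: value})
            value = not value
            # Sleep between flips, but wake immediately once work finishes.
            done.wait(interval_secs)

    thread = threading.Thread(target=flipper, daemon=True)
    thread.start()
    try:
        return work()
    finally:
        done.set()
        thread.join()
```

Here `work` would be whatever drives the resharding operation in the test; the flipper stops as soon as it returns or raises, so no extra suite or build variant is needed.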

Comment by Jason Chan [ 30/Jul/21 ]

max.hirschhorn We currently only have targeted testing for the reduce retryable findAndModify feature. I wonder if we should add some replica sets and sharding retryable writes suites where this feature is enabled.

cc: daniel.gottlieb, lingzhi.deng

Comment by Max Hirschhorn [ 30/Jul/21 ]

jason.chan, what testing for the reduce retryable findAndModify feature is there already in Evergreen? For example, as part of this ticket, would we need to add a version of the resharding_fuzzer.yml test suite which runs with the featureFlagRetryableFindAndModify and storeFindAndModifyImagesInSideCollection parameters enabled? Or does that configuration already exist on one of the build variants?

Comment by Jason Chan [ 27/Jul/21 ]

After some investigation, the expected work for Resharding is to make use of the new aggregation stage added in SERVER-58060.

Comment by Daniel Gottlieb (Inactive) [ 25/May/21 ]

The anticipated goal for this ticket is to have resharding be aware of new-style retryable writes. Resharding does not plan on maintaining retryable write history for operations that completed prior to resharding starting. Thus, this ticket can be simplified by assuming we only need to track retryable writes that show up in oplog fetching and there's no need to copy the config.image_collection.

Comment by Daniel Gottlieb (Inactive) [ 10/May/21 ]

Putting this into investigating for the 05/31 sprint. Based on those conversations we'll ideally:

  • Know what a code solution looks like
  • Know when we should prioritize typing on this