[SERVER-51696] Find a way to verify writes to various resharding-related collections as part of a POS-driven operation in resharding_coordinator_test.cpp
Created: 16/Oct/20
Updated: 06/Dec/22
Resolved: 05/Jun/21

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: None

Type: Task
Priority: Major - P3
Reporter: Blake Oler
Assignee: [DO NOT USE] Backlog - Sharding NYC
Resolution: Duplicate
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates SERVER-55818 Creating a ReshardingCoordinatorServi... Closed
Related
related to SERVER-57479 Remove resharding_test_util.js Closed
Assigned Teams:
Sharding NYC
Participants:
Story Points: 2

Comments
Comment by Max Hirschhorn [ 05/Jun/21 ]

A new ReshardingCoordinatorServiceTest fixture was added under SERVER-55818 and has a ReshardingCoordinator instance transition through all of its states.

Comment by Max Hirschhorn [ 31/Dec/20 ]

Notes for implementer: Max wants a failpoint that hangs the write just before we record that we've committed, so that a test can verify that the temporary and original collections contain the same data.

Note that these failpoints were already introduced as part of SERVER-51088.


I think it would be good to clarify what this ticket is meant to achieve.

I think the JavaScript testing has reached a sufficient point to say the resharding coordinator triggers all of the state transitions necessary to get all the way through the operation. It seems like it could be useful to have testing which asserts specifically on the mechanics of how those state transitions happen. For example:

  • Each write to config.reshardingOperations is in a multi-document transaction accompanied by writes to config.collections and sometimes also config.tags. A test could run the reshardCollection command (it doesn't even need any collection data!) and then make assertions on the resulting oplog entries generated by the config server primary. It could maybe also make assertions on the resulting profiler entries for the operations being run by the config server primary.
  • Error handling for performing those local replica set transactions (both transient and non-transient errors). One could argue this is already covered by SERVER-53199, given that ReshardingCoordinator does all of its writes with bumpCollShardVersionsAndChangeMetadataInTxn().
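
As a rough illustration of the first bullet, here is a minimal Python sketch of the kind of assertion a test could make on oplog entries. It assumes the entries have already been fetched as plain dicts and that a committed transaction shows up as a single command entry whose `o.applyOps` array holds the individual writes; the helper names `namespaces_in_txn` and `check_coordinator_writes` are hypothetical and not part of any existing test utility.

```python
RESHARDING_NS = "config.reshardingOperations"
COLLECTIONS_NS = "config.collections"


def namespaces_in_txn(entry):
    """Return the namespaces written by one applyOps-style oplog entry.

    Assumption: a committed transaction is a command entry (op == "c")
    whose "o.applyOps" field is a list of the individual writes.
    """
    if entry.get("op") != "c":
        return set()
    inner = entry.get("o", {}).get("applyOps")
    if not isinstance(inner, list):
        return set()
    return {op["ns"] for op in inner if "ns" in op}


def check_coordinator_writes(oplog_entries):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    for i, entry in enumerate(oplog_entries):
        # A top-level write to the coordinator collection is not in a transaction.
        if entry.get("ns") == RESHARDING_NS:
            problems.append(f"entry {i}: {RESHARDING_NS} written outside a transaction")
            continue
        touched = namespaces_in_txn(entry)
        # Inside a transaction, the coordinator write must come with config.collections.
        if RESHARDING_NS in touched and COLLECTIONS_NS not in touched:
            problems.append(f"entry {i}: {RESHARDING_NS} written without {COLLECTIONS_NS}")
    return problems
```

A test would run reshardCollection, collect the config server primary's oplog entries, and assert that `check_coordinator_writes(entries)` returns an empty list. The same idea could be extended to profiler entries, which this sketch does not cover.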

One aspect of resharding_coordinator_test.cpp that I had felt could be improved is how the test cases simulate what ReshardingCoordinator would be doing and aren't actually running the primary-only service Instance.

Comment by Haley Connelly [ 03/Nov/20 ]

Note: We spoke offline and determined this won't be ready until MS1 is complete.

Comment by Blake Oler [ 02/Nov/20 ]

Notes for implementer: Max wants a failpoint that hangs the write just before we record that we've committed, so that a test can verify that the temporary and original collections contain the same data.
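
A minimal Python sketch of the check described above, under stated assumptions: the failpoint name passed in is a placeholder (the actual failpoint is not named in this ticket), the command document follows the shape of the server's `configureFailPoint` command, and both collections are assumed to have been read into lists of plain dicts while the coordinator is paused. The helpers `fail_point_command` and `collections_match` are hypothetical.

```python
import json


def fail_point_command(name, enabled):
    """Build a configureFailPoint command document for the given failpoint."""
    return {"configureFailPoint": name, "mode": "alwaysOn" if enabled else "off"}


def canonical(doc):
    """Serialize a document with sorted keys so field order does not matter."""
    return json.dumps(doc, sort_keys=True, default=str)


def collections_match(original_docs, temporary_docs):
    """Compare two collections' contents, ignoring document order."""
    return sorted(canonical(d) for d in original_docs) == sorted(
        canonical(d) for d in temporary_docs
    )
```

The intended flow would be: enable the failpoint, start the operation, wait for the coordinator to hang, read both collections, assert `collections_match(...)`, then turn the failpoint off so the commit can proceed.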
