Background
Concurrency tests for refining a collection's shard key should be added to verify that a refine takes effect atomically and does not interfere with concurrent zone operations.
In particular, workloads should be added that:
- Repeatedly refine a shard key with concurrent zone operations (updateZoneKeyRange, addShardToZone, removeShardFromZone) in addition to concurrent CRUD ops.
- Note that updateZoneKeyRange is the most important to test, so testing with the other two zone commands can be split out and re-prioritized if necessary.
In addition to the default concurrency suites, these workloads should run in suites with the balancer enabled and in suites with failovers. It should be verified that the balancer will actually attempt to move chunks, possibly enforced by inserting enough data to imbalance the cluster, beginning with lopsided zones, and/or adding FSM stages that explicitly wait for a balancer round.
Proposed Solution
High-Level Explanation
This workload will be built on the idea of refining a list of collections one after another, controlled by a latch.
Each thread will have two zones and two zone ranges. The thread will bounce the zones between two shards, and will bounce the ranges between the two zones. This means that at any given time, a zone will belong to only one shard and will have a specified range. While this is happening, CRUD operations borrowed from the broadcast update/delete transaction states will be operating on documents belonging to the ranges owned by the thread. Every so often, the collection's shard key will be refined, and then we will move to the next collection.
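The latch-controlled progression through the list of collections could look roughly like the sketch below. It is plain JavaScript with no shell dependencies. SimpleLatch is a hypothetical in-process stand-in for a latch shared across worker threads, and the collection names and the number of refines per collection are assumptions, not values taken from this proposal.

```javascript
// Hypothetical stand-in for a shared countdown latch. A real workload needs a
// latch that is visible to all worker threads, which this class is not.
class SimpleLatch {
    constructor(count) {
        this.count = count;
    }
    countDown() {
        if (this.count > 0) {
            this.count -= 1;
        }
    }
    getCount() {
        return this.count;
    }
}

// One latch per collection. Names and counts are illustrative only.
function makeLatches(collNames, refinesPerCollection) {
    return collNames.map((name) => ({name, latch: new SimpleLatch(refinesPerCollection)}));
}

// Returns the first collection whose latch has not reached zero, or null once
// every collection has used up its refines.
function currentCollection(latches) {
    const entry = latches.find((e) => e.latch.getCount() > 0);
    return entry ? entry.name : null;
}

// Models the bookkeeping of the refine state: count down the latch of the
// collection currently being worked on.
function recordRefine(latches) {
    const entry = latches.find((e) => e.latch.getCount() > 0);
    if (entry) {
        entry.latch.countDown();
    }
}
```

With this shape, every state would call currentCollection() to decide which collection to operate on, so all threads move to the next collection once the current latch reaches zero.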
Steps for setting up the workload
- Create a number of zones equal to twice the number of threads, so that each thread may have two zones.
- Create a number of ranges equal to twice the number of threads, so that each thread may take ownership of two ranges.
- Assign to each thread two zones and two ranges for those zones.
- Shard the collection with the initial zones, which will create the chunks automatically.
- Assign each zone to only one shard. If the balancer is on, wait for the chunks to be distributed.
- Fill owned ranges with documents to be used for CRUD ops.
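As an illustration of the setup steps above, the per-thread layout could be computed up front by a pure function such as the one below. Everything about the naming scheme (zone_<tid>_a / zone_<tid>_b), the use of a single numeric key for range bounds, the rangeSize parameter, and the choice to start each thread's two zones on different shards is an assumption made for the sketch.

```javascript
// Builds a hypothetical layout: two zones and two adjacent, non-overlapping
// ranges per thread. All names and sizes here are assumptions.
function buildLayout(threadCount, rangeSize, shardNames) {
    const layout = [];
    for (let tid = 0; tid < threadCount; tid++) {
        // Each thread owns a contiguous block of 2 * rangeSize key values.
        const base = tid * 2 * rangeSize;
        layout.push({
            tid,
            zones: [`zone_${tid}_a`, `zone_${tid}_b`],
            ranges: [
                {min: base, max: base + rangeSize},
                {min: base + rangeSize, max: base + 2 * rangeSize},
            ],
            // zone index -> shard name; each zone starts on exactly one shard.
            zoneToShard: [shardNames[0], shardNames[1]],
            // range index -> zone index.
            rangeToZone: [0, 1],
        });
    }
    return layout;
}
```

Each entry of the returned array is also what a thread would cache during its own initialization, so the states can flip zoneToShard and rangeToZone locally without re-reading the config metadata.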
Steps for thread initialization
- Cache into memory each thread's assigned zones, assigned ranges, and assigned documents.
States
- sendZoneToOtherShard() – Picks a random zone assigned to this thread, removes it from the current shard, and assigns it to the other shard. Verifies that the first shard no longer has the zone, and that the second shard has the zone. If the balancer is on, waits for the chunks to be moved.
- swapZoneRange() – Removes the ranges from each of the zones assigned to this thread, and swaps them, such that each range is now assigned to the opposite zone. If the ranges are owned by different shards and the balancer is on, waits for the chunks to be moved.
- refineCollectionShardKey() – Refines the collection's shard key and decreases the latch count for collections to refine.
- update/delete in transaction states – States borrowed from the broadcast update/delete transaction states.
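The zone and refine states above map onto a small set of admin commands. The sketch below only builds the command documents, so it runs without a cluster; the helper names and argument shapes are hypothetical. One deliberate deviation is flagged: sendZoneToOtherShardCmds adds the destination shard before removing the source shard, on the assumption that a zone which still has ranges must keep at least one shard, whereas the description above lists the removal first.

```javascript
// Commands to move a zone from fromShard to toShard.
// Assumption: add first, then remove, so the zone is never left without a shard.
function sendZoneToOtherShardCmds(zone, fromShard, toShard) {
    return [
        {addShardToZone: toShard, zone: zone},
        {removeShardFromZone: fromShard, zone: zone},
    ];
}

// Commands to swap the ranges of two zones. Each argument is a hypothetical
// {zone, min, max} object where min and max are shard key documents.
// Passing zone: null to updateZoneKeyRange removes the range from its zone.
function swapZoneRangeCmds(ns, first, second) {
    return [
        {updateZoneKeyRange: ns, min: first.min, max: first.max, zone: null},
        {updateZoneKeyRange: ns, min: second.min, max: second.max, zone: null},
        {updateZoneKeyRange: ns, min: first.min, max: first.max, zone: second.zone},
        {updateZoneKeyRange: ns, min: second.min, max: second.max, zone: first.zone},
    ];
}

// Command to refine the shard key. newKey must have the current shard key as
// a prefix, and an index supporting newKey must already exist.
function refineCmd(ns, newKey) {
    return {refineCollectionShardKey: ns, key: newKey};
}
```

In a workload state, each returned document would be passed to db.adminCommand() and its result checked. After a refine, the min and max bounds used for later updateZoneKeyRange calls have to be expressed in terms of the refined key.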