Details
Bug
Status: Closed
Major - P3
Resolution: Fixed
None
Fully Compatible
ALL
Sharding 2021-11-01
31
1
Description
random_moveChunk_index_operations is a test in the concurrency suite. It runs multiple memory-isolated threads, each of which executes the state machine defined in the test file.
Each thread is given its own namespace for its operations. However, one of the steps in the state machine, moveChunk, randomly picks the namespace of another thread and then executes the moveChunk command on that other thread's namespace.
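To make the cross-thread access concrete, here is a minimal sketch of how a state could pick another thread's namespace. The function names, the `fsmcoll<tid>` naming scheme, and the injectable `randomFn` are assumptions made for illustration; the actual workload may derive namespaces differently.

```javascript
// Hypothetical naming scheme: one collection per thread id.
function namespaceForThread(dbName, tid) {
    return `${dbName}.fsmcoll${tid}`;
}

// Pick the namespace of a thread other than `ownTid`, uniformly at random.
// `randomFn` is injectable so the choice can be made deterministic in tests.
function pickOtherThreadNamespace(dbName, ownTid, threadCount, randomFn = Math.random) {
    if (threadCount < 2) {
        throw new Error("need at least two threads to pick another thread");
    }
    // Draw from the threadCount - 1 other thread ids, then skip over our own id.
    let tid = Math.floor(randomFn() * (threadCount - 1));
    if (tid >= ownTid) {
        tid += 1;
    }
    return namespaceForThread(dbName, tid);
}
```

Because the chosen namespace belongs to a different thread, the moveChunk step can race with whatever index operation that other thread is running on the same collection.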
It is important to note that in this test there are only ever two chunks. So a chunk move means that either the donor shard now has 0 chunks or the recipient shard previously had 0 chunks.
This means that the following order of operations is possible:
- Thread 1 starts building an index on shard1 while shard1 owns a chunk for that collection. L-48215
- Thread 3 moves a chunk away from shard1 to shard0. L-48234
- Thread 3 attempts to move a chunk from shard0 to shard1. L-56375
- Thread 3's moveChunk command fails with BackgroundOperationInProgressForNamespace since there are no chunks on shard1 for that collection. L-56383
- That failure happens because, when a shard doesn't own any chunks for a collection, we attempt to drop its local indexes as part of the moveChunk. Since the index creation is still in progress, the attempted drop fails with the error code BackgroundOperationInProgressForNamespace.
- The index Thread 1 wanted to build is done building. L-56637
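The interleaving above can be replayed with a small toy model. This is only a sketch of the reasoning in this ticket, not MongoDB's migration code: `ShardModel`, `moveChunk`, and `replayScenario` are hypothetical names, and the recipient-side check simply encodes the rule described above (a recipient with no chunks tries to drop its local indexes, which fails while an index build is running).

```javascript
// Toy model of one shard's view of a single collection (hypothetical).
class ShardModel {
    constructor(name, chunkCount) {
        this.name = name;
        this.chunkCount = chunkCount;
        this.indexBuildInProgress = false;
    }

    receiveChunk() {
        // A recipient that owns no chunks would drop its local indexes first;
        // in this model that fails while an index build is still running.
        if (this.chunkCount === 0 && this.indexBuildInProgress) {
            const err = new Error("an index build is currently running");
            err.codeName = "BackgroundOperationInProgressForNamespace";
            throw err;
        }
        this.chunkCount += 1;
    }
}

// Move one chunk; the recipient check runs first, so a failure changes nothing.
function moveChunk(from, to) {
    to.receiveChunk();
    from.chunkCount -= 1;
}

function replayScenario() {
    const shard0 = new ShardModel("shard0", 1);
    const shard1 = new ShardModel("shard1", 1);
    const events = [];

    shard1.indexBuildInProgress = true;   // Thread 1 starts the index build
    events.push("index build started on shard1");

    moveChunk(shard1, shard0);            // Thread 3 moves the chunk away
    events.push("moved chunk shard1 -> shard0");

    try {
        moveChunk(shard0, shard1);        // Thread 3 moves a chunk back
        events.push("moved chunk shard0 -> shard1");
    } catch (e) {
        events.push("moveChunk shard0 -> shard1 failed: " + e.codeName);
    }

    shard1.indexBuildInProgress = false;  // Thread 1's build completes
    events.push("index build finished on shard1");
    return {events, shard0, shard1};
}
```

In this model the second move fails only because it lands in the window between the first move and the end of the index build, which matches the ordering seen in the linked logs.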
Proposed Solution
In the current test there is already a set of errors that the moveChunk function ignores, namely errors due to interruption and duplicate keys, amongst others. We should add the BackgroundOperationInProgressForNamespace error to that list of ignored errors.
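A minimal sketch of the proposed change is shown below. It is written as standalone JavaScript rather than as the workload's actual code: the helper name, the `Set` of code names, and the returned object shape are all assumptions for illustration, and the real test's ignore list contains other entries not reproduced here.

```javascript
// Hypothetical ignore list, keyed by error code name. "Interrupted" and
// "DuplicateKey" stand in for the existing entries mentioned above; the
// last entry is the addition proposed in this ticket.
const ACCEPTABLE_MOVE_CHUNK_ERRORS = new Set([
    "Interrupted",
    "DuplicateKey",
    "BackgroundOperationInProgressForNamespace",
]);

// Run `moveChunkFn` and swallow only the errors on the ignore list.
// Any other error is rethrown so that real failures still fail the test.
function runMoveChunkIgnoringExpectedErrors(moveChunkFn) {
    try {
        moveChunkFn();
        return {moved: true, ignoredError: null};
    } catch (e) {
        if (ACCEPTABLE_MOVE_CHUNK_ERRORS.has(e.codeName)) {
            return {moved: false, ignoredError: e.codeName};
        }
        throw e;
    }
}
```

Matching on an explicit allow-list keeps the workload strict: only the race described in this ticket and the previously accepted errors are tolerated, while unexpected moveChunk failures still surface.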
Linked Logs
[fsm_workload_test:random_moveChunk_index_operations] Fixture status:
|
...
|
[j0:s1:n2] | 2021-08-27T17:57:59.415-07:00 I INDEX 20384 [IndexBuildsCoordinatorMongod-4] "Index build: starting","attr":{"buildUUID":{"uuid":{"$uuid":"e46df392-ac2b-4e84-a088-f4edb9aadde7"}},"collectionUUID":{"uuid":{"$uuid":"687ead0d-808f-4be0-8004-264bb9f0b15f"}},"namespace":"test15_fsmdb0.fsmcoll01","properties":{"v":2,"key":{"c":1},"name":"c","expireAfterSeconds":10000},"specIndex":0,"numSpecs":1,"method":"Hybrid","maxTemporaryMemoryUsageMB":200}
|
...
|
[j0:s1:n2] | 2021-08-27T17:57:59.485-07:00 I MIGRATE 22016 [MoveChunk] "Starting chunk migration donation","attr":{"requestParameters":"ns: test15_fsmdb0.fsmcoll01, [{ _id: 50.0 }, { _id: MaxKey }), fromShard: shard-rs1, toShard: shard-rs0","collectionEpoch":{"$oid":"612989a5e91db2538ebfd200"}}
|
...
|
[j0:s1:n2] | 2021-08-27T17:58:14.116-07:00 I MIGRATE 22000 [migrateThread] "Starting receiving end of chunk migration","attr":{"chunkMin":{"_id":50},"chunkMax":{"_id":{"$maxKey":1}},"namespace":"test15_fsmdb0.fsmcoll01","fromShard":"shard-rs0","epoch":{"$oid":"612989a5e91db2538ebfd200"},"sessionId":"shard-rs0_shard-rs1_61298a252dc3ba271df91220","migrationId":{"uuid":{"$uuid":"475828d1-9359-493c-9b35-c4e43026e939"}}}
|
...
|
[j0:s1:n2] | 2021-08-27T17:58:14.281-07:00 I MIGRATE 21998 [migrateThread] "Error during migration","attr":{"error":"migrate failed: BackgroundOperationInProgressForNamespace: cannot perform operation: an index build is currently running for collection with UUID: 687ead0d-808f-4be0-8004-264bb9f0b15f"}
|
...
|
[j0:s1:n2] | 2021-08-27T17:58:15.148-07:00 I INDEX 20345 [IndexBuildsCoordinatorMongod-4] "Index build: done building","attr":{"buildUUID":{"uuid":{"$uuid":"e46df392-ac2b-4e84-a088-f4edb9aadde7"}},"namespace":"test15_fsmdb0.fsmcoll01","index":"c","commitTimestamp":{"$timestamp":{"t":1630112295,"i":3}}}
|
...
|
[j0:s1:n2] | 2021-08-27T17:58:21.046-07:00 I INDEX 20384 [ReplWriterWorker-23] "Index build: starting","attr":{"buildUUID":null,"collectionUUID":{"uuid":{"$uuid":"687ead0d-808f-4be0-8004-264bb9f0b15f"}},"namespace":"test15_fsmdb0.fsmcoll01","properties":{"v":2,"key":{"a":1},"name":"a","expireAfterSeconds":10002},"specIndex":0,"numSpecs":1,"method":"Hybrid","maxTemporaryMemoryUsageMB":200}
|
...
|
[fsm_workload_test:random_moveChunk_index_operations] failed to load: jstests/concurrency/fsm_libs/resmoke_runner.js
|
...
|
|