  1. Core Server
  2. SERVER-59827

random_moveChunk_index_operations fails to ignore BackgroundOperationInProgressForNamespace Error

Details

    • Bug
    • Status: Closed
    • Major - P3
    • Resolution: Fixed
    • None
    • 5.2.0
    • Sharding
    • Fully Compatible
    • ALL
    • Sharding 2021-11-01
    • 31
    • 1

    Description

      random_moveChunk_index_operations is a test in the concurrency suite. It runs multiple memory-isolated threads, each of which executes the state machine defined in the test file.

      Each thread is given its own namespace for its operations. However, one of the steps in the state machine, moveChunk, randomly picks the namespace of another thread and then executes the moveChunk command on that thread's namespace.

      It is important to note that in this test there are only ever two chunks. So a chunk move means that either the donor shard is left with 0 chunks or the recipient shard previously had 0 chunks.

      This means that the following order of operations is possible:

      1. Thread 1 starts building an index on shard1 when it owns a chunk for that collection L-48215
      2. Thread 3 moves a chunk away from shard1 to shard0 L-48234
      3. Thread 3 attempts to move a chunk from shard0 to shard1 L-56375
      4. Thread 3's moveChunk command fails due to BackgroundOperationInProgressForNamespace since there are no chunks on shard1 for that collection. L-56383
      5. The failure happens because, when a shard doesn't own any chunks for a collection, we attempt to drop the local indexes as part of the moveChunk. Since the index build is still in progress, the attempt to drop fails with the error code BackgroundOperationInProgressForNamespace.
      6. The index Thread 1 wanted to build is done building. L-56637
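
      The ordering above can be replayed with a small toy model. This is only a sketch to make the interleaving concrete: ShardModel, moveChunk and replayScenario are hypothetical names, and the model reduces the server's behavior to the single rule described in step 5 (a recipient that owns no chunks tries to drop its local indexes, which is rejected while an index build is running).

```javascript
// Toy model of one shard's view of a single collection.
// All names here are hypothetical; this is not MongoDB's implementation.
class ShardModel {
  constructor(name, ownedChunks) {
    this.name = name;
    this.ownedChunks = ownedChunks;
    this.indexBuildInProgress = false;
  }

  startIndexBuild() { this.indexBuildInProgress = true; }
  finishIndexBuild() { this.indexBuildInProgress = false; }

  // Recipient side of a migration. A shard that owns no chunks first tries
  // to drop its local indexes; that is rejected during an index build.
  receiveChunk() {
    if (this.ownedChunks === 0 && this.indexBuildInProgress) {
      const err = new Error("an index build is currently running");
      err.codeName = "BackgroundOperationInProgressForNamespace";
      throw err;
    }
    this.ownedChunks += 1;
  }
}

// Moves one chunk; ownership only changes if the recipient accepts it.
function moveChunk(from, to) {
  if (from.ownedChunks === 0) {
    throw new Error(`${from.name} owns no chunks`);
  }
  to.receiveChunk();
  from.ownedChunks -= 1;
}

function replayScenario() {
  const shard0 = new ShardModel("shard0", 1);
  const shard1 = new ShardModel("shard1", 1);
  const events = [];

  shard1.startIndexBuild();            // step 1
  moveChunk(shard1, shard0);           // step 2: shard1 now owns 0 chunks
  events.push(`shard1 chunks: ${shard1.ownedChunks}`);

  try {
    moveChunk(shard0, shard1);         // step 3
    events.push("second moveChunk succeeded");
  } catch (e) {
    events.push(`second moveChunk failed: ${e.codeName}`);  // steps 4 and 5
  }

  shard1.finishIndexBuild();           // step 6
  moveChunk(shard0, shard1);           // a later attempt is accepted
  events.push(`shard1 chunks after retry: ${shard1.ownedChunks}`);
  return events;
}

console.log(replayScenario());
```

      In this model the second moveChunk fails only because it lands between steps 1 and 6, which is why the failure is a timing artifact of the test rather than a server bug.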

      Proposed Solution

      The current test already has a set of errors that the moveChunk function ignores, namely errors due to interruption and duplicate keys, among others. We should add the BackgroundOperationInProgressForNamespace error to that list of ignored errors.
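
      A minimal sketch of what such an allow-list could look like follows. It is not the test's actual code: the helper names are hypothetical, and the "Interrupted" and "DuplicateKey" entries are assumptions standing in for the interruption and duplicate key errors mentioned above. The message fallback is there because, in the linked logs, the error name appears inside a wrapped "migrate failed: ..." message.

```javascript
// Hypothetical allow-list of error names that moveChunk may ignore.
// Only the last entry is the addition proposed in this ticket; the first
// two are assumed stand-ins for the errors the test already tolerates.
const ACCEPTABLE_MOVE_CHUNK_ERRORS = [
  "Interrupted",
  "DuplicateKey",
  "BackgroundOperationInProgressForNamespace",
];

// Returns true if the error matches the allow-list, either through its
// codeName or because the name is embedded in a wrapped error message.
function isAcceptableMoveChunkError(err) {
  if (!err) {
    return false;
  }
  if (ACCEPTABLE_MOVE_CHUNK_ERRORS.includes(err.codeName)) {
    return true;
  }
  const message = err.message || "";
  return ACCEPTABLE_MOVE_CHUNK_ERRORS.some((name) => message.includes(name));
}

// Runs a moveChunk attempt, swallowing acceptable errors and rethrowing
// everything else so that unexpected failures still fail the test.
function runMoveChunkIgnoringExpectedErrors(moveChunkFn) {
  try {
    moveChunkFn();
    return { moved: true, ignoredError: null };
  } catch (err) {
    if (isAcceptableMoveChunkError(err)) {
      return { moved: false, ignoredError: err };
    }
    throw err;
  }
}
```

      Rethrowing anything outside the list keeps the test strict: only the known, timing-dependent failures are tolerated.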

      Linked Logs

      [fsm_workload_test:random_moveChunk_index_operations] Fixture status:
      ...
      [j0:s1:n2] | 2021-08-27T17:57:59.415-07:00 I  INDEX    20384   [IndexBuildsCoordinatorMongod-4] "Index build: starting","attr":{"buildUUID":{"uuid":{"$uuid":"e46df392-ac2b-4e84-a088-f4edb9aadde7"}},"collectionUUID":{"uuid":{"$uuid":"687ead0d-808f-4be0-8004-264bb9f0b15f"}},"namespace":"test15_fsmdb0.fsmcoll01","properties":{"v":2,"key":{"c":1},"name":"c","expireAfterSeconds":10000},"specIndex":0,"numSpecs":1,"method":"Hybrid","maxTemporaryMemoryUsageMB":200}
      ...
      [j0:s1:n2] | 2021-08-27T17:57:59.485-07:00 I  MIGRATE  22016   [MoveChunk] "Starting chunk migration donation","attr":{"requestParameters":"ns: test15_fsmdb0.fsmcoll01, [{ _id: 50.0 }, { _id: MaxKey }), fromShard: shard-rs1, toShard: shard-rs0","collectionEpoch":{"$oid":"612989a5e91db2538ebfd200"}}
      ...
      [j0:s1:n2] | 2021-08-27T17:58:14.116-07:00 I  MIGRATE  22000   [migrateThread] "Starting receiving end of chunk migration","attr":{"chunkMin":{"_id":50},"chunkMax":{"_id":{"$maxKey":1}},"namespace":"test15_fsmdb0.fsmcoll01","fromShard":"shard-rs0","epoch":{"$oid":"612989a5e91db2538ebfd200"},"sessionId":"shard-rs0_shard-rs1_61298a252dc3ba271df91220","migrationId":{"uuid":{"$uuid":"475828d1-9359-493c-9b35-c4e43026e939"}}}
      ...
      [j0:s1:n2] | 2021-08-27T17:58:14.281-07:00 I  MIGRATE  21998   [migrateThread] "Error during migration","attr":{"error":"migrate failed: BackgroundOperationInProgressForNamespace: cannot perform operation: an index build is currently running for collection with UUID: 687ead0d-808f-4be0-8004-264bb9f0b15f"}
      ...
      [j0:s1:n2] | 2021-08-27T17:58:15.148-07:00 I  INDEX    20345   [IndexBuildsCoordinatorMongod-4] "Index build: done building","attr":{"buildUUID":{"uuid":{"$uuid":"e46df392-ac2b-4e84-a088-f4edb9aadde7"}},"namespace":"test15_fsmdb0.fsmcoll01","index":"c","commitTimestamp":{"$timestamp":{"t":1630112295,"i":3}}}
      ...
      [j0:s1:n2] | 2021-08-27T17:58:21.046-07:00 I  INDEX    20384   [ReplWriterWorker-23] "Index build: starting","attr":{"buildUUID":null,"collectionUUID":{"uuid":{"$uuid":"687ead0d-808f-4be0-8004-264bb9f0b15f"}},"namespace":"test15_fsmdb0.fsmcoll01","properties":{"v":2,"key":{"a":1},"name":"a","expireAfterSeconds":10002},"specIndex":0,"numSpecs":1,"method":"Hybrid","maxTemporaryMemoryUsageMB":200}
      ...
      [fsm_workload_test:random_moveChunk_index_operations] failed to load: jstests/concurrency/fsm_libs/resmoke_runner.js
      ...
      
      

          People

            sanika.phanse@mongodb.com Sanika Phanse
            luis.osta@mongodb.com Luis Osta (Inactive)