Core Server / SERVER-61808

Session cloning during moveChunk gets slower after some migrations

    • Type: Task
    • Resolution: Done
    • Priority: Major - P3
    • None
    • Affects Version/s: 5.1.0
    • Component/s: Sharding
    • Sharding 2021-12-13, Sharding 2021-12-27

      As part of PM-2423 we have been measuring the performance of the migration protocol. We saw some unexpected numbers related to session cloning that are worth understanding.

      Environment
      3-shard sharded cluster running 5.1 binaries.

      Experiment
      We create a sharded collection with an initial pre-split of 1K chunks. The shard key is a hashed random number. After that, we insert a bulk of 1K documents using retryable writes.

      Then we execute a few thousand random migrations. There are no CRUD operations during this phase.

      You can check the Genny workload we executed here.
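
The Genny workload itself is not reproduced in this ticket. As a rough illustration of the experiment's shape only, the sketch below builds the admin command documents for a pre-split hashed collection and for a sequence of random migrations. It assumes the pre-split was done with `numInitialChunks` and that chunks are selected with `bounds` (both are assumptions, as are the helper names, the namespace `test.coll`, the field name `key`, and the shard names). The chunk ranges are integer placeholders rather than real hashed boundaries, and nothing here connects to a cluster.

```python
import random


def shard_collection_cmd(ns, key_field, initial_chunks):
    """Build a shardCollection command with a hashed key and an initial pre-split.

    Hypothetical helper: the ticket does not say how the pre-split was created.
    """
    return {
        "shardCollection": ns,
        "key": {key_field: "hashed"},
        "numInitialChunks": initial_chunks,
    }


def random_move_chunk_cmd(ns, chunks, shards, rng):
    """Pick a random chunk and move it to a random shard that does not own it.

    `chunks` is a list of dicts with "min", "max" and "shard" entries
    (a simplified, in-memory stand-in for the cluster's chunk metadata).
    """
    chunk = rng.choice(chunks)
    candidates = [s for s in shards if s != chunk["shard"]]
    dest = rng.choice(candidates)
    cmd = {"moveChunk": ns, "bounds": [chunk["min"], chunk["max"]], "to": dest}
    chunk["shard"] = dest  # track the new owner locally for later picks
    return cmd


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so the plan is reproducible
    shards = ["shard0", "shard1", "shard2"]
    # Placeholder ranges; a real hashed collection has 64-bit hash boundaries.
    chunks = [
        {"min": {"key": i}, "max": {"key": i + 1}, "shard": shards[i % 3]}
        for i in range(1000)
    ]
    print(shard_collection_cmd("test.coll", "key", 1000))
    for _ in range(3):
        print(random_move_chunk_cmd("test.coll", chunks, shards, rng))
```

With a driver, each document would be sent as an admin command through mongos, and the bulk insert would go through a client with retryable writes enabled. Those steps are left out so the sketch stays runnable without a cluster.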

      Results
      You can find our results here.

      We are plotting two different variables:

      • the total execution time spent holding the critical section during the migration (catch-up phase of the migration).
      • the total execution time spent holding the critical section blocking reads and writes (commit phase of the migration).

      The interesting time is the first one; the second one is more or less constant. On my machine, we see a 30x slowdown between the first moveChunks and the last ones, after 4K moveChunks. We also got some numbers on EVG; you can see them in the different tabs.
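
One way to put a number on a slowdown like the one described above is to compare the mean catch-up time over the first and last few migrations. This is a minimal sketch, assuming the per-migration timings are available as a list in milliseconds. The function name, the window size, and the sample values are hypothetical and are not the measurements from this ticket.

```python
from statistics import mean


def slowdown_factor(durations_ms, window=100):
    """Return mean(last `window` samples) / mean(first `window` samples)."""
    if window <= 0:
        raise ValueError("window must be positive")
    if len(durations_ms) < 2 * window:
        raise ValueError("need at least 2 * window samples")
    first = mean(durations_ms[:window])
    last = mean(durations_ms[-window:])
    return last / first


if __name__ == "__main__":
    # Synthetic, linearly growing timings for illustration only.
    samples = [10.0 + 0.07 * i for i in range(4000)]
    print(f"slowdown: {slowdown_factor(samples):.1f}x")
```

Averaging over a window instead of comparing single migrations keeps one unusually slow or fast moveChunk from dominating the ratio.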

            Assignee:
            luis.osta@mongodb.com Luis Osta (Inactive)
            Reporter:
            sergi.mateo-bellido@mongodb.com Sergi Mateo Bellido
            Votes:
            0
            Watchers:
            7
