Core Server / SERVER-25652

Slow chunk migrations when there are large chunk counts. 3.0, 3.2, 3.3.11

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Duplicate
    • Affects Version/s: 3.0.8, 3.3.11
    • Fix Version/s: None
    • Component/s: Sharding
    • Labels:
      None
    • Operating System:
      ALL
    • Steps To Reproduce:
      • Create a two-shard cluster.
      • Stop the balancer.
      • Create an empty sharded collection and split it into 500,000 chunks.
        (100k chunks may be enough to see a noticeable difference, but at 500k the slowdown becomes very easy to observe.)
      • Start the balancer and observe the time taken to move each chunk.
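      The split and balancer steps above can be sketched for the mongo shell, connected to a mongos of an already running two-shard cluster. This is only a rough sketch: the namespace `test.manychunks`, the `_id` shard key, the key span of 5,000,000, and the `splitPoints` helper are assumptions for illustration and are not taken from the attached createChunks_v3.js. The cluster commands run only when the shell's `sh` helper is defined, so the split-point logic can be checked on its own.

```javascript
// Hypothetical helper: evenly spaced integer split points over [0, keySpan).
// chunkCount chunks need chunkCount - 1 split points.
function splitPoints(chunkCount, keySpan) {
  var points = [];
  for (var i = 1; i < chunkCount; i++) {
    points.push(Math.floor((i * keySpan) / chunkCount));
  }
  return points;
}

// Run the cluster steps only inside the mongo shell (where `sh` exists).
if (typeof sh !== "undefined") {
  var ns = "test.manychunks";          // assumed namespace
  sh.stopBalancer();                   // keep chunks in place while splitting
  sh.enableSharding("test");
  sh.shardCollection(ns, { _id: 1 });  // assumed shard key
  var points = splitPoints(500000, 5000000);
  for (var j = 0; j < points.length; j++) {
    sh.splitAt(ns, { _id: points[j] });
  }
  sh.startBalancer();                  // now watch the time per chunk move
}
```

      Issuing the splits one at a time like this will itself take a long while at 500,000 chunks; the point of the sketch is the order of the steps, not speed.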
    • Sprint:
      Sharding 2016-09-19, Sharding 2016-10-10, Sharding 2016-10-31

      Description

      I've been testing the speed of chunk migrations in an all-on-one-server test cluster. Even when the chunks being migrated are empty (i.e. the chunk move itself takes only ~0.1 secs), the entire cycle run by the balancer takes much longer.

      version               balance round time
      3.0.8                 ~4.5 secs
      v3.3.11-30-gc96009e   ~9 secs

      From someone else's case with v3.2, on different servers and a different network from my test, I heard of a ~6 second cycle. I'm not sure whether that used a replica set config db or the older SCCC-style one.
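      To put the round times above in perspective, here is a back-of-the-envelope estimate of how long a full rebalance would take. It assumes one migration per balance round and that half of the 500,000 chunks (250,000) have to move to the second shard; both figures are assumptions for illustration, and `rebalanceDays` is a hypothetical helper.

```javascript
// Days needed if every balance round moves exactly one chunk (assumption).
function rebalanceDays(migrations, roundSeconds) {
  return (migrations * roundSeconds) / 86400; // 86400 seconds per day
}

var migrations = 250000; // assumed: half of 500,000 chunks move to the new shard
var observed = [
  ["3.0.8", 4.5],
  ["3.2 (second-hand report)", 6],
  ["v3.3.11-30-gc96009e", 9]
];

for (var i = 0; i < observed.length; i++) {
  var days = rebalanceDays(migrations, observed[i][1]);
  console.log(observed[i][0] + ": ~" + days.toFixed(1) + " days");
}
```

      Under the same assumptions, the ~0.1 sec chunk moves alone would add up to only about 7 hours, so nearly all of the time goes to per-round overhead.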

      Can the balancer be changed so that a balance round moves multiple chunks of each collection, so long as they finish quickly? E.g. the balance round identifies candidate chunks for migration and keeps doing chunk moves for them serially until a window of, say, 10 secs completes.

      At any rate, if data has been completely deleted for a big fraction of chunk ranges before adding a new shard, it would be good if those chunk moves happened a lot more quickly.
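      The time-windowed round suggested above could look roughly like the following. This is a standalone simulation, not the balancer's actual code: `runRound`, the `moveChunk` callback, and the injected `now` clock are hypothetical names, and the 10 sec window and 0.1 sec move time are the figures mentioned in this description.

```javascript
// Keep moving candidate chunks serially until the time window is used up.
// A slow move simply exhausts the window, so the round ends after it.
function runRound(candidates, moveChunk, now, windowMs) {
  var deadline = now() + windowMs;
  var moved = 0;
  while (moved < candidates.length && now() < deadline) {
    moveChunk(candidates[moved]);
    moved++;
  }
  return moved;
}

// Simulated clock so the sketch runs without a cluster.
var clockMs = 0;
function fakeNow() { return clockMs; }
function fakeEmptyChunkMove(chunk) { clockMs += 100; } // ~0.1 sec per empty chunk

var candidates = [];
for (var i = 0; i < 500; i++) {
  candidates.push({ min: i, max: i + 1 }); // placeholder chunk ranges
}

var movedInRound = runRound(candidates, fakeEmptyChunkMove, fakeNow, 10000);
console.log("chunks moved in one 10 sec round: " + movedInRound);
```

      Checking the clock before each move, rather than budgeting a fixed number of moves, means a round of large chunks still ends after one migration while a round of empty chunks gets through many.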

        Attachments

        1. chunksOnWrongTier_v2.js
          1 kB
        2. createChunks_v3.js
          3 kB

              People

              Assignee:
              kaloian.manassiev Kaloian Manassiev
              Reporter:
              akira.kurogane Akira Kurogane
              Participants:
              Votes:
              0
              Watchers:
              8
