Core Server / SERVER-56654

Do not use the collection distributed lock for chunk splits


    Details

    • Backwards Compatibility:
      Fully Compatible
    • Backport Requested:
      v5.0, v4.4, v4.2, v4.0
    • Sprint:
      Sharding EMEA 2021-06-14

      Description

      Currently, chunk splits, whether manual or initiated by the auto-splitter, acquire the collection distributed lock. This is problematic for two reasons:

      1. Even if only a single shard is imbalanced and the balancer is migrating chunks off it, the chunk splitter cannot acquire the distributed lock and will repeatedly fail.
      2. With or without concurrent migrations, the dist lock is acquired only after the splitVector scan that determines the split points has already run, so a good portion of these scans could end up being wasted. This is somewhat mitigated by the fact that the dist lock is taken with the default timeout of 5 seconds, but because acquisition is retried with a 500 ms back-off, there is still some chance of wasted scans on a sufficiently loaded system.
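      To make the cost in point 2 concrete, here is a minimal Python sketch; it is not the server's actual code (which is C++). It assumes the lock is simply polled every 500 ms until the 5-second default timeout expires. `acquire_with_backoff`, `split_chunk` and `FakeClock` are hypothetical names, and the fake clock exists only so the example runs without real sleeping. With a migration holding the lock for the whole window, the scan runs once and its result is discarded.

```python
import time
from typing import Callable, List, Optional

# Values taken from the ticket; the retry loop itself is an assumption.
DEFAULT_LOCK_TIMEOUT_S = 5.0  # default dist lock timeout
RETRY_BACKOFF_S = 0.5         # back-off between acquisition attempts


def acquire_with_backoff(try_acquire: Callable[[], bool],
                         timeout_s: float = DEFAULT_LOCK_TIMEOUT_S,
                         backoff_s: float = RETRY_BACKOFF_S,
                         clock: Callable[[], float] = time.monotonic,
                         sleep: Callable[[float], None] = time.sleep) -> bool:
    """Hypothetical helper: retry try_acquire() until success or timeout."""
    deadline = clock() + timeout_s
    while True:
        if try_acquire():
            return True
        if clock() + backoff_s > deadline:
            return False  # next attempt would land past the deadline
        sleep(backoff_s)


def split_chunk(split_vector_scan: Callable[[], List[int]],
                try_acquire: Callable[[], bool],
                **lock_kwargs) -> Optional[List[int]]:
    """Hypothetical split path: scan first, lock second (the ordering in point 2)."""
    split_points = split_vector_scan()  # expensive scan already paid for
    if not acquire_with_backoff(try_acquire, **lock_kwargs):
        return None  # lock unavailable: the scan result is thrown away
    return split_points


class FakeClock:
    """Simulated time so the example does not actually sleep."""
    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now

    def sleep(self, seconds: float) -> None:
        self.now += seconds


# Usage: a migration holds the lock for the whole timeout window.
attempts = 0
scans = 0


def lock_held_by_migration() -> bool:
    global attempts
    attempts += 1
    return False


def scan() -> List[int]:
    global scans
    scans += 1
    return [10, 20]


clock = FakeClock()
result = split_chunk(scan, lock_held_by_migration, clock=clock, sleep=clock.sleep)
print(result, attempts, scans)  # None 11 1: eleven failed attempts, one wasted scan
```

      Moving the lock attempt before the scan would avoid the waste in this sketch, but it would not help with point 1, which is why the ticket looks at dropping the lock altogether.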

      This ticket is to figure out how to remove the dist lock acquisition from splits without causing an impact in the reverse direction, i.e., splits causing the much more expensive migrations to start failing because the chunk being moved was split.

      We could achieve this by having the split logic skip size tracking for, and/or splitting of, chunks that are currently being moved.
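      The proposed alternative could look roughly like the following Python sketch, in which the splitter consults a set of in-flight migrations instead of taking the collection lock. Everything here is hypothetical: `ChunkSplitter`, the integer `[min, max)` chunk ranges and the byte counter are illustrations, not MongoDB's actual data structures.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

ChunkRange = Tuple[int, int]  # hypothetical: [min, max) over an integer shard key


@dataclass
class ChunkSplitter:
    """Toy splitter that skips migrating chunks instead of taking a collection lock."""
    max_chunk_bytes: int
    migrating: Set[ChunkRange] = field(default_factory=set)
    bytes_written: Dict[ChunkRange, int] = field(default_factory=dict)

    def begin_migration(self, chunk: ChunkRange) -> None:
        self.migrating.add(chunk)

    def end_migration(self, chunk: ChunkRange) -> None:
        self.migrating.discard(chunk)

    def _overlaps_migration(self, chunk: ChunkRange) -> bool:
        lo, hi = chunk
        # Two half-open ranges overlap iff each starts before the other ends.
        return any(lo < m_hi and m_lo < hi for m_lo, m_hi in self.migrating)

    def on_write(self, chunk: ChunkRange, nbytes: int) -> bool:
        """Track writes; return True when the chunk has grown enough to split."""
        if self._overlaps_migration(chunk):
            return False  # ignore size tracking while the chunk is being moved
        total = self.bytes_written.get(chunk, 0) + nbytes
        self.bytes_written[chunk] = total
        return total >= self.max_chunk_bytes

    def split(self, chunk: ChunkRange, split_point: int) -> List[ChunkRange]:
        """Split at split_point, or return the chunk unchanged if it is migrating."""
        if self._overlaps_migration(chunk):
            return [chunk]  # never split a chunk that a migration is moving
        lo, hi = chunk
        if not lo < split_point < hi:
            raise ValueError("split point must lie strictly inside the chunk")
        self.bytes_written.pop(chunk, None)  # size tracking restarts per child
        return [(lo, split_point), (split_point, hi)]


splitter = ChunkSplitter(max_chunk_bytes=100)
chunk = (0, 10)
splitter.begin_migration(chunk)
print(splitter.on_write(chunk, 500))  # False: writes not tracked during the move
print(splitter.split(chunk, 5))       # [(0, 10)]: split refused
splitter.end_migration(chunk)
print(splitter.on_write(chunk, 150))  # True: threshold reached after the move
print(splitter.split(chunk, 5))       # [(0, 5), (5, 10)]
```

      Because the check and the split are separate steps, a real implementation would still need to guard against a migration starting in between, for example by re-validating at commit time; the sketch ignores that race.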


              People

              Assignee:
              kaloian.manassiev Kaloian Manassiev
              Reporter:
              kaloian.manassiev Kaloian Manassiev
              Votes:
              0
              Watchers:
              2
