  1. Core Server
  2. SERVER-34819

Optimize the sharding balancer's cluster statistics gathering

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major - P3
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: Backlog
    • Component/s: Sharding
    • Labels:

      Description

      The sharding balancer currently issues listDatabases against every shard in order to read the totalSize value. This value is used to ensure that a shard's storage maxSize is not exceeded, for users who have maxSize set.

      The listDatabases call is quite heavy, especially for nodes with a large number of databases/collections, since it will fstat every single file under the instance.

      Several optimizations could make this statistics gathering less expensive when maxSize is set (listed in order of preference):

      • Add a parameter to listDatabases that allows it to return a cached data size instead of calling fstat on all the files every time
      • Issue the listDatabases call in parallel against all shards so different shards' execution overlaps
      • Cache the per-shard statistics so that they are not collected on every single round/moveChunk invocation
      • Collect the per-shard statistics asynchronously so that multiple concurrent moveChunk requests can benefit
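
      The second and third options above could be sketched roughly as follows. This is an illustration only, not the balancer's actual implementation (which lives in the server's C++ code). The class name ShardStatsCache, the ttl_seconds parameter, and the fetch_total_size callback are all hypothetical; a real fetcher would run listDatabases against the given shard and return its totalSize field.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, Tuple


class ShardStatsCache:
    """Hypothetical cache of per-shard totalSize values.

    Stale or missing entries are refreshed in parallel, so the slow
    per-shard calls overlap instead of running one after another.
    Not safe for concurrent callers; a real version would need locking.
    """

    def __init__(
        self,
        fetch_total_size: Callable[[str], int],  # assumed: shard id -> totalSize in bytes
        ttl_seconds: float = 30.0,               # assumed freshness window
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch_total_size
        self._ttl = ttl_seconds
        self._clock = clock
        # shard id -> (time the value was fetched, totalSize)
        self._entries: Dict[str, Tuple[float, int]] = {}

    def get_total_sizes(self, shard_ids: Iterable[str]) -> Dict[str, int]:
        shard_ids = list(shard_ids)
        now = self._clock()
        # Only shards with no entry, or an entry older than the TTL, are refetched.
        stale = [
            s for s in shard_ids
            if s not in self._entries or now - self._entries[s][0] >= self._ttl
        ]
        if stale:
            # One worker per stale shard so the calls run concurrently.
            with ThreadPoolExecutor(max_workers=len(stale)) as pool:
                for shard_id, size in zip(stale, pool.map(self._fetch, stale)):
                    self._entries[shard_id] = (now, size)
        return {s: self._entries[s][1] for s in shard_ids}


if __name__ == "__main__":
    # Stand-in fetcher with made-up sizes; replace with a real listDatabases call.
    fake_sizes = {"shard0": 100, "shard1": 250}
    cache = ShardStatsCache(lambda shard_id: fake_sizes[shard_id], ttl_seconds=30.0)
    print(cache.get_total_sizes(["shard0", "shard1"]))
```

      With a cache like this, repeated balancer rounds or moveChunk invocations inside the TTL window would reuse the stored sizes instead of triggering another listDatabases on every shard. The trade-off is that the maxSize check may act on a value up to ttl_seconds old.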

              People

              Assignee:
              backlog-server-sharding Backlog - Sharding Team
              Reporter:
              kaloian.manassiev Kaloian Manassiev
              Votes:
              0
              Watchers:
              11
