Core Server / SERVER-46062

Prevent ChunkManagerTargeter from accessing all shard versions before targeting a write

    • Type: Improvement
    • Resolution: Fixed
    • Priority: Major - P3
    • Fix Version/s: 4.3.4
    • Affects Version/s: None
    • Component/s: Sharding
    • Labels: None
    • Backwards Compatibility: Fully Compatible
    • Sprint: Sharding 2020-02-24

      The cluster writer calls ChunkManagerTargeter::targetCollection() to determine whether a write targets the config server or the shard servers. In doing so, the targeter queries the shard version for each shard, because a shard version is required to create a ShardEndpoint object. However, the cluster writer doesn't consume any other ShardEndpoint data: it only checks whether any of the targeted endpoints is the config server, then throws the rest of each object away.

      This has an unintended side effect. Since PM-1633, retrieving a shard version throws an exception if that shard has been marked as stale. Consequently, any attempt to target a write through the cluster writer will stall on a catalog cache refresh if any shard is stale, regardless of whether that particular write targets a stale shard.
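
      The side effect can be illustrated with a minimal C++ sketch. Every type and function name below (RoutingInfoSketch, targetCollectionSketch, and so on) is a hypothetical stand-in rather than the real server code, and the "config" shard id is an assumption. In the real server the exception leads to a catalog cache refresh; the sketch only shows that the exception is raised even when the caller just wants the config-server check.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-ins; these are not the real MongoDB server types.
using ShardId = std::string;

struct StaleShardError : std::runtime_error {
    explicit StaleShardError(const ShardId& id)
        : std::runtime_error("shard marked stale: " + id) {}
};

struct ShardEndpointSketch {
    ShardId shardId;
    int shardVersion;  // simplified: a real shard version is not a plain int
};

struct RoutingInfoSketch {
    std::map<ShardId, int> versions;
    std::set<ShardId> staleShards;

    // Models the behavior described above: reading the version of a
    // shard that has been marked stale throws.
    int getShardVersion(const ShardId& id) const {
        if (staleShards.count(id) > 0) {
            throw StaleShardError(id);
        }
        return versions.at(id);
    }
};

// Builds an endpoint for every shard, which forces a version lookup per shard.
std::vector<ShardEndpointSketch> targetCollectionSketch(const RoutingInfoSketch& info) {
    std::vector<ShardEndpointSketch> endpoints;
    for (const auto& entry : info.versions) {
        endpoints.push_back({entry.first, info.getShardVersion(entry.first)});
    }
    return endpoints;
}

// The only question the caller actually asks of the endpoints.
// Assumption: the config server is identified by the shard id "config".
bool anyEndpointIsConfigServer(const std::vector<ShardEndpointSketch>& endpoints) {
    for (const auto& endpoint : endpoints) {
        if (endpoint.shardId == "config") {
            return true;
        }
    }
    return false;
}
```

      In this sketch, marking any one shard as stale makes targetCollectionSketch() throw, even though anyEndpointIsConfigServer() never looks at a shard version.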

      It's key to note that we aren't using ::targetCollection() for its intended purpose: we collect shard versions that we never use. Fortunately, the cluster writer is the only place where we call ::targetCollection(), so we can stop these shard version lookups from triggering a refresh by removing ::targetCollection() entirely.

      I propose to remove ::targetCollection() and replace it with a function on NSTargeter/ChunkManagerTargeter, endpointIsConfigServer(), that returns a boolean indicating whether the targeted endpoints represent the config server. This retains the existing logic but skips creating ShardEndpoints, which avoids the shard version issue altogether.
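
      A rough sketch of the proposed shape, under stated assumptions: the class names carry a Sketch suffix because they are hypothetical simplifications, the targeter is assumed to know the set of targeted shard ids, and "config" is assumed to be the config server's shard id. Only the function name endpointIsConfigServer() comes from the proposal itself.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>

// Hypothetical stand-ins; these are not the real MongoDB server types.
using ShardId = std::string;

// Assumption: the config server is identified by this shard id.
const ShardId kConfigServerShardIdSketch = "config";

class NSTargeterSketch {
public:
    virtual ~NSTargeterSketch() = default;

    // Answers only the question the cluster writer needs answered.
    virtual bool endpointIsConfigServer() const = 0;
};

class ChunkManagerTargeterSketch final : public NSTargeterSketch {
public:
    explicit ChunkManagerTargeterSketch(std::set<ShardId> targetedShards)
        : _targetedShards(std::move(targetedShards)) {
        // Sketch-only invariant: a write targets at least one shard.
        assert(!_targetedShards.empty());
    }

    bool endpointIsConfigServer() const override {
        // Only shard ids are inspected. No shard version is read and no
        // endpoint object is built, so a stale shard cannot make this throw.
        return _targetedShards.count(kConfigServerShardIdSketch) > 0;
    }

private:
    std::set<ShardId> _targetedShards;
};
```

      The caller would ask the targeter a yes/no question instead of building endpoints and discarding them, which is what removes the dependency on shard versions.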

      I'm proud to report that from local testing, implementing this change gives a 6% performance improvement in targeted performance workloads.

            Assignee: Blake Oler (blake.oler@mongodb.com)
            Reporter: Blake Oler (blake.oler@mongodb.com)