Core Server / SERVER-20559

Race condition in shard registry during concurrent sharding operations

    • Type: Bug
    • Resolution: Done
    • Priority: Major - P3
    • Fix Version/s: 3.1.9
    • Affects Version/s: 3.1.8
    • Component/s: Sharding
    • Labels: None
    • Backwards Compatibility: Fully Compatible
    • Operating System: ALL
    • Steps To Reproduce:

      This bug was uncovered through the FSM concurrency suite. To reproduce it more simply, run the moveChunk, splitChunk, and mergeChunks commands with a high degree of concurrency (more than 20 threads).
    • Sprint: TIG A (10/09/15)

      The moveChunk command routinely reloads all shards in the shard registry, which clears the registry's ShardMap objects. The ShardMap objects hold shared pointers to Shard objects, so each Shard is destroyed on reload unless some other shared pointer to it is still alive.

      Other sharding commands, such as splitChunk and mergeChunks, also obtain shared pointers to these Shard objects in order to get the Shard's RemoteCommandTargeter, which the Shard owns. These commands release the shared pointer to the Shard but keep using the RemoteCommandTargeter. If a concurrent moveChunk deletes the Shard, its RemoteCommandTargeter is deleted with it, leaving the splitChunk or mergeChunks command holding a dangling reference. The next use of the RemoteCommandTargeter is a use-after-free.
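
      The lifetime problem can be shown in a single-threaded sketch. The types below are simplified, hypothetical stand-ins rather than the server's real classes, and reload() only models the effect of clearing and repopulating the map. A weak_ptr observes the Shard so the example can report its destruction without ever dereferencing the dangling pointer.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Hypothetical stand-ins for the real classes; names and members are assumptions.
struct RemoteCommandTargeter {
    std::string host = "shard0/host:27017";
};

struct Shard {
    // The Shard owns its targeter, so the targeter dies with the Shard.
    std::unique_ptr<RemoteCommandTargeter> targeter =
        std::make_unique<RemoteCommandTargeter>();
};

class ShardRegistry {
public:
    void add(const std::string& id) { _shards[id] = std::make_shared<Shard>(); }

    std::shared_ptr<Shard> find(const std::string& id) const {
        auto it = _shards.find(id);
        return it == _shards.end() ? nullptr : it->second;
    }

    // Models the reload: the old map is dropped and fresh Shard objects replace it.
    void reload() {
        std::map<std::string, std::shared_ptr<Shard>> fresh;
        for (const auto& entry : _shards) {
            fresh[entry.first] = std::make_shared<Shard>();
        }
        _shards.swap(fresh);
    }

private:
    std::map<std::string, std::shared_ptr<Shard>> _shards;
};

// Returns true if the Shard that owned the borrowed targeter was destroyed by reload().
bool targeterDanglesAfterReload() {
    ShardRegistry registry;
    registry.add("shard0");

    std::weak_ptr<Shard> observer;
    RemoteCommandTargeter* borrowed = nullptr;
    {
        std::shared_ptr<Shard> shard = registry.find("shard0");
        observer = shard;
        borrowed = shard->targeter.get();
    }  // The shared_ptr is released here, but the raw targeter pointer is kept.
    assert(borrowed != nullptr);

    registry.reload();  // Stands in for a concurrent moveChunk.

    // Dereferencing `borrowed` at this point would be a use-after-free.
    return observer.expired();
}
```

      In the real bug the reload happens on another thread, so the window is timing-dependent; the sketch only makes the ownership gap deterministic.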

      Potential fix: remove the intermediate _targeter() method so that the shared_ptr to the Shard stays in scope for as long as the RemoteCommandTargeter is in use.
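
      A minimal sketch of that ownership pattern, assuming simplified stand-in types (the ShardMap alias, the connectionString member, and hostAfterReload() are all hypothetical, not the server's actual code): the caller holds the shared_ptr to the Shard for the whole time it uses the targeter, so clearing the map cannot destroy either object.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-ins for the real classes; names and members are assumptions.
struct RemoteCommandTargeter {
    explicit RemoteCommandTargeter(std::string cs) : connectionString(std::move(cs)) {}
    std::string connectionString;
};

struct Shard {
    explicit Shard(std::string cs)
        : targeter(std::make_unique<RemoteCommandTargeter>(std::move(cs))) {}
    std::unique_ptr<RemoteCommandTargeter> targeter;
};

using ShardMap = std::map<std::string, std::shared_ptr<Shard>>;

// Reads from the targeter while the local shared_ptr keeps the owning Shard alive.
// Throws std::out_of_range if `id` is not in the map.
std::string hostAfterReload(ShardMap& shardMap, const std::string& id) {
    std::shared_ptr<Shard> shard = shardMap.at(id);

    shardMap.clear();  // Stands in for the reload triggered by a concurrent moveChunk.

    // The map no longer references the Shard; only the local shared_ptr does.
    assert(shard.use_count() == 1);

    // Safe: the Shard, and therefore its targeter, lives until `shard` leaves scope.
    return shard->targeter->connectionString;
}
```

      The cost is that a command may briefly keep using a Shard object the registry has already replaced, which this sketch does not attempt to address.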

            Assignee:
            esha.maharishi@mongodb.com Esha Maharishi (Inactive)
            Reporter:
            esha.maharishi@mongodb.com Esha Maharishi (Inactive)
            Votes:
            0
            Watchers:
            5
