  1. Core Server
  2. SERVER-20150

Chunk migration locks are constantly blocking map/reduce


Details

    • Type: Bug
    • Resolution: Done
    • Priority: Major - P3
    • Fix Version/s: None
    • Affects Version/s: 3.0.5
    • Component/s: Sharding
    • Labels: None
    • Operating System: ALL

    Description

      Hi.

      Please see SERVER-20149 for our deployment details.

      We're seeing another problem with the shard balancer. After the upgrade, our app fails to run map/reduce on sharded collections, with the following error:

      [conn26] the collection metadata could not be locked for mapreduce, already locked by { _id: "<db>.<collection>", process: "db03:27017:1440329966:296879767", state: 2, ts: ObjectId('55da2f929342a2275e4eb52f'), when: new Date(1440362386369), who: "db03:27017:1440329966:296879767:conn5087:1423199152", why: "migrating chunk [{ : MinKey }, { : MaxKey }) in <db>.<collection>" }
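
For anyone reading the lock document above, here is a rough Python sketch of how its timestamps can be decoded to see when the migration lock was taken. The `when` value is milliseconds since the Unix epoch, and the first 4 bytes of an ObjectId are seconds since the epoch. Treating the third field of the `process` string as an epoch-seconds timestamp is an assumption based on how these ids usually look. The helper names are made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# Values copied from the lock document in the error message.
WHEN_MS = 1440362386369
LOCK_TS_HEX = "55da2f929342a2275e4eb52f"
PROCESS = "db03:27017:1440329966:296879767"

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def from_epoch_ms(ms):
    """Convert milliseconds since the Unix epoch to an aware UTC datetime."""
    return EPOCH + timedelta(milliseconds=ms)


def objectid_time(oid_hex):
    """The first 4 bytes (8 hex digits) of an ObjectId are epoch seconds."""
    return EPOCH + timedelta(seconds=int(oid_hex[:8], 16))


def process_time(process_id):
    """ASSUMPTION: the third ':'-separated field is epoch seconds."""
    return EPOCH + timedelta(seconds=int(process_id.split(":")[2]))


lock_taken = from_epoch_ms(WHEN_MS)
lock_id_time = objectid_time(LOCK_TS_HEX)
process_created = process_time(PROCESS)

print("lock taken (when):  ", lock_taken.isoformat())
print("lock id time (ts):  ", lock_id_time.isoformat())
print("process id created: ", process_created.isoformat())
print("lock taken after:   ", lock_taken - process_created)
```

Comparing `lock_taken` with the time the map/reduce error was logged gives a rough idea of how long the migration had been holding the collection lock.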
      

      I checked the status of the shards and found that a chunk migration is pending from shard3 to shard1, because the chunk distribution is imbalanced:

            chunks:
                                      shard1     713
                                      shard2     715
                                      shard3     812
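
To put the imbalance in numbers, here is a small Python sketch that compares the chunk counts above against the balancer's migration threshold. The threshold values (2, 4, 8) are the ones listed in the MongoDB documentation of that era, as far as I know. Whether they apply unchanged to this deployment is an assumption, and the function names are made up for illustration.

```python
# Chunk counts copied from the shard status output.
chunks = {"shard1": 713, "shard2": 715, "shard3": 812}


def migration_threshold(total_chunks):
    """Documented balancer thresholds (assumed to apply to 3.0.5 here)."""
    if total_chunks < 20:
        return 2
    if total_chunks < 80:
        return 4
    return 8


def imbalance(chunk_counts):
    """Return (donor, recipient, difference) for the most/least loaded shards."""
    donor = max(chunk_counts, key=chunk_counts.get)
    recipient = min(chunk_counts, key=chunk_counts.get)
    return donor, recipient, chunk_counts[donor] - chunk_counts[recipient]


donor, recipient, diff = imbalance(chunks)
threshold = migration_threshold(sum(chunks.values()))

print(f"{donor} -> {recipient}: difference {diff}, threshold {threshold}")
print("balancer wants to migrate:", diff >= threshold)
```

With a difference of 99 chunks against a threshold of 8, the balancer would keep scheduling migrations from shard3 for a long time, so the collection would keep being locked for as long as each migration stalls.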
      

      I checked db.opStatus on the shard1 primary node and found that the migration was blocked by a secondary node that was performing an initial sync. We decided to stop the initial sync to give the primary node time to accept chunks from shard3. But after ~2 hours only 2 chunks had actually migrated, and our collection was still locked. So we stopped the balancer to allow our app to run again.

      Is this by design, or did something go wrong during the upgrade process? We never saw this issue on our 2.4 installation.

          People

            Assignee: sam.kleinman Sam Kleinman (Inactive)
            Reporter: yopp Alex
            Votes: 0
            Watchers: 5
