Core Server / SERVER-16770

mongos blocks db during shardCollection

    • Type: Bug
    • Resolution: Done
    • Priority: Major - P3
    • Affects Version/s: 2.4.14, 2.6.12, 2.8.0-rc4, 3.0.14
    • Component/s: Sharding
    • Operating System: ALL
    • Steps To Reproduce:
      1. Start a small sharded cluster, e.g.:
        mlaunch init --single --sharded 2 --config 3 --mongos 2 --port 37592 --smallfiles
        
      2. Add enough data that shardCollection will create a large number of initial chunks:
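        sh.stopBalancer();
        // 1 MB chunk size, so the data below produces many initial chunks
        db.getSiblingDB("config").settings.update( { _id: "chunksize" }, { value: 1 } );
        sh.enableSharding("test");
        // 512 KB string; 7157 documents of this size total about 3.5 GB
        var s = (new Array(512*1024+1)).join("x");
        for (var i = 0; i < 7157; i++) db.test.insert( { i: i, s: s } );
        db.test.ensureIndex( { i: 1 } );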
        
      3. Shard the collection:
        sh.shardCollection("test.test", { i: 1 } )
        
      4. While shardCollection is in progress, open a different connection to the same mongos and run any operation on the database whose collection is being sharded, e.g. any or all of:
        db.test.count()
        db.test.findOne()
        db.test.insert({})
        db.foo.count()
        db.foo.findOne()
        db.foo.insert({})
        

        Any of these commands will block until shardCollection completes.
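      To make the blocking visible from the second connection, it can help to time each command rather than watch for it to return. The helper below is a minimal sketch; `timeOps` is a hypothetical name, not a mongo shell built-in, and it is written in ES5 style on the assumption that it should also load in older mongo shells.

```javascript
// timeOps is a hypothetical helper, not part of the mongo shell.
// It runs each operation in turn and records the wall-clock time it took,
// so an operation stuck behind shardCollection shows up as a large "ms".
function timeOps(ops) {
  var results = [];
  for (var i = 0; i < ops.length; i++) {
    var start = Date.now();
    ops[i].fn();
    results.push({ name: ops[i].name, ms: Date.now() - start });
  }
  return results;
}
```

      In the mongo shell, on the second connection to the same mongos, a call such as printjson(timeOps([{ name: "test.count", fn: function () { return db.test.count(); } }, { name: "foo.findOne", fn: function () { return db.foo.findOne(); } }])) would report how long each command waited.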


      Symptoms

      While shardCollection is running, operations on the same database through the same mongos block until shardCollection finishes, and then run. The database can still be accessed through other mongos processes, and on the affected mongos, operations on other databases are unaffected.

      Impact

      shardCollection can sometimes take a long time to run. With 3 config servers, initial chunks are created at about 10/s in 2.4/2.6 and about 40/s in 2.8, so there can be a long period during which the database in question is not accessible via the mongos that is running shardCollection.
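      The length of that window can be estimated as the number of initial chunks divided by the chunk creation rate. A sketch using the figures from this report (7157 chunks from the repro, roughly 10/s and 40/s); `estimateBlockSeconds` is a hypothetical helper and the rates are approximate.

```javascript
// Hypothetical helper: estimated seconds that shardCollection (and hence
// access to the database via the issuing mongos) takes, given the number
// of initial chunks and the chunk creation rate.
function estimateBlockSeconds(numChunks, chunksPerSecond) {
  return numChunks / chunksPerSecond;
}

var est24 = estimateBlockSeconds(7157, 10); // 2.4/2.6: 715.7 s, about 12 minutes
var est28 = estimateBlockSeconds(7157, 40); // 2.8: about 179 s, about 3 minutes
```

      The 2.4/2.6 estimate is a little under the roughly 13 minutes measured in the repro, which is expected since 10/s is a rounded figure.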

      Results

      In the repro, shardCollection takes about 13 minutes to do the initial chunk splits on 2.4/2.6, and about 3 minutes on 2.8 (with 3 config servers).

      2015-01-08T15:40:46.105+1100 I COMMAND  [conn5] CMD: shardcollection: { shardCollection: "test.test", key: { i: 1.0 } }
      2015-01-08T15:40:46.105+1100 I SHARDING [conn5] enable sharding on: test.test with shard key: { i: 1.0 }
      2015-01-08T15:40:46.105+1100 I SHARDING [conn5] about to log metadata event: { _id: "genique-2015-01-08T04:40:46-54ae0a4e284cf6acc610c196", server: "genique", clientAddr: "N/A", time: new Date(1420692046105), what: "shardCollection.start", ns: "test.test", details: { shardKey: { i: 1.0 }, collection: "test.test", primary: "shard01:genique:37594", initShards: [], numChunks: 1 } }
      2015-01-08T15:40:46.123+1100 I SHARDING [conn5] going to create 7157 chunk(s) for: test.test using new epoch 54ae0a4e284cf6acc610c197
      2015-01-08T15:43:17.497+1100 I SHARDING [conn5] ChunkManager: time to load chunks for test.test: 76ms sequenceNumber: 3 version: 1|7156||54ae0a4e284cf6acc610c197 based on: (empty)
      2015-01-08T15:43:17.554+1100 I SHARDING [conn5] about to log metadata event: { _id: "genique-2015-01-08T04:43:17-54ae0ae5284cf6acc610c198", server: "genique", clientAddr: "N/A", time: new Date(1420692197554), what: "shardCollection", ns: "test.test", details: { version: "1|7156||54ae0a4e284cf6acc610c197" } }
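      The log timestamps give a direct measurement of the chunk creation rate. A sketch with hypothetical helper names, assuming that the "going to create 7157 chunk(s)" and "ChunkManager: time to load chunks" lines bracket the chunk creation work:

```javascript
// Hypothetical helper: convert a log timestamp such as
// "2015-01-08T15:40:46.123+1100" into epoch milliseconds. The "+1100"
// offset is rewritten to "+11:00" because only the colon form is
// guaranteed by the ECMAScript date-time string format.
function logTimeToMs(ts) {
  return Date.parse(ts.replace(/([+-]\d{2})(\d{2})$/, "$1:$2"));
}

// Elapsed time between two log lines, and the implied chunk creation rate.
function observedRate(startTs, endTs, numChunks) {
  var seconds = (logTimeToMs(endTs) - logTimeToMs(startTs)) / 1000;
  return { seconds: seconds, chunksPerSecond: numChunks / seconds };
}

// "going to create 7157 chunk(s)" -> "ChunkManager: time to load chunks"
var rate = observedRate("2015-01-08T15:40:46.123+1100",
                        "2015-01-08T15:43:17.497+1100", 7157);
// rate.seconds is about 151.4; rate.chunksPerSecond is about 47
```

      That is roughly 2.5 minutes at about 47 chunks/s, in line with the "about 40/s" and "about 3 mins" figures given for 2.8 (the component-tagged log format suggests this excerpt is from a 2.8 build).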
      

      While shardCollection is running, each of the six test commands above blocks until shardCollection completes, when run against the "test" database on the mongos that is running shardCollection.

      By contrast:

      • The commands do not block when run on the same mongos against a different database, even one that lives on the same shard as the affected database.
      • The commands do not block when run on another mongos, against either the same database or a different one.
      • The commands do not block when run directly on any shard, for any database.

      Hypothesis

      It looks as if the mongos is holding the database lock, although I can't see why it would need to. Unfortunately, currentOp cannot be used to introspect what is happening inside shardCollection on the mongos (SERVER-18094).
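      The observations above fit a simple model in which each mongos keeps its own per-database locks and holds the lock for the target database for the whole of shardCollection. The sketch below encodes that hypothesis only; `LockModel` and its methods are hypothetical and do not describe how mongos is actually implemented.

```javascript
// Hypothetical model of the hypothesis, NOT the mongos implementation:
// each process (mongos or shard) has its own set of held database locks.
function LockModel() {
  this.held = Object.create(null); // process name -> set of db names
}

LockModel.prototype.acquire = function (proc, dbName) {
  if (!this.held[proc]) this.held[proc] = Object.create(null);
  this.held[proc][dbName] = true;
};

LockModel.prototype.release = function (proc, dbName) {
  if (this.held[proc]) delete this.held[proc][dbName];
};

// An operation waits only if the same process holds the lock for the same
// database; other databases and other processes are unaffected.
LockModel.prototype.wouldBlock = function (proc, dbName) {
  return !!(this.held[proc] && this.held[proc][dbName]);
};

var m = new LockModel();
m.acquire("mongos1", "test");       // shardCollection("test.test") starts on mongos1
m.wouldBlock("mongos1", "test");    // true:  same mongos, same db
m.wouldBlock("mongos1", "other");   // false: same mongos, different db
m.wouldBlock("mongos2", "test");    // false: different mongos
m.wouldBlock("shard01", "test");    // false: directly on a shard
m.release("mongos1", "test");       // shardCollection finishes
```

      Because the lock in this model is per process, it also explains why the workaround of using a different mongos avoids the wait.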

      Workaround

      Use a separate, dedicated mongos to run shardCollection.
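      A minimal mongo shell sketch of the workaround, assuming the cluster from the repro. `shardViaDedicatedMongos` is a hypothetical helper; `new Mongo(host)`, `getDB` and `runCommand` are standard mongo shell calls, and `shardCollection` is the admin command that `sh.shardCollection` issues. Port 37593 is an assumption about how mlaunch numbers the second mongos.

```javascript
// Send shardCollection through a mongos that application traffic does not
// use, so only that mongos waits on the database while chunks are created.
// Assumption: with the mlaunch command above, the second mongos is expected
// (not verified) to listen on port 37593.
function shardViaDedicatedMongos(host, ns, key) {
  var conn = new Mongo(host); // connect directly to the dedicated mongos
  return conn.getDB("admin").runCommand({ shardCollection: ns, key: key });
}

// In the mongo shell:
// printjson(shardViaDedicatedMongos("localhost:37593", "test.test", { i: 1 }));
```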

            Assignee: Kaloian Manassiev (kaloian.manassiev@mongodb.com)
            Reporter: Kevin Pulo (kevin.pulo@mongodb.com)
            Votes: 0
            Watchers: 7
