Core Server / SERVER-22611

ChunkManager refresh can occasionally cause a full reload

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: 3.2.12, 3.4.2
    • Fix Version/s: 3.4.4, 3.5.5
    • Component/s: Sharding
    • Labels:
      None
    • Backwards Compatibility:
      Fully Compatible
    • Operating System:
      ALL
    • Backport Requested:
      v3.4
    • Sprint:
      Sharding 11 (03/11/16), Sharding 12 (04/01/16), Sharding 13 (04/22/16), Sharding 14 (05/13/16), Sharding 15 (06/03/16), Sharding 2017-03-27

      Description

      This can block other operations, because the reload holds the DBConfig mutex. It happens when the chunk differ gets an unexpected result from the config server (see here), which can occur when a yield happens while the config server is being queried.
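The following Python sketch illustrates the refresh mechanism described above under simplifying assumptions. A hypothetical `ChunkCache` (not MongoDB's `ChunkManager` API) asks a `fetch` callable for the chunks newer than its highest cached version, merges them, and falls back to a full reload if the merged result no longer tiles the key range. Integer keys stand in for shard keys, the epoch is left out, the tiling check is an assumed stand-in for the real differ's consistency checks, and a `threading.Lock` plays the role of the DBConfig mutex, so every other caller waits until the refresh (including any full reload) finishes.

```python
import threading
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Version = Tuple[int, int]  # (major, minor); the epoch is omitted in this sketch


@dataclass(frozen=True)
class Chunk:
    min_key: int  # inclusive lower bound (integers stand in for shard keys)
    max_key: int  # exclusive upper bound
    version: Version


class ChunkCache:
    """Hypothetical router-side chunk cache; not MongoDB's ChunkManager."""

    def __init__(self, fetch: Callable[[Version], List[Chunk]]) -> None:
        # fetch(since) returns every chunk whose version is greater than `since`,
        # so fetch((0, 0)) returns the whole chunk list.
        self._fetch = fetch
        self._lock = threading.Lock()  # stands in for the DBConfig mutex
        self._chunks: Dict[int, Chunk] = {}
        self.full_reloads = 0

    def chunks(self) -> List[Chunk]:
        return sorted(self._chunks.values(), key=lambda c: c.min_key)

    def refresh(self) -> None:
        # Any other caller needing the lock waits for the whole refresh,
        # including a fallback full reload.
        with self._lock:
            if not self._chunks:
                self._full_reload()
                return
            since = max(c.version for c in self._chunks.values())
            merged = self._apply_diff(self._fetch(since))
            if merged is None:
                self._full_reload()  # the diff looked inconsistent
            else:
                self._chunks = merged

    def _full_reload(self) -> None:
        self._chunks = {c.min_key: c for c in self._fetch((0, 0))}
        self.full_reloads += 1

    def _apply_diff(self, diff: List[Chunk]) -> Optional[Dict[int, Chunk]]:
        lo = min(c.min_key for c in self._chunks.values())
        hi = max(c.max_key for c in self._chunks.values())
        merged = dict(self._chunks)
        for new in diff:
            # A changed chunk replaces every cached chunk whose range overlaps it.
            stale = [k for k, c in merged.items()
                     if c.min_key < new.max_key and new.min_key < c.max_key]
            for k in stale:
                del merged[k]
            merged[new.min_key] = new
        # Accept the diff only if the chunks still tile [lo, hi) with no gaps.
        ordered = sorted(merged.values(), key=lambda c: c.min_key)
        if ordered[0].min_key != lo or ordered[-1].max_key != hi:
            return None
        if any(a.max_key != b.min_key for a, b in zip(ordered, ordered[1:])):
            return None
        return merged
```

The design point it illustrates: because the merged result is validated rather than trusted, a diff that comes back incomplete (for example, one that returned only half of a split) costs a full reload instead of producing a wrong routing table, and that reload runs while the lock is held.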

      Original Title: Chunk migration freezes all mongos servers for >60 seconds

      Original description:

      Time for bug #7.

      We are moving chunks of a collection whose average document size is fairly large (about 6 KB). After every other chunk, we see the following behavior:

      1. The source shard (the one the chunk is transferred from) shows this:

      2016-02-15T06:53:06.556+0000 I SHARDING [conn102845] moveChunk data transfer progress: { active: true, ns: "mydomain.Sessions", from: "rsmydomain/10.35.151.119:27017,in.db1m1.mydomain.com:27017,in.db1m2.mydomain.com:27017", min: { a: ObjectId('5334b6f2645cff3b5097f4f9'), _id: ObjectId('55f705bca21c0f117ccc613c') }, max: { a: ObjectId('5334b6f2645cff3b5097f4f9'), _id: ObjectId('55fb9231584ad132f5207391') }, shardKeyPattern: { a: 1.0, _id: 1.0 }, state: "ready", counts: { cloned: 0, clonedBytes: 0, catchup: 0, steady: 0 }, ok: 1.0 } my mem used: 0
      2016-02-15T06:53:06.559+0000 I SHARDING [conn102845] moveChunk data transfer progress: { active: true, ns: "mydomain.Sessions", from: "rsmydomain/10.35.151.119:27017,in.db1m1.mydomain.com:27017,in.db1m2.mydomain.com:27017", min: { a: ObjectId('5334b6f2645cff3b5097f4f9'), _id: ObjectId('55f705bca21c0f117ccc613c') }, max: { a: ObjectId('5334b6f2645cff3b5097f4f9'), _id: ObjectId('55fb9231584ad132f5207391') }, shardKeyPattern: { a: 1.0, _id: 1.0 }, state: "clone", counts: { cloned: 0, clonedBytes: 0, catchup: 0, steady: 0 }, ok: 1.0 } my mem used: 0
      2016-02-15T06:53:06.563+0000 I SHARDING [conn102845] moveChunk data transfer progress: { active: true, ns: "mydomain.Sessions", from: "rsmydomain/10.35.151.119:27017,in.db1m1.mydomain.com:27017,in.db1m2.mydomain.com:27017", min: { a: ObjectId('5334b6f2645cff3b5097f4f9'), _id: ObjectId('55f705bca21c0f117ccc613c') }, max: { a: ObjectId('5334b6f2645cff3b5097f4f9'), _id: ObjectId('55fb9231584ad132f5207391') }, shardKeyPattern: { a: 1.0, _id: 1.0 }, state: "steady", counts: { cloned: 0, clonedBytes: 0, catchup: 0, steady: 0 }, ok: 1.0 } my mem used: 0
      2016-02-15T06:53:06.563+0000 I SHARDING [conn102845] About to check if it is safe to enter critical section
      2016-02-15T06:53:06.563+0000 I SHARDING [conn102845] About to enter migrate critical section
      2016-02-15T06:53:06.565+0000 I SHARDING [conn626] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.565+0000 I SHARDING [conn24854] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.566+0000 I SHARDING [conn1399] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.567+0000 I SHARDING [conn297] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.567+0000 I SHARDING [conn23645] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.567+0000 I SHARDING [conn24637] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.650+0000 I SHARDING [conn102845] moveChunk setting version to: 6452|0||563ba74d869758a4542b9075
      2016-02-15T06:53:06.650+0000 I SHARDING [conn1682] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.650+0000 I SHARDING [conn731] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.650+0000 I SHARDING [conn886] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.651+0000 I SHARDING [conn19421] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.651+0000 I SHARDING [conn1207] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.651+0000 I SHARDING [conn23153] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.651+0000 I SHARDING [conn24688] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.651+0000 I SHARDING [conn1258] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.651+0000 I SHARDING [conn24257] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.651+0000 I SHARDING [conn25297] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.651+0000 I SHARDING [conn413] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.651+0000 I SHARDING [conn549] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.651+0000 I SHARDING [conn23108] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.651+0000 I SHARDING [conn1416] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.651+0000 I SHARDING [conn24825] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.651+0000 I SHARDING [conn690] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.651+0000 I SHARDING [conn24280] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.651+0000 I SHARDING [conn778] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.651+0000 I SHARDING [conn24187] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.651+0000 I SHARDING [conn19456] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.652+0000 I SHARDING [conn22996] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.652+0000 I SHARDING [conn1052] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.652+0000 I SHARDING [conn24951] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.652+0000 I SHARDING [conn23221] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.652+0000 I SHARDING [conn39] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.652+0000 I SHARDING [conn24798] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.652+0000 I SHARDING [conn24267] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.652+0000 I SHARDING [conn500] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.652+0000 I SHARDING [conn25435] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.652+0000 I SHARDING [conn23410] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.652+0000 I SHARDING [conn1649] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.652+0000 I SHARDING [conn24004] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.652+0000 I SHARDING [conn22921] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.652+0000 I SHARDING [conn38050] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.652+0000 I SHARDING [conn23955] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.652+0000 I SHARDING [conn38048] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.652+0000 I SHARDING [conn1513] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.652+0000 I SHARDING [conn25348] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.652+0000 I SHARDING [conn1163] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.652+0000 I SHARDING [conn24304] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.653+0000 I SHARDING [conn25002] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.653+0000 I SHARDING [conn23489] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.653+0000 I SHARDING [conn141] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.653+0000 I SHARDING [conn720] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.653+0000 I SHARDING [conn196] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.654+0000 I SHARDING [conn23299] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.658+0000 I SHARDING [conn23908] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.659+0000 I SHARDING [conn23756] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.668+0000 I SHARDING [conn23617] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.668+0000 I SHARDING [conn24587] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.670+0000 I SHARDING [conn19435] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.672+0000 I SHARDING [conn23242] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.673+0000 I SHARDING [conn23442] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.677+0000 I SHARDING [conn23236] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.678+0000 I SHARDING [conn23358] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.680+0000 I SHARDING [conn868] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.684+0000 I SHARDING [conn25340] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.687+0000 I SHARDING [conn23963] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.689+0000 I SHARDING [conn1301] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.690+0000 I SHARDING [conn38131] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.694+0000 I SHARDING [conn19460] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.700+0000 I SHARDING [conn25085] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.701+0000 I SHARDING [conn144] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.701+0000 I SHARDING [conn23226] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.705+0000 I SHARDING [conn467] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.705+0000 I SHARDING [conn38121] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.707+0000 I SHARDING [conn24433] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.708+0000 I SHARDING [conn29531] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.708+0000 I SHARDING [conn787] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.709+0000 I NETWORK  [conn102384] end connection 10.40.7.177:48208 (5263 connections now open)
      2016-02-15T06:53:06.709+0000 I NETWORK  [conn105280] end connection 10.40.7.177:48268 (5263 connections now open)
      2016-02-15T06:53:06.709+0000 I SHARDING [conn102385] waiting till out of critical section
      2016-02-15T06:53:06.709+0000 I SHARDING [conn102385] Waiting for 10 seconds for the migration critical section to end
      2016-02-15T06:53:06.711+0000 I SHARDING [conn23961] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.711+0000 I SHARDING [conn24775] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.711+0000 I WRITE    [conn38653] write request to old shard version 6451|6266||563ba74d869758a4542b9075 waiting for migration commit
      2016-02-15T06:53:06.711+0000 I SHARDING [conn38653] Waiting for 10 seconds for the migration critical section to end
      2016-02-15T06:53:06.711+0000 I WRITE    [conn48728] write request to old shard version 6451|6269||563ba74d869758a4542b9075 waiting for migration commit
      2016-02-15T06:53:06.711+0000 I WRITE    [conn72829] write request to old shard version 6451|6269||563ba74d869758a4542b9075 waiting for migration commit
      2016-02-15T06:53:06.711+0000 I SHARDING [conn48728] Waiting for 10 seconds for the migration critical section to end
      2016-02-15T06:53:06.711+0000 I SHARDING [conn72829] Waiting for 10 seconds for the migration critical section to end
      2016-02-15T06:53:06.712+0000 I NETWORK  [conn105127] end connection 10.229.6.198:43840 (5261 connections now open)
      2016-02-15T06:53:06.712+0000 I NETWORK  [conn105340] end connection 10.37.137.232:59474 (5260 connections now open)
      2016-02-15T06:53:06.712+0000 I NETWORK  [conn105392] end connection 10.40.7.177:48282 (5259 connections now open)
      2016-02-15T06:53:06.712+0000 I NETWORK  [conn105344] end connection 10.37.137.232:59484 (5259 connections now open)
      2016-02-15T06:53:06.712+0000 I SHARDING [conn102875] waiting till out of critical section
      2016-02-15T06:53:06.712+0000 I SHARDING [conn102875] Waiting for 10 seconds for the migration critical section to end
      2016-02-15T06:53:06.712+0000 I NETWORK  [conn105341] end connection 10.37.137.232:59475 (5258 connections now open)
      2016-02-15T06:53:06.713+0000 I SHARDING [conn399] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.713+0000 I SHARDING [conn24449] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.714+0000 I NETWORK  [conn105339] end connection 10.37.137.232:59472 (5256 connections now open)
      2016-02-15T06:53:06.714+0000 I NETWORK  [conn105342] end connection 10.37.137.232:59476 (5255 connections now open)
      2016-02-15T06:53:06.714+0000 I NETWORK  [conn105343] end connection 10.37.137.232:59482 (5255 connections now open)
      2016-02-15T06:53:06.714+0000 I NETWORK  [conn84162] end connection 10.229.6.198:43482 (5254 connections now open)
      2016-02-15T06:53:06.717+0000 I SHARDING [conn1005] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.723+0000 I SHARDING [conn23890] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.729+0000 I SHARDING [conn23314] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.737+0000 I SHARDING [conn23246] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.739+0000 I SHARDING [conn23684] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.744+0000 I SHARDING [conn102845] moveChunk migrate commit accepted by TO-shard: { active: false, ns: "mydomain.Sessions", from: "rsmydomain/10.35.151.119:27017,in.db1m1.mydomain.com:27017,in.db1m2.mydomain.com:27017", min: { a: ObjectId('5334b6f2645cff3b5097f4f9'), _id: ObjectId('55f705bca21c0f117ccc613c') }, max: { a: ObjectId('5334b6f2645cff3b5097f4f9'), _id: ObjectId('55fb9231584ad132f5207391') }, shardKeyPattern: { a: 1.0, _id: 1.0 }, state: "done", counts: { cloned: 0, clonedBytes: 0, catchup: 0, steady: 0 }, ok: 1.0 }
      2016-02-15T06:53:06.744+0000 I SHARDING [conn102845] moveChunk updating self version to: 6452|1||563ba74d869758a4542b9075 through { a: ObjectId('5334b6f2645cff3b5097f4f9'), _id: ObjectId('55fb9231584ad132f5207391') } -> { a: ObjectId('5334b6f2645cff3b5097f4f9'), _id: ObjectId('56020ab2584ad15a7d97cb59') } for collection 'mydomain.Sessions'
      2016-02-15T06:53:06.750+0000 I SHARDING [conn23929] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.756+0000 I SHARDING [conn23365] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.761+0000 I SHARDING [conn1420] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.765+0000 I SHARDING [conn1255] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.781+0000 I SHARDING [conn23810] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.796+0000 I SHARDING [conn22987] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.799+0000 I SHARDING [conn23350] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.803+0000 I SHARDING [conn1302] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.823+0000 I SHARDING [conn102845] about to log metadata event into changelog: { _id: "ip-10-41-58-141-2016-02-15T06:53:06.823+0000-56c175d207f175cfb9224557", server: "ip-10-41-58-141", clientAddr: "10.40.7.177:48224", time: new Date(1455519186823), what: "moveChunk.commit", ns: "mydomain.Sessions", details: { min: { a: ObjectId('5334b6f2645cff3b5097f4f9'), _id: ObjectId('55f705bca21c0f117ccc613c') }, max: { a: ObjectId('5334b6f2645cff3b5097f4f9'), _id: ObjectId('55fb9231584ad132f5207391') }, from: "shmydomain1", to: "shmydomain3", cloned: 0, clonedBytes: 0, catchup: 0, steady: 0 } }
      2016-02-15T06:53:06.863+0000 I SHARDING [conn102845] MigrateFromStatus::done About to acquire global lock to exit critical section
      2016-02-15T06:53:06.863+0000 I SHARDING [conn102845] forking for cleanup of chunk data
      2016-02-15T06:53:06.865+0000 I SHARDING [conn102845] waiting for open cursors before removing range [{ a: ObjectId('5334b6f2645cff3b5097f4f9'), _id: ObjectId('55f705bca21c0f117ccc613c') }, { a: ObjectId('5334b6f2645cff3b5097f4f9'), _id: ObjectId('55fb9231584ad132f5207391') }) in mydomain.Sessions, cursor ids: [326986289469, 327861050987, 329266703753, 329633996806, 330038618700, 330161093014, 330597544814]
      2016-02-15T06:53:06.865+0000 I SHARDING [conn102845] about to log metadata event into changelog: { _id: "ip-10-41-58-141-2016-02-15T06:53:06.865+0000-56c175d207f175cfb9224558", server: "ip-10-41-58-141", clientAddr: "10.40.7.177:48224", time: new Date(1455519186865), what: "moveChunk.from", ns: "mydomain.Sessions", details: { min: { a: ObjectId('5334b6f2645cff3b5097f4f9'), _id: ObjectId('55f705bca21c0f117ccc613c') }, max: { a: ObjectId('5334b6f2645cff3b5097f4f9'), _id: ObjectId('55fb9231584ad132f5207391') }, step 1 of 6: 0, step 2 of 6: 310, step 3 of 6: 15, step 4 of 6: 8, step 5 of 6: 299, step 6 of 6: 1, to: "shmydomain3", from: "shmydomain1", note: "success" } }
      2016-02-15T06:53:06.993+0000 I SHARDING [conn102845] distributed lock 'mydomain.Sessions/ip-10-41-58-141:27017:1455127062:862901621' unlocked. 
      2016-02-15T06:53:07.779+0000 I SHARDING [conn1302] remotely refreshing metadata for mydomain.Sessions with requested shard version 6452|1||563ba74d869758a4542b9075 based on current shard version 6452|0||563ba74d869758a4542b9075, current metadata version is 6452|0||563ba74d869758a4542b9075
      2016-02-15T06:53:07.779+0000 I SHARDING [conn23450] remotely refreshing metadata for mydomain.Sessions with requested shard version 6452|1||563ba74d869758a4542b9075 based on current shard version 6452|0||563ba74d869758a4542b9075, current metadata version is 6452|0||563ba74d869758a4542b9075
      2016-02-15T06:53:07.779+0000 I SHARDING [conn1311] remotely refreshing metadata for mydomain.Sessions with requested shard version 6452|1||563ba74d869758a4542b9075 based on current shard version 6452|0||563ba74d869758a4542b9075, current metadata version is 6452|0||563ba74d869758a4542b9075
      2016-02-15T06:53:07.780+0000 I SHARDING [conn102845] received moveChunk request: { moveChunk: "mydomain.CrashIssueReports", from: "rsmydomain/10.35.151.119:27017,in.db1m1.mydomain.com:27017,in.db1m2.mydomain.com:27017", to: "rsmydomain2/in.db2m1.mydomain.com:27017,in.db2m2.mydomain.com:27017", fromShard: "shmydomain1", toShard: "shmydomain2", min: { p: 2, as: ObjectId('5660b4295e27753f5c614e0a'), d: new Date(1455408000000) }, max: { p: 3, as: ObjectId('569c9324294e2113fe1b23e1'), d: new Date(1455408000000) }, maxChunkSizeBytes: 67108864, configdb: "in.dbcfg1.mydomain.com:27019,in.dbcfg2.mydomain.com:27019,in.dbcfg3.mydomain.com:27019", secondaryThrottle: true, waitForDelete: false, maxTimeMS: 0, shardVersion: [ Timestamp 3000|1, ObjectId('56c0a812f6dbf7ace67b5e89') ], epoch: ObjectId('56c0a812f6dbf7ace67b5e89') }
      

      2. Then all mongos servers (we have 3) show this for every single collection:

      2016-02-15T06:53:08.183+0000 I SHARDING [conn265475] ChunkManager: time to load chunks for mydomain.Col1: 1ms sequenceNumber: 15808 version: 26|7||54bd7ee267066c5f3e307c11 based on: (empty)
      2016-02-15T06:53:08.187+0000 I SHARDING [conn265475] ChunkManager: time to load chunks for mydomain.Col2: 3ms sequenceNumber: 15809 version: 86|1||54c775a4509aea9affefdbe4 based on: (empty)
      2016-02-15T06:53:08.188+0000 I SHARDING [conn265475] ChunkManager: time to load chunks for mydomain.Col3: 1ms sequenceNumber: 15810 version: 18|1||54be162bf5d6feabbc856429 based on: (empty)
      2016-02-15T06:53:08.189+0000 I SHARDING [conn265475] ChunkManager: time to load chunks for mydomain.Col4: 0ms sequenceNumber: 15811 version: 43|1||54be7cb0f5d6feabbc856440 based on: (empty)
      

      3. Then all mongos servers stop responding for several seconds, sometimes more than 60, leaving our database unusable for that time.

      This happens mostly, but not only, on the large-document collection.
      Setup: MMAPv1, 3 config servers, and each shard is a primary + secondary + arbiter.
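The shard versions in the donor log above are printed as `major|minor||epoch`, for example `6452|1||563ba74d869758a4542b9075`. The sketch below parses that form and applies a simplified write-compatibility rule, assumed here to be "same epoch and same major version". Under that rule a request still carrying `6451|6266` after the donor moved to `6452|0` is rejected, which is consistent with the `write request to old shard version ... waiting for migration commit` lines. `ShardVersion` and `is_write_compatible` are hypothetical names, not MongoDB APIs.

```python
from typing import NamedTuple


class ShardVersion(NamedTuple):
    major: int  # bumped by migrations
    minor: int  # bumped by splits and by the donor's post-migration update
    epoch: str  # changes if the collection is dropped and recreated

    @classmethod
    def parse(cls, text: str) -> "ShardVersion":
        # "6452|1||563ba74d869758a4542b9075" -> (6452, 1, "563ba74d...")
        counters, epoch = text.split("||")
        major, minor = counters.split("|")
        return cls(int(major), int(minor), epoch)


def is_write_compatible(request: ShardVersion, current: ShardVersion) -> bool:
    """Simplified rule (an assumption): epochs and major versions must both match."""
    return request.epoch == current.epoch and request.major == current.major
```

Under this simplified rule, a split (minor bump) leaves a router's cached version usable for writes, while a migration (major bump) forces every router holding the old version to refresh, which is when the reload described in this issue can kick in.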

      To help narrow this down, we measured which components get blocked (mongos vs. mongod), and it is only the mongos servers.

      If it helps, we have also seen this "based on (empty)" behavior when a primary is demoted and then comes back very quickly. When that happens, the only workaround we have found is to shut down all mongos and config servers except one of each and run flushRouterConfig, but that is probably a separate bug.
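To check how often mongos builds a chunk map from scratch, a small scanner can pick out the `ChunkManager: time to load chunks` lines whose base version is `(empty)`, like the ones quoted in step 2. This is a hypothetical helper written against the line format shown in this report. It matches only the trailing `(empty)` marker, so it does not depend on the exact label printed in front of it, and reading `(empty)` as "loaded without a previous version to diff from" is this report's interpretation, not a documented guarantee.

```python
import re
from typing import Iterable, List, Tuple

# Matches mongos "time to load chunks" lines in the format shown in this report.
# The label before the trailing base version is not matched literally; only
# whether the base ends in "(empty)" matters here.
LOAD_RE = re.compile(
    r"^(?P<ts>\S+) .*ChunkManager: time to load chunks for (?P<ns>\S+?): "
    r"(?P<ms>\d+)ms sequenceNumber: (?P<seq>\d+) version: (?P<version>\S+) "
    r"(?P<base>.*)$"
)


def find_full_reloads(lines: Iterable[str]) -> List[Tuple[str, str, int]]:
    """Return (timestamp, namespace, load_ms) for chunk loads with an "(empty)" base."""
    hits = []
    for line in lines:
        m = LOAD_RE.match(line.strip())
        if m and m.group("base").endswith("(empty)"):
            hits.append((m.group("ts"), m.group("ns"), int(m.group("ms"))))
    return hits
```

Passing it an open mongos log file (it accepts any iterable of lines) and grouping the hits by timestamp makes it straightforward to line up bursts of from-scratch loads with the donor's `moveChunk.commit` events.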

        Attachments

        1. cfg1 (93 kB)
        2. cfg2 (86 kB)
        3. cfg3 (100 kB)
        4. diff (0.9 kB)
        5. donor (1.96 MB)
        6. donor-metrics1 (2.35 MB)
        7. donor-metrics2 (40 kB)
        8. double_reload.js (2 kB)
        9. mongos (1.17 MB)
        10. mongos2 (3.43 MB)
        11. test.js (0.9 kB)
        12. to (132 kB)
        13. to-metrics1 (7.17 MB)
        14. to-metrics2 (9 kB)
