Core Server / SERVER-22611

ChunkManager refresh can occasionally cause a full reload

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: 3.2.12, 3.4.2
    • Fix Version/s: 3.4.4, 3.5.5
    • Component/s: Sharding
    • Labels:
      None
    • Backwards Compatibility:
      Fully Compatible
    • Operating System:
      ALL
    • Backport Requested:
      v3.4
    • Sprint:
      Sharding 11 (03/11/16), Sharding 12 (04/01/16), Sharding 13 (04/22/16), Sharding 14 (05/13/16), Sharding 15 (06/03/16), Sharding 2017-03-27

      Description

      A full reload can also block other operations, since it takes the DBConfig mutex. The full reload happens when the chunk differ gets an unexpected result from the config server (see here), which can potentially occur when a yield happens while querying the config server.
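
      For illustration, here is a minimal sketch of that decision in plain JavaScript. This is not the server's actual C++ code; every name below is invented, and the consistency check is simplified to the empty-diff case:

      // Hypothetical sketch of the refresh path described above.
      // fetchChunksSince/fetchAllChunks stand in for config server queries.
      function refreshChunkManager(cache, ns, fetchChunksSince, fetchAllChunks) {
          var cached = cache[ns];

          // Incremental path: ask the config server only for chunks with a
          // version newer than the one we already hold.
          var diff = fetchChunksSince(ns, cached.version);

          // The differ expects at least one changed chunk back. A yield while
          // querying the config server can instead return an unexpected
          // (e.g. empty) result.
          if (diff.length > 0) {
              cached.chunks = cached.chunks.concat(diff);  // cheap refresh
              cached.version = diff[diff.length - 1].lastmod;
              return cached;
          }

          // Unexpected differ result: discard the cache and reload every chunk
          // for the collection. This is the expensive full reload, and it runs
          // while the DBConfig mutex is held, blocking other operations.
          cache[ns] = fetchAllChunks(ns);
          return cache[ns];
      }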

      Original Title: Chunk migration freezes all mongos servers for >60 seconds

      Original description:

      Time for bug #7.

      We are moving chunks of a collection whose average document size is pretty big, about 6 KB. After every other chunk, we see this behavior:

      1. The source shard (from which the chunk is transferred) shows this:

      2016-02-15T06:53:06.556+0000 I SHARDING [conn102845] moveChunk data transfer progress: { active: true, ns: "mydomain.Sessions", from: "rsmydomain/10.35.151.119:27017,in.db1m1.mydomain.com:27017,in.db1m2.mydomain.com:27017", min: { a: ObjectId('5334b6f2645cff3b5097f4f9'), _id: ObjectId('55f705bca21c0f117ccc613c') }, max: { a: ObjectId('5334b6f2645cff3b5097f4f9'), _id: ObjectId('55fb9231584ad132f5207391') }, shardKeyPattern: { a: 1.0, _id: 1.0 }, state: "ready", counts: { cloned: 0, clonedBytes: 0, catchup: 0, steady: 0 }, ok: 1.0 } my mem used: 0
      2016-02-15T06:53:06.559+0000 I SHARDING [conn102845] moveChunk data transfer progress: { active: true, ns: "mydomain.Sessions", from: "rsmydomain/10.35.151.119:27017,in.db1m1.mydomain.com:27017,in.db1m2.mydomain.com:27017", min: { a: ObjectId('5334b6f2645cff3b5097f4f9'), _id: ObjectId('55f705bca21c0f117ccc613c') }, max: { a: ObjectId('5334b6f2645cff3b5097f4f9'), _id: ObjectId('55fb9231584ad132f5207391') }, shardKeyPattern: { a: 1.0, _id: 1.0 }, state: "clone", counts: { cloned: 0, clonedBytes: 0, catchup: 0, steady: 0 }, ok: 1.0 } my mem used: 0
      2016-02-15T06:53:06.563+0000 I SHARDING [conn102845] moveChunk data transfer progress: { active: true, ns: "mydomain.Sessions", from: "rsmydomain/10.35.151.119:27017,in.db1m1.mydomain.com:27017,in.db1m2.mydomain.com:27017", min: { a: ObjectId('5334b6f2645cff3b5097f4f9'), _id: ObjectId('55f705bca21c0f117ccc613c') }, max: { a: ObjectId('5334b6f2645cff3b5097f4f9'), _id: ObjectId('55fb9231584ad132f5207391') }, shardKeyPattern: { a: 1.0, _id: 1.0 }, state: "steady", counts: { cloned: 0, clonedBytes: 0, catchup: 0, steady: 0 }, ok: 1.0 } my mem used: 0
      2016-02-15T06:53:06.563+0000 I SHARDING [conn102845] About to check if it is safe to enter critical section
      2016-02-15T06:53:06.563+0000 I SHARDING [conn102845] About to enter migrate critical section
      2016-02-15T06:53:06.565+0000 I SHARDING [conn626] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.565+0000 I SHARDING [conn24854] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.566+0000 I SHARDING [conn1399] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.567+0000 I SHARDING [conn297] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.567+0000 I SHARDING [conn23645] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.567+0000 I SHARDING [conn24637] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.650+0000 I SHARDING [conn102845] moveChunk setting version to: 6452|0||563ba74d869758a4542b9075
      2016-02-15T06:53:06.650+0000 I SHARDING [conn1682] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.650+0000 I SHARDING [conn731] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.650+0000 I SHARDING [conn886] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.651+0000 I SHARDING [conn19421] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.651+0000 I SHARDING [conn1207] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.651+0000 I SHARDING [conn23153] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.651+0000 I SHARDING [conn24688] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.651+0000 I SHARDING [conn1258] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.651+0000 I SHARDING [conn24257] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.651+0000 I SHARDING [conn25297] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.651+0000 I SHARDING [conn413] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.651+0000 I SHARDING [conn549] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.651+0000 I SHARDING [conn23108] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.651+0000 I SHARDING [conn1416] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.651+0000 I SHARDING [conn24825] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.651+0000 I SHARDING [conn690] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.651+0000 I SHARDING [conn24280] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.651+0000 I SHARDING [conn778] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.651+0000 I SHARDING [conn24187] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.651+0000 I SHARDING [conn19456] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.652+0000 I SHARDING [conn22996] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.652+0000 I SHARDING [conn1052] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.652+0000 I SHARDING [conn24951] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.652+0000 I SHARDING [conn23221] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.652+0000 I SHARDING [conn39] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.652+0000 I SHARDING [conn24798] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.652+0000 I SHARDING [conn24267] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.652+0000 I SHARDING [conn500] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.652+0000 I SHARDING [conn25435] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.652+0000 I SHARDING [conn23410] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.652+0000 I SHARDING [conn1649] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.652+0000 I SHARDING [conn24004] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.652+0000 I SHARDING [conn22921] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.652+0000 I SHARDING [conn38050] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.652+0000 I SHARDING [conn23955] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.652+0000 I SHARDING [conn38048] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.652+0000 I SHARDING [conn1513] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.652+0000 I SHARDING [conn25348] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.652+0000 I SHARDING [conn1163] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.652+0000 I SHARDING [conn24304] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.653+0000 I SHARDING [conn25002] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.653+0000 I SHARDING [conn23489] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.653+0000 I SHARDING [conn141] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.653+0000 I SHARDING [conn720] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.653+0000 I SHARDING [conn196] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.654+0000 I SHARDING [conn23299] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.658+0000 I SHARDING [conn23908] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.659+0000 I SHARDING [conn23756] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.668+0000 I SHARDING [conn23617] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.668+0000 I SHARDING [conn24587] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.670+0000 I SHARDING [conn19435] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.672+0000 I SHARDING [conn23242] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.673+0000 I SHARDING [conn23442] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.677+0000 I SHARDING [conn23236] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.678+0000 I SHARDING [conn23358] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.680+0000 I SHARDING [conn868] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.684+0000 I SHARDING [conn25340] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.687+0000 I SHARDING [conn23963] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.689+0000 I SHARDING [conn1301] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.690+0000 I SHARDING [conn38131] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.694+0000 I SHARDING [conn19460] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.700+0000 I SHARDING [conn25085] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.701+0000 I SHARDING [conn144] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.701+0000 I SHARDING [conn23226] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.705+0000 I SHARDING [conn467] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.705+0000 I SHARDING [conn38121] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.707+0000 I SHARDING [conn24433] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.708+0000 I SHARDING [conn29531] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.708+0000 I SHARDING [conn787] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.709+0000 I NETWORK  [conn102384] end connection 10.40.7.177:48208 (5263 connections now open)
      2016-02-15T06:53:06.709+0000 I NETWORK  [conn105280] end connection 10.40.7.177:48268 (5263 connections now open)
      2016-02-15T06:53:06.709+0000 I SHARDING [conn102385] waiting till out of critical section
      2016-02-15T06:53:06.709+0000 I SHARDING [conn102385] Waiting for 10 seconds for the migration critical section to end
      2016-02-15T06:53:06.711+0000 I SHARDING [conn23961] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.711+0000 I SHARDING [conn24775] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.711+0000 I WRITE    [conn38653] write request to old shard version 6451|6266||563ba74d869758a4542b9075 waiting for migration commit
      2016-02-15T06:53:06.711+0000 I SHARDING [conn38653] Waiting for 10 seconds for the migration critical section to end
      2016-02-15T06:53:06.711+0000 I WRITE    [conn48728] write request to old shard version 6451|6269||563ba74d869758a4542b9075 waiting for migration commit
      2016-02-15T06:53:06.711+0000 I WRITE    [conn72829] write request to old shard version 6451|6269||563ba74d869758a4542b9075 waiting for migration commit
      2016-02-15T06:53:06.711+0000 I SHARDING [conn48728] Waiting for 10 seconds for the migration critical section to end
      2016-02-15T06:53:06.711+0000 I SHARDING [conn72829] Waiting for 10 seconds for the migration critical section to end
      2016-02-15T06:53:06.712+0000 I NETWORK  [conn105127] end connection 10.229.6.198:43840 (5261 connections now open)
      2016-02-15T06:53:06.712+0000 I NETWORK  [conn105340] end connection 10.37.137.232:59474 (5260 connections now open)
      2016-02-15T06:53:06.712+0000 I NETWORK  [conn105392] end connection 10.40.7.177:48282 (5259 connections now open)
      2016-02-15T06:53:06.712+0000 I NETWORK  [conn105344] end connection 10.37.137.232:59484 (5259 connections now open)
      2016-02-15T06:53:06.712+0000 I SHARDING [conn102875] waiting till out of critical section
      2016-02-15T06:53:06.712+0000 I SHARDING [conn102875] Waiting for 10 seconds for the migration critical section to end
      2016-02-15T06:53:06.712+0000 I NETWORK  [conn105341] end connection 10.37.137.232:59475 (5258 connections now open)
      2016-02-15T06:53:06.713+0000 I SHARDING [conn399] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.713+0000 I SHARDING [conn24449] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.714+0000 I NETWORK  [conn105339] end connection 10.37.137.232:59472 (5256 connections now open)
      2016-02-15T06:53:06.714+0000 I NETWORK  [conn105342] end connection 10.37.137.232:59476 (5255 connections now open)
      2016-02-15T06:53:06.714+0000 I NETWORK  [conn105343] end connection 10.37.137.232:59482 (5255 connections now open)
      2016-02-15T06:53:06.714+0000 I NETWORK  [conn84162] end connection 10.229.6.198:43482 (5254 connections now open)
      2016-02-15T06:53:06.717+0000 I SHARDING [conn1005] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.723+0000 I SHARDING [conn23890] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.729+0000 I SHARDING [conn23314] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.737+0000 I SHARDING [conn23246] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.739+0000 I SHARDING [conn23684] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.744+0000 I SHARDING [conn102845] moveChunk migrate commit accepted by TO-shard: { active: false, ns: "mydomain.Sessions", from: "rsmydomain/10.35.151.119:27017,in.db1m1.mydomain.com:27017,in.db1m2.mydomain.com:27017", min: { a: ObjectId('5334b6f2645cff3b5097f4f9'), _id: ObjectId('55f705bca21c0f117ccc613c') }, max: { a: ObjectId('5334b6f2645cff3b5097f4f9'), _id: ObjectId('55fb9231584ad132f5207391') }, shardKeyPattern: { a: 1.0, _id: 1.0 }, state: "done", counts: { cloned: 0, clonedBytes: 0, catchup: 0, steady: 0 }, ok: 1.0 }
      2016-02-15T06:53:06.744+0000 I SHARDING [conn102845] moveChunk updating self version to: 6452|1||563ba74d869758a4542b9075 through { a: ObjectId('5334b6f2645cff3b5097f4f9'), _id: ObjectId('55fb9231584ad132f5207391') } -> { a: ObjectId('5334b6f2645cff3b5097f4f9'), _id: ObjectId('56020ab2584ad15a7d97cb59') } for collection 'mydomain.Sessions'
      2016-02-15T06:53:06.750+0000 I SHARDING [conn23929] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.756+0000 I SHARDING [conn23365] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.761+0000 I SHARDING [conn1420] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.765+0000 I SHARDING [conn1255] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.781+0000 I SHARDING [conn23810] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.796+0000 I SHARDING [conn22987] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.799+0000 I SHARDING [conn23350] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.803+0000 I SHARDING [conn1302] Waiting for 30 seconds for the migration critical section to end
      2016-02-15T06:53:06.823+0000 I SHARDING [conn102845] about to log metadata event into changelog: { _id: "ip-10-41-58-141-2016-02-15T06:53:06.823+0000-56c175d207f175cfb9224557", server: "ip-10-41-58-141", clientAddr: "10.40.7.177:48224", time: new Date(1455519186823), what: "moveChunk.commit", ns: "mydomain.Sessions", details: { min: { a: ObjectId('5334b6f2645cff3b5097f4f9'), _id: ObjectId('55f705bca21c0f117ccc613c') }, max: { a: ObjectId('5334b6f2645cff3b5097f4f9'), _id: ObjectId('55fb9231584ad132f5207391') }, from: "shmydomain1", to: "shmydomain3", cloned: 0, clonedBytes: 0, catchup: 0, steady: 0 } }
      2016-02-15T06:53:06.863+0000 I SHARDING [conn102845] MigrateFromStatus::done About to acquire global lock to exit critical section
      2016-02-15T06:53:06.863+0000 I SHARDING [conn102845] forking for cleanup of chunk data
      2016-02-15T06:53:06.865+0000 I SHARDING [conn102845] waiting for open cursors before removing range [{ a: ObjectId('5334b6f2645cff3b5097f4f9'), _id: ObjectId('55f705bca21c0f117ccc613c') }, { a: ObjectId('5334b6f2645cff3b5097f4f9'), _id: ObjectId('55fb9231584ad132f5207391') }) in mydomain.Sessions, cursor ids: [326986289469, 327861050987, 329266703753, 329633996806, 330038618700, 330161093014, 330597544814]
      2016-02-15T06:53:06.865+0000 I SHARDING [conn102845] about to log metadata event into changelog: { _id: "ip-10-41-58-141-2016-02-15T06:53:06.865+0000-56c175d207f175cfb9224558", server: "ip-10-41-58-141", clientAddr: "10.40.7.177:48224", time: new Date(1455519186865), what: "moveChunk.from", ns: "mydomain.Sessions", details: { min: { a: ObjectId('5334b6f2645cff3b5097f4f9'), _id: ObjectId('55f705bca21c0f117ccc613c') }, max: { a: ObjectId('5334b6f2645cff3b5097f4f9'), _id: ObjectId('55fb9231584ad132f5207391') }, step 1 of 6: 0, step 2 of 6: 310, step 3 of 6: 15, step 4 of 6: 8, step 5 of 6: 299, step 6 of 6: 1, to: "shmydomain3", from: "shmydomain1", note: "success" } }
      2016-02-15T06:53:06.993+0000 I SHARDING [conn102845] distributed lock 'mydomain.Sessions/ip-10-41-58-141:27017:1455127062:862901621' unlocked. 
      2016-02-15T06:53:07.779+0000 I SHARDING [conn1302] remotely refreshing metadata for mydomain.Sessions with requested shard version 6452|1||563ba74d869758a4542b9075 based on current shard version 6452|0||563ba74d869758a4542b9075, current metadata version is 6452|0||563ba74d869758a4542b9075
      2016-02-15T06:53:07.779+0000 I SHARDING [conn23450] remotely refreshing metadata for mydomain.Sessions with requested shard version 6452|1||563ba74d869758a4542b9075 based on current shard version 6452|0||563ba74d869758a4542b9075, current metadata version is 6452|0||563ba74d869758a4542b9075
      2016-02-15T06:53:07.779+0000 I SHARDING [conn1311] remotely refreshing metadata for mydomain.Sessions with requested shard version 6452|1||563ba74d869758a4542b9075 based on current shard version 6452|0||563ba74d869758a4542b9075, current metadata version is 6452|0||563ba74d869758a4542b9075
      2016-02-15T06:53:07.780+0000 I SHARDING [conn102845] received moveChunk request: { moveChunk: "mydomain.CrashIssueReports", from: "rsmydomain/10.35.151.119:27017,in.db1m1.mydomain.com:27017,in.db1m2.mydomain.com:27017", to: "rsmydomain2/in.db2m1.mydomain.com:27017,in.db2m2.mydomain.com:27017", fromShard: "shmydomain1", toShard: "shmydomain2", min: { p: 2, as: ObjectId('5660b4295e27753f5c614e0a'), d: new Date(1455408000000) }, max: { p: 3, as: ObjectId('569c9324294e2113fe1b23e1'), d: new Date(1455408000000) }, maxChunkSizeBytes: 67108864, configdb: "in.dbcfg1.mydomain.com:27019,in.dbcfg2.mydomain.com:27019,in.dbcfg3.mydomain.com:27019", secondaryThrottle: true, waitForDelete: false, maxTimeMS: 0, shardVersion: [ Timestamp 3000|1, ObjectId('56c0a812f6dbf7ace67b5e89') ], epoch: ObjectId('56c0a812f6dbf7ace67b5e89') }
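
      For context, the moveChunk request logged above is the same command that the sh.moveChunk() shell helper (or the balancer) issues through a mongos. A hypothetical manual equivalent, using values from the log:

      // Run from a mongos shell; the query document selects the chunk to move
      // (here, its min bound) and the last argument is the recipient shard.
      sh.moveChunk(
          "mydomain.CrashIssueReports",
          { p: 2, as: ObjectId("5660b4295e27753f5c614e0a"), d: new Date(1455408000000) },
          "shmydomain2"
      )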
      

      2. Then all mongos servers (we have 3) show this, for every single collection:

      2016-02-15T06:53:08.183+0000 I SHARDING [conn265475] ChunkManager: time to load chunks for mydomain.Col1: 1ms sequenceNumber: 15808 version: 26|7||54bd7ee267066c5f3e307c11 based on: (empty)
      2016-02-15T06:53:08.187+0000 I SHARDING [conn265475] ChunkManager: time to load chunks for mydomain.Col2: 3ms sequenceNumber: 15809 version: 86|1||54c775a4509aea9affefdbe4 based on: (empty)
      2016-02-15T06:53:08.188+0000 I SHARDING [conn265475] ChunkManager: time to load chunks for mydomain.Col3: 1ms sequenceNumber: 15810 version: 18|1||54be162bf5d6feabbc856429 based on: (empty)
      2016-02-15T06:53:08.189+0000 I SHARDING [conn265475] ChunkManager: time to load chunks for mydomain.Col4: 0ms sequenceNumber: 15811 version: 43|1||54be7cb0f5d6feabbc856440 based on: (empty)
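
      Each "based on: (empty)" line above means the routing table for that collection was rebuilt from scratch rather than refreshed incrementally. The cost of such a reload scales with the number of chunks, which can be checked on the config database (a diagnostic suggestion, not part of the original report):

      // Chunks per sharded collection, largest first (run against a mongos;
      // config.chunks keyed chunks by the "ns" field in 3.2/3.4).
      db.getSiblingDB("config").chunks.aggregate([
          { $group: { _id: "$ns", nChunks: { $sum: 1 } } },
          { $sort: { nChunks: -1 } }
      ])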
      

      3. Then all mongos stop responding for several seconds, sometimes more than 60, rendering our database dead for that time.

      This happens mostly on the collection with large documents, but not exclusively.
      Storage engine: MMAPv1. Three config servers. Each shard: primary + secondary + arbiter.

      To help you guys, we measured which components get locked, mongos vs mongod, and it's only the mongos servers.

      If it helps, we've also seen this "based on: (empty)" behavior when a primary is demoted and then comes back very quickly after the demotion. When that happens, the only workaround we have found is to shut down all mongos and config servers except one of each and run flushRouterConfig (shown below), but that's probably a separate bug.
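
      For reference, flushRouterConfig is an admin command run against each mongos from the shell:

      // Forces the mongos to mark its cached routing metadata as stale; it is
      // reloaded from the config servers on the next operation that needs it.
      db.adminCommand({ flushRouterConfig: 1 })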

      Attachments

      1. cfg1
        93 kB
        Yoni Douek
      2. cfg2
        86 kB
        Yoni Douek
      3. cfg3
        100 kB
        Yoni Douek
      4. diff
        0.9 kB
        Randolph Tan
      5. donor
        1.96 MB
        Yoni Douek
      6. donor-metrics1
        2.35 MB
        Yoni Douek
      7. donor-metrics2
        40 kB
        Yoni Douek
      8. double_reload.js
        2 kB
        Randolph Tan
      9. mongos
        1.17 MB
        Yoni Douek
      10. mongos2
        3.43 MB
        Yoni Douek
      11. test.js
        0.9 kB
        Randolph Tan
      12. to
        132 kB
        Yoni Douek
      13. to-metrics1
        7.17 MB
        Yoni Douek
      14. to-metrics2
        9 kB
        Yoni Douek

          Activity

          xgen-internal-githook Githook User added a comment:

          Author: Kaloian Manassiev (kaloianm) <kaloian.manassiev@mongodb.com>

          Message: SERVER-22611 Sharding catalog cache refactor

          (cherry picked from commit 39e06c9ef8c797ad626956b564ac9ebe295cbaf3)
          (cherry picked from commit d595a0fc8150411fd6541d06b08de9bee0039baa)
          Branch: v3.4
          https://github.com/mongodb/mongo/commit/0f715bb978334314a0304b3d9aa629d297f2b313
          xgen-internal-githook Githook User added a comment:

          Author: Kaloian Manassiev (kaloianm) <kaloian.manassiev@mongodb.com>

          Message: SERVER-22611 Remove accidentally added temporary files and do cleanup
          Branch: master
          https://github.com/mongodb/mongo/commit/91196a0bed9277f6b170fdd0c1c79ed15b9295f5
          xgen-internal-githook Githook User added a comment:

          Author: Kaloian Manassiev (kaloianm) <kaloian.manassiev@mongodb.com>

          Message: SERVER-22611 Make CatalogCache::onStaleConfigError clear the passed cache entry

          (cherry picked from commit 758bc2adcf2c83363d0fdfdef0cbd1cf3c800e62)
          Branch: v3.4
          https://github.com/mongodb/mongo/commit/b868de40b3a60233aff3370323b2325deafc0e8d
          xgen-internal-githook Githook User added a comment:

          Author: Kaloian Manassiev (kaloianm) <kaloian.manassiev@mongodb.com>

          Message: SERVER-22611 Make the catalog cache unit-tests go through the CatalogCache

          Instead of calling its internal logic directly.

          (cherry picked from commit 84d94351aa308caf2c684b0fe5fbb7f942c75bd0)
          Branch: v3.4
          https://github.com/mongodb/mongo/commit/892058e1cd3ae4744e8d13a589081330ea09f486
          xgen-internal-githook Githook User added a comment:

          Author: Kaloian Manassiev (kaloianm) <kaloian.manassiev@mongodb.com>

          Message: SERVER-22611 Get rid of ChunkDiff and add more CatalogCache tests

          This change gets rid of the "chunk differ" which was previously shared
          between mongos and mongod. Instead, its relatively simple logic has been
          moved inside the CatalogCache.

          (cherry picked from commit b1fd308ad04a5a6719fe72bcd23b10f1b8266097)
          Branch: v3.4
          https://github.com/mongodb/mongo/commit/9e3a63f9cf9ef3e64dd991824eb87dcf170d3d31

            People

            • Votes: 0
            • Watchers: 25
