-
Type: Bug
-
Resolution: Duplicate
-
Priority: Critical - P2
-
None
-
Affects Version/s: 3.2.17
-
Component/s: Sharding
-
None
-
Environment:Ubuntu 14.04
-
ALL
-
We have a sharded collection with approximately 170k chunks (spread across 11 replica sets) that a single mongos process incorrectly assumed was unsharded. This caused all reads/writes from that mongos process for the collection to go to the primary RS resulting in data inconsistency/loss from the view of that mongos process and the rest of the mongos processes in our environment.
Dec 22 13:55:07 web28 mongos.27011[99921]: [conn915497] ChunkManager loading chunks for trello.collection_1 sequenceNumber: 6047 based on: (empty) Dec 22 13:55:07 web28 mongos.27011[99921]: [conn927627] ChunkManager loading chunks for trello.collection_2 sequenceNumber: 6048 based on: 41884|1||521778b1ca72fc747e2a31a5 Dec 22 13:55:07 web28 mongos.27011[99921]: [conn927627] ChunkManager load took 109 ms and found version 41885|1||521778b1ca72fc747e2a31a5 Dec 22 13:55:08 web28 mongos.27011[99921]: [conn915497] inconsistent chunks found when reloading trello.collection_1, previous version was 141470|3571||521778afca72fc747e2a31a3, this should be rare Dec 22 13:55:08 web28 mongos.27011[99921]: [conn915497] ChunkManager load failed after 1858 ms and will be retried up to 2 more times Dec 22 13:55:08 web28 mongos.27011[99921]: [conn915497] ChunkManager loading chunks for trello.collection_1 sequenceNumber: 6047 based on: (empty) Dec 22 13:55:09 web28 mongos.27011[99921]: [conn915497] got invalid chunk version 13976|1||521778afca72fc747e2a31a3 in document { _id: "trello.collection_1-idModel_ObjectId('545d1808cc454be8aed72d8b')date_new Date(1415387523080)", ns: "trello.collection_1", min: { idModel: ObjectId('545d1808cc454be8aed72d8b'), date: new Date(1415387523080) }, max: { idModel: ObjectId('545d182bc40615e260371473'), date: new Date(1415685077013) }, shard: "rs2", lastmod: Timestamp 13976000|1, lastmodEpoch: ObjectId('521778afca72fc747e2a31a3') } when trying to load differing chunks at version 0|0||000000000000000000000000 Dec 22 13:55:09 web28 mongos.27011[99921]: [conn915497] major change in chunk information found when reloading trello.collection_1, previous version was 0|0||000000000000000000000000 Dec 22 13:55:09 web28 mongos.27011[99921]: [conn915497] ChunkManager load took 872 ms and found version 0|0||000000000000000000000000 Dec 22 13:55:09 web28 mongos.27011[99921]: [conn915497] no chunks found for collection trello.collection_1, assuming unsharded Dec 22 13:55:09 web28 mongos.27011[99921]: [conn915497] ChunkManager loading chunks for trello.collection_3 sequenceNumber: 6049 based on: (empty) Dec 22 13:55:09 web28 mongos.27011[99921]: [conn915497] ChunkManager load took 1 ms and found version 93|0||56f2ce47fefe6d524c67bfe4 Dec 22 13:55:09 web28 mongos.27011[99921]: [conn915497] ChunkManager loading chunks for trello.collection_4 sequenceNumber: 6050 based on: (empty) Dec 22 13:55:09 web28 mongos.27011[99921]: [conn915497] ChunkManager load took 0 ms and found version 25|6||5205a989629b64bf3e892fba ...
It is worth noting we were manually splitting large chunks for collection_1 in a script with the split admin command at the time. These are the relevant logs from the instance running the script.
Dec 22 13:54:55 admin1 mongos.27018[10455]: [conn1019044] splitting chunk [{ idModel: ObjectId('546234baf3603a0b88fa84b6'), date: new Date(1430908962051) },{ id Model: ObjectId('546236b3560532670a785dae'), date: new Date(1445238746777) }) in collection trello.collection_1 on shard rs2 Dec 22 13:54:55 admin1 mongos.27018[10455]: [conn1019044] ChunkManager loading chunks for trello.collection_1 sequenceNumber: 7322 based on: 141470|3561||521778afca 72fc747e2a31a3 Dec 22 13:54:56 admin1 mongos.27018[10455]: [conn1019044] ChunkManager load took 453 ms and found version 141470|3563||521778afca72fc747e2a31a3 Dec 22 13:54:58 admin1 mongos.27018[10455]: [conn1019044] splitting chunk [{ idModel: ObjectId('546247903ff470605f91bdbe'), date: new Date(1444235259298) },{ id Model: ObjectId('54624819f94e7792881237f9'), date: new Date(1418289843165) }) in collection trello.collection_1 on shard rs2 Dec 22 13:54:58 admin1 mongos.27018[10455]: [conn1019044] ChunkManager loading chunks for trello.collection_1 sequenceNumber: 7323 based on: 141470|3563||521778afca 72fc747e2a31a3 Dec 22 13:54:59 admin1 mongos.27018[10455]: [conn1019044] ChunkManager load took 466 ms and found version 141470|3565||521778afca72fc747e2a31a3 Dec 22 13:54:59 admin1 mongos.27018[10455]: [conn1019044] splitting chunk [{ idModel: ObjectId('54624bb26b3af67855b71e67'), date: new Date(1435336429738) },{ id Model: ObjectId('54624f2cb0a0e4dd31604c91'), date: new Date(1361988839380) }) in collection trello.collection_1 on shard rs2 Dec 22 13:55:00 admin1 mongos.27018[10455]: [mongosMain] connection accepted from 127.0.0.1:45906 #1021778 (4 connections now open) Dec 22 13:55:00 admin1 mongos.27018[10455]: [mongosMain] connection accepted from 127.0.0.1:45907 #1021779 (5 connections now open) Dec 22 13:55:00 admin1 mongos.27018[10455]: [conn1019044] ChunkManager loading chunks for trello.collection_1 sequenceNumber: 7324 based on: 141470|3565||521778afca72fc747e2a31a3 Dec 22 13:55:00 admin1 mongos.27018[10455]: [conn1021778] end connection 127.0.0.1:45906 (4 connections now open) Dec 22 13:55:00 admin1 mongos.27018[10455]: [conn1021779] end connection 127.0.0.1:45907 (3 connections now open) Dec 22 13:55:00 admin1 mongos.27018[10455]: [conn1019044] ChunkManager load took 456 ms and found version 141470|3567||521778afca72fc747e2a31a3 Dec 22 13:55:01 admin1 mongos.27018[10455]: [conn1019044] splitting chunk [{ idModel: ObjectId('546251f294b5b390883cffc4'), date: new Date(1444319064779) },{ idModel: ObjectId('5462555272ca804e14f8d262'), date: new Date(1416054342508) }) in collection trello.collection_1 on shard rs2 Dec 22 13:55:02 admin1 mongos.27018[10455]: [conn1019044] ChunkManager loading chunks for trello.collection_1 sequenceNumber: 7325 based on: 141470|3567||521778afca72fc747e2a31a3 Dec 22 13:55:02 admin1 mongos.27018[10455]: [conn1019044] ChunkManager load took 455 ms and found version 141470|3569||521778afca72fc747e2a31a3 Dec 22 13:55:06 admin1 mongos.27018[10455]: [conn1019044] splitting chunk [{ idModel: ObjectId('5462771a6d6090d3aab69185'), date: new Date(1417582120683) },{ idModel: ObjectId('54627aa769c3cd28a24f65fd'), date: new Date(1415740158381) }) in collection trello.collection_1 on shard rs2 Dec 22 13:55:07 admin1 mongos.27018[10455]: [conn1019044] ChunkManager loading chunks for trello.collection_1 sequenceNumber: 7326 based on: 141470|3569||521778afca72fc747e2a31a3 Dec 22 13:55:07 admin1 mongos.27018[10455]: [conn1019044] ChunkManager load took 451 ms and found version 141470|3571||521778afca72fc747e2a31a3 Dec 22 13:55:10 admin1 mongos.27018[10455]: [mongosMain] connection accepted from 127.0.0.1:45915 #1021780 (4 connections now open) Dec 22 13:55:10 admin1 mongos.27018[10455]: [mongosMain] connection accepted from 127.0.0.1:45916 #1021781 (5 connections now open) Dec 22 13:55:10 admin1 mongos.27018[10455]: [conn1021780] end connection 127.0.0.1:45915 (4 connections now open) Dec 22 13:55:10 admin1 mongos.27018[10455]: [conn1021781] end connection 127.0.0.1:45916 (3 connections now open) Dec 22 13:55:10 admin1 mongos.27018[10455]: [conn1019044] splitting chunk [{ idModel: ObjectId('5462a50a66b3264817ddd377'), date: new Date(1389628810403) },{ idModel: ObjectId('5462ad9dba3d75d3ece4ca07'), date: new Date(1416019961869) }) in collection trello.collection_1 on shard rs40 Dec 22 13:55:11 admin1 mongos.27018[10455]: [conn1019044] ChunkManager loading chunks for trello.collection_1 sequenceNumber: 7327 based on: 141470|3571||521778afca72fc747e2a31a3 Dec 22 13:55:11 admin1 mongos.27018[10455]: [conn1019044] ChunkManager load took 452 ms and found version 141470|3573||521778afca72fc747e2a31a3
- duplicates
-
SERVER-32198 Missing collection metadata on the shard implies both UNSHARDED and "metadata not loaded yet"
- Closed
- is related to
-
SERVER-22611 ChunkManager refresh can occasionally cause a full reload
- Closed