Details
- Bug
- Status: Closed
- Major - P3
- Resolution: Duplicate
- 3.4.4
- None
- None
- ALL
Description
Hello,
Regularly, our mongos instances stop refreshing chunks from the config servers for some collections. When such a mongos then tries to split a chunk, the split fails with "IncompatibleShardingMetadata: Unable to find chunk with the exact bounds" if the chunk was already split by another mongos.
Our MongoDB cluster details:
- Many shards + config replica set, each made up of 3 members (1 primary + 2 secondaries)
- 2 mongos
- Balancer is disabled
- Package version 3.4.4, OS: Debian 8 Jessie
- Servers: 6-core Xeon CPU, 64 GB RAM, ~3 TB SSD, ext4 file system
- ~ 40 collections in 1 DB
- Many writes and reads
Classic scenario (shard, collection and field names and values were replaced):
From mongos A logs:
2017-06-21T14:46:42.087+0200 I SHARDING [conn6] Refreshing chunks for collection stats.collectionName based on version 9743|18393||5320f5e96789f4d11460c4a0
2017-06-21T14:46:42.129+0200 I SHARDING [CatalogCacheLoader-1] Refresh for collection stats.collectionName took 42 ms and found version 9743|18393||5320f5e96789f4d11460c4a0
From mongos B logs:
2017-06-21T14:51:05.844+0200 I SHARDING [conn3103094] autosplitted stats.collectionName chunk: shard: shardName, lastmod: 9743|18367||5320f5e96789f4d11460c4a0, [{ _id: { d: 20170621, a: 78, c: 909090, d: 12345678 } }, { _id: { d: 20170621, a: 4444, b: 111111111, c: 222222, d: 3333333 } }) into 3 parts (splitThreshold 67108864) (migrate suggested, but no migrations allowed)
From mongos A logs:
2017-06-21T14:55:21.233+0200 I SHARDING [conn379] Split chunk { splitChunk: "stats.collectionName", configdb: "csReplSet/172.16.18.28:27025,172.16.18.3:27025,172.16.18.30:27025", from: "shardName", keyPattern: { _id: 1.0 }, shardVersion: [ Timestamp 9743000|149465, ObjectId('5320f5e96789f4d11460c4a0') ], min: { _id: { d: 20170621, a: 4444, b: 111111111, c: 222222, d: 3333333 } }, max: { _id: { d: 20170621, a: 4444, b: 121212121, c: 343434, d: 5656565 } }, splitKeys: [ { _id: { d: 20170621, a: 4444, b: 555555555, c: 666666, d: 7777777 } }, { _id: { d: 20170621, a: 4444, b: 888888888, c: 999, d: 000000 } } ] } failed :: caused by :: IncompatibleShardingMetadata: *Unable to find chunk with the exact bounds* [{ _id: { d: 20170621, a: 4444, b: 111111111, c: 222222, d: 3333333 } }, { _id: { d: 20170621, a: 4444, b: 121212121, c: 343434, d: 5656565 } }) at collection version 9743|18399||5320f5e96789f4d11460c4a0
We can see that between the chunk refresh and the split attempt on mongos A, the other mongos had already split that chunk, so the split attempt failed.
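The staleness can be read directly from the version strings in the log lines above: mongos A last refreshed at 9743|18393, while the failed split reports collection version 9743|18399. A minimal sketch of that comparison follows; the helper names `parseChunkVersion` and `isStale` are hypothetical and only illustrate the reasoning, they are not part of MongoDB.

```javascript
// Parse a version string of the form "major|minor||epoch", as printed in the logs.
function parseChunkVersion(text) {
  const match = /^(\d+)\|(\d+)\|\|([0-9a-f]{24})$/.exec(text.trim());
  if (!match) {
    throw new Error("unrecognized chunk version: " + text);
  }
  return { major: Number(match[1]), minor: Number(match[2]), epoch: match[3] };
}

// True when `cached` is older than `current`.
// Assumption: a different epoch is treated as stale without further comparison.
function isStale(cached, current) {
  if (cached.epoch !== current.epoch) return true;
  if (cached.major !== current.major) return cached.major < current.major;
  return cached.minor < current.minor;
}

// Version found by the last refresh on mongos A (14:46:42).
const refreshed = parseChunkVersion("9743|18393||5320f5e96789f4d11460c4a0");
// Collection version reported by the failed split (14:55:21).
const atFailure = parseChunkVersion("9743|18399||5320f5e96789f4d11460c4a0");

console.log(isStale(refreshed, atFailure));
```

Running the same check on each "Unable to find chunk" error against the last logged refresh for that collection would show how far behind a given mongos is.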
The problem is that a mongos sometimes suddenly stops refreshing a collection and stays stale until we restart it or force a refresh, which can be a long time. In those cases, after a few days the mongos attempts bigger and bigger splits:
2017-06-20T11:39:15.052+0200 I SHARDING [conn2766148] warning: log line attempted (53kB) over max size (10kB), printing beginning and end ... Split chunk { splitChunk: "stats.collectionName", configdb: "csReplSet/172.16.18.28:27025,172.16.18.3:27025,172.16.18.30:27025", from: "shardName", keyPattern: { _id: 1.0 }, shardVersion: [ Timestamp 3000|18274, ObjectId('5667717d46b7ddcd61ef5459') ], min: { _id: { d: 20170611, a: 111111, b: 2222222, c: 333, d: 333 } }, max: { _id: MaxKey }, splitKeys: [ ...... VERY LONG KEYS LIST ...... ] } failed :: caused by :: IncompatibleShardingMetadata: Unable to find chunk with the exact bounds [{ _id: { d: 20170611, a: 111111, b: 2222222, c: 333, d: 333 } }, { _id: MaxKey }) at collection version 3|19540||5667717d46b7ddcd61ef5459
"_id.d" is the insert date, here 20170611, but the log entry is dated 2017-06-20. That is a difference of 9 days, i.e. 9 days of failed split attempts. During this period, we found no chunk refresh in the logs for the affected collection. These big split attempts slow our shards down considerably (long splitVector queries on primary members), which is very troublesome for us.
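The 9-day gap can be recomputed from any such log line by comparing the `_id.d` value in the split's `min` bound with the log timestamp. A small sketch, assuming `_id.d` is always a YYYYMMDD integer and ignoring the time-of-day part of the log timestamp; `daysBetween` is a hypothetical helper name.

```javascript
// Number of whole calendar days between a YYYYMMDD insert date
// and the date part of an ISO-style log timestamp.
function daysBetween(insertDate, logTimestamp) {
  const s = String(insertDate);
  const start = Date.UTC(
    Number(s.slice(0, 4)),
    Number(s.slice(4, 6)) - 1, // JavaScript months are zero-based
    Number(s.slice(6, 8))
  );
  const parts = logTimestamp.slice(0, 10).split("-").map(Number);
  const end = Date.UTC(parts[0], parts[1] - 1, parts[2]);
  return Math.round((end - start) / 86400000); // milliseconds per day
}

console.log(daysBetween(20170611, "2017-06-20T11:39:15.052+0200")); // prints 9
```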
As a workaround, we regularly have to run db.adminCommand("flushRouterConfig") on the mongos to force a refresh.
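For reference, the workaround can be wrapped so that a failed flush is not silently ignored. This is only a sketch: `forceRouterRefresh` is a hypothetical helper, and it assumes the command reply carries `ok: 1` on success, as mongo shell command replies normally do.

```javascript
// Ask a mongos to drop its cached routing table so it reloads from the config servers.
// `conn` is any object exposing adminCommand(), such as the `db` global in the mongo shell.
function forceRouterRefresh(conn) {
  var result = conn.adminCommand({ flushRouterConfig: 1 });
  if (!result || result.ok !== 1) {
    throw new Error("flushRouterConfig failed: " + JSON.stringify(result));
  }
  return result;
}

// Only runs where a `db` global exists, i.e. in a mongo shell connected to a mongos.
if (typeof db !== "undefined") {
  forceRouterRefresh(db);
}
```

The script has to be run against each mongos separately, since each one keeps its own routing cache.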
Thank you in advance for your help.
Best regards,
Slawomir
Issue Links
- duplicates: SERVER-28418 make the split command on mongod return a stale version error if the requested chunk bounds are not found (Closed)