Type: Bug
Resolution: Duplicate
Priority: Major - P3
Fix Version/s: None
Affects Version/s: 3.4.4
Component/s: Sharding
Labels: None

Operating System: ALL
Confidence Status: None
Work Order: 3
CAR Domain/s: None

Aha! Reference: None
Tracking Level: None
Risk Status: None
Exec Notes: None
Goal Name(s): None
Goal Link: None

Hello,

Our mongos instances regularly stop refreshing chunk metadata from the config servers for some collections. When such a mongos then tries to split a chunk, the split fails with "IncompatibleShardingMetadata: Unable to find chunk with the exact bounds" if the chunk was already split by another mongos.

Our MongoDB cluster details:

Many shards + config replica set, each formed by 3 members (1 primary + 2 secondaries)
2 mongos
Balancer is disabled
Package version 3.4.4, OS: Debian 8 Jessie
Servers: 6-core Xeon CPU, 64 GB RAM, ~3 TB SSD, ext4 file system
~40 collections in 1 DB
Many writes and reads

Typical scenario (shard, collection, and field names and values have been replaced):

From mongos A logs:

2017-06-21T14:46:42.087+0200 I SHARDING [conn6] Refreshing chunks for collection stats.collectionName based on version 9743|18393||5320f5e96789f4d11460c4a0
2017-06-21T14:46:42.129+0200 I SHARDING [CatalogCacheLoader-1] Refresh for collection stats.collectionName took 42 ms and found version 9743|18393||5320f5e96789f4d11460c4a0

From mongos B logs:

2017-06-21T14:51:05.844+0200 I SHARDING [conn3103094] autosplitted stats.collectionName chunk: shard: shardName, lastmod: 9743|18367||5320f5e96789f4d11460c4a0, [{ _id: { d: 20170621, a: 78, c: 909090, d: 12345678 } }, { _id: { d: 20170621, a: 4444, b: 111111111, c: 222222, d: 3333333 } }) into 3 parts (splitThreshold 67108864) (migrate suggested, but no migrations allowed)

From mongos A logs:

2017-06-21T14:55:21.233+0200 I SHARDING [conn379] Split chunk { splitChunk: "stats.collectionName", configdb: "csReplSet/172.16.18.28:27025,172.16.18.3:27025,172.16.18.30:27025", from: "shardName", keyPattern: { _id: 1.0 }, shardVersion: [ Timestamp 9743000|149465, ObjectId('5320f5e96789f4d11460c4a0') ], min: { _id: { d: 20170621, a: 4444, b: 111111111, c: 222222, d: 3333333 } }, max: { _id: { d: 20170621, a: 4444, b: 121212121, c: 343434, d: 5656565 } }, splitKeys: [ { _id: { d: 20170621, a: 4444, b: 555555555, c: 666666, d: 7777777 } }, { _id: { d: 20170621, a: 4444, b: 888888888, c: 999, d: 000000 } } ] } failed :: caused by :: IncompatibleShardingMetadata: *Unable to find chunk with the exact bounds* [{ _id: { d: 20170621, a: 4444, b: 111111111, c: 222222, d: 3333333 } }, { _id: { d: 20170621, a: 4444, b: 121212121, c: 343434, d: 5656565 } }) at collection version 9743|18399||5320f5e96789f4d11460c4a0

We can see that between the chunk refresh and the split attempt on mongos A, the other mongos had already split that chunk, so the split attempt failed.
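The staleness can also be read directly from the versions in the logs: mongos A last cached version 9743|18393, while the failed split reports the collection at 9743|18399. A minimal sketch of that comparison follows, assuming versions are printed as major|minor||epoch as in the lines above. The helper names and the abbreviated sample lines are illustrative only, not part of any MongoDB tooling.

```python
import re

# Sharding version as printed in the mongos logs above: major|minor||epoch
VERSION_RE = re.compile(r"version (\d+)\|(\d+)\|\|([0-9a-f]{24})")


def parse_version(log_line):
    """Return (major, minor, epoch) for the last version in a log line, or None."""
    matches = VERSION_RE.findall(log_line)
    if not matches:
        return None
    major, minor, epoch = matches[-1]
    return int(major), int(minor), epoch


def cache_is_stale(refresh_line, failure_line):
    """True if the version in the failure line is newer than the cached one.

    Assumption of this sketch: a different epoch is also treated as stale.
    """
    cached = parse_version(refresh_line)
    actual = parse_version(failure_line)
    if cached is None or actual is None:
        raise ValueError("no sharding version found in one of the log lines")
    if cached[2] != actual[2]:
        return True
    return cached[:2] < actual[:2]


# Abbreviated copies of the mongos A log lines quoted above.
REFRESH_LINE = (
    "Refresh for collection stats.collectionName took 42 ms and found "
    "version 9743|18393||5320f5e96789f4d11460c4a0"
)
FAILURE_LINE = (
    "IncompatibleShardingMetadata: Unable to find chunk with the exact bounds "
    "at collection version 9743|18399||5320f5e96789f4d11460c4a0"
)

print(cache_is_stale(REFRESH_LINE, FAILURE_LINE))  # True
```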

The problem is that a mongos sometimes suddenly stops refreshing a collection and stays that way until we restart it or force a refresh, which can be a long time. In such cases, after a few days the mongos attempts bigger and bigger splits:

2017-06-20T11:39:15.052+0200 I SHARDING [conn2766148] warning: log line attempted (53kB) over max size (10kB), printing beginning and end ... Split chunk { splitChunk: "stats.collectionName", configdb: "csReplSet/172.16.18.28:27025,172.16.18.3:27025,172.16.18.30:27025", from: "shardName", keyPattern: { _id: 1.0 }, shardVersion: [ Timestamp 3000|18274, ObjectId('5667717d46b7ddcd61ef5459') ], min: { _id: { d: 20170611, a: 111111, b: 2222222, c: 333, d: 333 } }, max: { _id: MaxKey }, splitKeys: [ ...... VERY LONG KEYS LIST ...... ] } failed :: caused by :: IncompatibleShardingMetadata: Unable to find chunk with the exact bounds [{ _id: { d: 20170611, a: 111111, b: 2222222, c: 333, d: 333 } }, { _id: MaxKey }) at collection version 3|19540||5667717d46b7ddcd61ef5459

"_id.d" is the insert date, here 20170611, but as you can see the log entry is dated 2017-06-20. The difference is 9 days: 9 days of failed split attempts. During this period, we found no chunk refresh in the logs for the affected collection. These big split attempts slow our shards down considerably (long splitVector queries on the primary members), which is very troublesome for us.
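The 9-day gap can be computed from a failed split line by comparing the log timestamp with the first "d" field of the min bound. The sketch below assumes, as in our data, that "_id.d" is the insert date in YYYYMMDD form. The function name and the abbreviated sample line are illustrative.

```python
import re
from datetime import datetime

# Leading date of a mongos log line, e.g. 2017-06-20T11:39:15.052+0200
TIMESTAMP_RE = re.compile(r"^(\d{4}-\d{2}-\d{2})T")
# First field of the min bound; assumed to be the insert date as YYYYMMDD
MIN_BOUND_RE = re.compile(r"min: \{ _id: \{ d: (\d{8})\b")


def split_staleness_days(log_line):
    """Days between the min bound's insert date and the log entry date.

    Returns None if the line does not look like a split attempt.
    """
    ts = TIMESTAMP_RE.match(log_line)
    bound = MIN_BOUND_RE.search(log_line)
    if ts is None or bound is None:
        return None
    logged = datetime.strptime(ts.group(1), "%Y-%m-%d").date()
    inserted = datetime.strptime(bound.group(1), "%Y%m%d").date()
    return (logged - inserted).days


# Abbreviated copy of the 2017-06-20 log line quoted above.
SAMPLE_LINE = (
    '2017-06-20T11:39:15.052+0200 I SHARDING [conn2766148] Split chunk '
    '{ splitChunk: "stats.collectionName", '
    'min: { _id: { d: 20170611, a: 111111, b: 2222222, c: 333, d: 333 } }, '
    'max: { _id: MaxKey } } failed'
)

print(split_staleness_days(SAMPLE_LINE))  # 9
```

A value that keeps growing from day to day for the same collection would suggest that the mongos has stopped refreshing it.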

So we regularly have to run db.adminCommand("flushRouterConfig") on the mongos to force a refresh.
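Since flushRouterConfig only clears the routing cache of the mongos it runs on, the workaround has to be applied to each mongos separately. A rough sketch of scripting this from Python follows, assuming pymongo is installed. The helper names and the host names in the usage comment are hypothetical.

```python
def flush_router_config(clients):
    """Run flushRouterConfig on every mongos client.

    `clients` maps a label to an object exposing `.admin.command(...)`,
    such as a pymongo MongoClient connected directly to one mongos.
    Returns {label: True/False} based on the "ok" field of each reply.
    """
    results = {}
    for label, client in clients.items():
        reply = client.admin.command("flushRouterConfig")
        results[label] = bool(reply.get("ok"))
    return results


def connect(uris):
    """Hypothetical helper: open one pymongo client per mongos URI."""
    # Imported lazily so the sketch can be loaded without pymongo installed.
    from pymongo import MongoClient
    return {label: MongoClient(uri) for label, uri in uris.items()}


# Usage (host names are placeholders, not our real servers):
# clients = connect({
#     "mongosA": "mongodb://mongos-a.example:27017",
#     "mongosB": "mongodb://mongos-b.example:27017",
# })
# print(flush_router_config(clients))
```

Each URI should point at a single mongos rather than at a list of them, so that every router is flushed exactly once.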

Thank you in advance for your help.

Best regards,
Slawomir

duplicates

SERVER-28418 make the split command on mongod return a stale version error if the requested chunk bounds are not found

Closed

Assignee: Esha Maharishi (Inactive)
Reporter: Slawomir Lukiewski
Participants: Esha Maharishi, Kaloian Manassiev, Slawomir Lukiewski
Votes: 0
Watchers: 9

Created: Jun 23 2017 11:00:20 AM UTC
Updated: Jul 29 2017 04:23:01 PM UTC
Resolved: Jun 23 2017 03:59:58 PM UTC
