[SERVER-6105] Dropping sharded collection and recreating it confuses mongos
Created: 15/Jun/12  Updated: 10/Jun/13  Resolved: 10/Jun/13

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 2.0.3
Fix Version/s: None

Type: Bug
Priority: Major - P3
Reporter: Zac Witte
Assignee: Unassigned
Resolution: Duplicate
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: Ubuntu on EC2

Issue Links:
Duplicate
duplicates SERVER-4537 better protect all sharding admin ope... Closed
Operating System: ALL
Participants:

Description

I recently dropped a sharded collection, then recreated it and sharded it again. mongos does not seem to handle this sequence correctly: it keeps reporting stale config for the collection (logs below). Restarting mongos reloads the config and fixes the problem, but this looks like a bug to me.

On the mongos logs I see these messages flying by at a high rate:

Fri Jun 15 18:16:39 [WriteBackListener-mongo3.foobar.com:27018] writeback failed because of stale config, retrying attempts: 16678
Fri Jun 15 18:16:39 [conn4784] ChunkManager: time to load chunks for pb3.hourly_stats: 2ms sequenceNumber: 54284 version: 1|0
Fri Jun 15 18:16:39 [WriteBackListener-mongo3.foobar.com:27018] created new distributed lock for pb3.hourly_stats on mongoconfig1.foobar.com:27019,mongoconfig2.foobar.com:27019,mongoconfig3.foobar.com:27019 ( lock timeout : 900000, ping interval : 30000, process : 0 )
Fri Jun 15 18:16:39 [WriteBackListener-mongo2.foobar.com:27018] writeback failed because of stale config, retrying attempts: 17967
Fri Jun 15 18:16:39 [WriteBackListener-mongo3.foobar.com:27018] ChunkManager: time to load chunks for pb3.hourly_stats: 2ms sequenceNumber: 54285 version: 1|0
Fri Jun 15 18:16:39 [WriteBackListener-mongo2.foobar.com:27018] created new distributed lock for pb3.hourly_stats on mongoconfig1.foobar.com:27019,mongoconfig2.foobar.com:27019,mongoconfig3.foobar.com:27019 ( lock timeout : 900000, ping interval : 30000, process : 0 )
Fri Jun 15 18:16:39 [WriteBackListener-mongo2.foobar.com:27018] ChunkManager: time to load chunks for pb3.hourly_stats: 1ms sequenceNumber: 54286 version: 1|0
Fri Jun 15 18:16:39 [conn4776] created new distributed lock for pb3.hourly_stats on mongoconfig1.foobar.com:27019,mongoconfig2.foobar.com:27019,mongoconfig3.foobar.com:27019 ( lock timeout : 900000, ping interval : 30000, process : 0 )
Fri Jun 15 18:16:39 [conn4784] setShardVersion failed host: mongo2.foobar.com:27018

{ oldVersion: Timestamp 0|0, ns: "pb3.hourly_stats", version: Timestamp 448000|41, globalVersion: Timestamp 0|0, errmsg: "client version differs from config's for collection 'pb3.hourly_stats'", ok: 0.0 }

Fri Jun 15 18:16:39 [conn4784] Assertion: 10429:setShardVersion failed host: mongo2.foobar.com:27018

{ oldVersion: Timestamp 0|0, ns: "pb3.hourly_stats", version: Timestamp 448000|41, globalVersion: Timestamp 0|0, errmsg: "client version differs from config's for collection 'pb3.hourly_stats'", ok: 0.0 }

0x5350c2 0x7f5f95 0x7f5790
mongos(_ZN5mongo11msgassertedEiPKc+0x112) [0x5350c2]
mongos() [0x7f5f95]
mongos() [0x7f5790]
Fri Jun 15 18:16:39 [conn4784] ~ScopedDBConnection: _conn != null
Fri Jun 15 18:16:39 [conn4784] AssertionException while processing op type : 2002 to : pb3.hourly_stats :: caused by :: 10429 setShardVersion failed host: mongo2.foobar.com:27018

{ oldVersion: Timestamp 0|0, ns: "pb3.hourly_stats", version: Timestamp 448000|41, globalVersion: Timestamp 0|0, errmsg: "client version differs from config's for collection 'pb3.hourly_stats'", ok: 0.0 }

Fri Jun 15 18:16:39 [WriteBackListener-mongo3.foobar.com:27018] writeback failed because of stale config, retrying attempts: 16679
Fri Jun 15 18:16:39 [conn4776] ChunkManager: time to load chunks for pb3.hourly_stats: 2ms sequenceNumber: 54287 version: 1|0
Fri Jun 15 18:16:39 [conn4783] created new distributed lock for pb3.hourly_stats on mongoconfig1.foobar.com:27019,mongoconfig2.foobar.com:27019,mongoconfig3.foobar.com:27019 ( lock timeout : 900000, ping interval : 30000, process : 0 )
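
For anyone triaging a similar log, the retry counters in the WriteBackListener lines above give a rough idea of how long each listener has been stuck. The following is only a sketch, assuming the log line format shown above; the helper name max_retry_attempts is made up for illustration and is not part of any MongoDB tooling.

```python
import re
from collections import defaultdict

# Matches mongos lines of the form shown above, e.g.
# "... [WriteBackListener-mongo3.foobar.com:27018] writeback failed
#  because of stale config, retrying attempts: 16678"
WRITEBACK_RE = re.compile(
    r"\[WriteBackListener-(?P<host>[^\]]+)\] "
    r"writeback failed because of stale config, "
    r"retrying attempts: (?P<attempts>\d+)"
)


def max_retry_attempts(lines):
    """Return {shard host: highest retry counter seen} (hypothetical helper)."""
    highest = defaultdict(int)
    for line in lines:
        match = WRITEBACK_RE.search(line)
        if match:
            host = match.group("host")
            highest[host] = max(highest[host], int(match.group("attempts")))
    return dict(highest)
```

Passing an open log file works, since the function only iterates over lines, for example max_retry_attempts(open("mongos.log")), where the file name is a placeholder. A counter that keeps climbing between two samples suggests the listener is still looping rather than recovering.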

On the non-primary shards I see these messages flying by at a high rate:

Fri Jun 15 18:31:17 [conn28000] no chunk for collection pb3.hourly_stats on shard shard0002
Fri Jun 15 18:31:17 [conn28001] no chunk for collection pb3.hourly_stats on shard shard0002
Fri Jun 15 18:31:17 [conn27998] no chunk for collection pb3.hourly_stats on shard shard0002
Fri Jun 15 18:31:17 [conn27998] end connection xxx.xxx.xxx.xxx:48064
Fri Jun 15 18:31:17 [initandlisten] connection accepted from xxx.xxx.xxx.xxx:51367 #28002
Fri Jun 15 18:31:17 [conn27999] no chunk for collection pb3.hourly_stats on shard shard0002
Fri Jun 15 18:31:17 [conn27999] end connection xxx.xxx.xxx.xxx:48065
Fri Jun 15 18:31:17 [initandlisten] connection accepted from xxx.xxx.xxx.xxx:51368 #28003
Fri Jun 15 18:31:17 [conn28002] no chunk for collection pb3.hourly_stats on shard shard0002
Fri Jun 15 18:31:17 [conn28003] no chunk for collection pb3.hourly_stats on shard shard0002
Fri Jun 15 18:31:17 [conn28001] no chunk for collection pb3.hourly_stats on shard shard0002
Fri Jun 15 18:31:17 [conn28000] no chunk for collection pb3.hourly_stats on shard shard0002
Fri Jun 15 18:31:17 [conn28000] end connection xxx.xxx.xxx.xxx:46821
Fri Jun 15 18:31:17 [initandlisten] connection accepted from xxx.xxx.xxx.xxx:46825 #28004
Fri Jun 15 18:31:17 [conn28002] no chunk for collection pb3.hourly_stats on shard shard0002
Fri Jun 15 18:31:17 [conn28003] no chunk for collection pb3.hourly_stats on shard shard0002
Fri Jun 15 18:31:17 [conn28004] no chunk for collection pb3.hourly_stats on shard shard0002
Fri Jun 15 18:31:17 [conn28001] no chunk for collection pb3.hourly_stats on shard shard0002
Fri Jun 15 18:31:17 [conn28001] end connection xxx.xxx.xxx.xxx:46823
Fri Jun 15 18:31:17 [initandlisten] connection accepted from xxx.xxx.xxx.xxx:46826 #28005
Fri Jun 15 18:31:17 [conn28002] no chunk for collection pb3.hourly_stats on shard shard0002
Fri Jun 15 18:31:17 [conn28004] no chunk for collection pb3.hourly_stats on shard shard0002
Fri Jun 15 18:31:17 [conn28003] no chunk for collection pb3.hourly_stats on shard shard0002
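
To put a number on "a high rate", the "no chunk for collection" lines above can be tallied per namespace and shard. Again this is just a sketch under the assumption that the shard log uses exactly the wording shown above; count_no_chunk is a hypothetical helper name.

```python
import re
from collections import Counter

# Matches shard log lines such as
# "... [conn28000] no chunk for collection pb3.hourly_stats on shard shard0002"
NO_CHUNK_RE = re.compile(
    r"no chunk for collection (?P<ns>\S+) on shard (?P<shard>\S+)"
)


def count_no_chunk(lines):
    """Count 'no chunk' messages per (namespace, shard) pair (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        match = NO_CHUNK_RE.search(line)
        if match:
            counts[(match.group("ns"), match.group("shard"))] += 1
    return counts
```

Lines that do not match, such as the connection accepted and end connection messages, are simply skipped.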

And on the primary shard I see a lot of connections being opened and closed, but nothing else.

