Core Server / SERVER-6105

Dropping sharded collection and recreating it confuses mongos

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Major - P3
    • Affects Version/s: 2.0.3
    • Component/s: Sharding
    • Labels: None
    • Environment: Ubuntu on EC2

      I recently dropped a sharded collection, recreated it, and re-sharded it. mongos does not seem to handle that sequence. Restarting mongos reloads the config and fixes the problem, but this looks like a bug to me.
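      Since a restart helps because it reloads the config, a lighter-weight option might be to ask mongos to drop its cached routing metadata with the `flushRouterConfig` admin command. The sketch below is untested against this bug and is only an assumption about what could help. The helper name `flush_router_config` is hypothetical, and `client` is assumed to be a pymongo client connected to a mongos, not directly to a shard.

```python
def flush_router_config(client):
    """Ask the mongos behind `client` to discard its cached sharding metadata.

    `client` is assumed to be a pymongo client object connected to a mongos.
    Whether this clears the stale state described in this report is unverified.
    """
    # flushRouterConfig is an admin command, so it is sent to the admin database.
    return client.admin.command("flushRouterConfig")


# Usage sketch (assumes pymongo is installed; the hostname is a placeholder):
#   from pymongo import MongoClient
#   flush_router_config(MongoClient("mongodb://mongos.example.com:27017"))
```

      The command would have to be run against every mongos that served the collection before it was dropped, because each mongos keeps its own cache.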

      In the mongos logs I see these messages repeating at a high rate:

      Fri Jun 15 18:16:39 [WriteBackListener-mongo3.foobar.com:27018] writeback failed because of stale config, retrying attempts: 16678
      Fri Jun 15 18:16:39 [conn4784] ChunkManager: time to load chunks for pb3.hourly_stats: 2ms sequenceNumber: 54284 version: 1|0
      Fri Jun 15 18:16:39 [WriteBackListener-mongo3.foobar.com:27018] created new distributed lock for pb3.hourly_stats on mongoconfig1.foobar.com:27019,mongoconfig2.foobar.com:27019,mongoconfig3.foobar.com:27019 ( lock timeout : 900000, ping interval : 30000, process : 0 )
      Fri Jun 15 18:16:39 [WriteBackListener-mongo2.foobar.com:27018] writeback failed because of stale config, retrying attempts: 17967
      Fri Jun 15 18:16:39 [WriteBackListener-mongo3.foobar.com:27018] ChunkManager: time to load chunks for pb3.hourly_stats: 2ms sequenceNumber: 54285 version: 1|0
      Fri Jun 15 18:16:39 [WriteBackListener-mongo2.foobar.com:27018] created new distributed lock for pb3.hourly_stats on mongoconfig1.foobar.com:27019,mongoconfig2.foobar.com:27019,mongoconfig3.foobar.com:27019 ( lock timeout : 900000, ping interval : 30000, process : 0 )
      Fri Jun 15 18:16:39 [WriteBackListener-mongo2.foobar.com:27018] ChunkManager: time to load chunks for pb3.hourly_stats: 1ms sequenceNumber: 54286 version: 1|0
      Fri Jun 15 18:16:39 [conn4776] created new distributed lock for pb3.hourly_stats on mongoconfig1.foobar.com:27019,mongoconfig2.foobar.com:27019,mongoconfig3.foobar.com:27019 ( lock timeout : 900000, ping interval : 30000, process : 0 )
      Fri Jun 15 18:16:39 [conn4784] setShardVersion failed host: mongo2.foobar.com:27018

      { oldVersion: Timestamp 0|0, ns: "pb3.hourly_stats", version: Timestamp 448000|41, globalVersion: Timestamp 0|0, errmsg: "client version differs from config's for collection 'pb3.hourly_stats'", ok: 0.0 }

      Fri Jun 15 18:16:39 [conn4784] Assertion: 10429:setShardVersion failed host: mongo2.foobar.com:27018

      { oldVersion: Timestamp 0|0, ns: "pb3.hourly_stats", version: Timestamp 448000|41, globalVersion: Timestamp 0|0, errmsg: "client version differs from config's for collection 'pb3.hourly_stats'", ok: 0.0 }

      0x5350c2 0x7f5f95 0x7f5790
      mongos(_ZN5mongo11msgassertedEiPKc+0x112) [0x5350c2]
      mongos() [0x7f5f95]
      mongos() [0x7f5790]
      Fri Jun 15 18:16:39 [conn4784] ~ScopedDBConnection: _conn != null
      Fri Jun 15 18:16:39 [conn4784] AssertionException while processing op type : 2002 to : pb3.hourly_stats :: caused by :: 10429 setShardVersion failed host: mongo2.foobar.com:27018

      { oldVersion: Timestamp 0|0, ns: "pb3.hourly_stats", version: Timestamp 448000|41, globalVersion: Timestamp 0|0, errmsg: "client version differs from config's for collection 'pb3.hourly_stats'", ok: 0.0 }

      Fri Jun 15 18:16:39 [WriteBackListener-mongo3.foobar.com:27018] writeback failed because of stale config, retrying attempts: 16679
      Fri Jun 15 18:16:39 [conn4776] ChunkManager: time to load chunks for pb3.hourly_stats: 2ms sequenceNumber: 54287 version: 1|0
      Fri Jun 15 18:16:39 [conn4783] created new distributed lock for pb3.hourly_stats on mongoconfig1.foobar.com:27019,mongoconfig2.foobar.com:27019,mongoconfig3.foobar.com:27019 ( lock timeout : 900000, ping interval : 30000, process : 0 )
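      To put a number on "a high rate", the retry counters in the WriteBackListener lines can be pulled out per shard host. This is a minimal sketch: the sample lines are copied from the excerpt above, while the function name `max_retry_attempts` and the regular expression are illustrative and assume the 2.0.x log line layout shown here.

```python
import re

# Sample lines copied from the mongos log excerpt above.
SAMPLE_LOG = """\
Fri Jun 15 18:16:39 [WriteBackListener-mongo3.foobar.com:27018] writeback failed because of stale config, retrying attempts: 16678
Fri Jun 15 18:16:39 [WriteBackListener-mongo2.foobar.com:27018] writeback failed because of stale config, retrying attempts: 17967
Fri Jun 15 18:16:39 [conn4784] setShardVersion failed host: mongo2.foobar.com:27018
Fri Jun 15 18:16:39 [WriteBackListener-mongo3.foobar.com:27018] writeback failed because of stale config, retrying attempts: 16679
"""

# Matches the stale-config writeback lines and captures host and retry counter.
RETRY_RE = re.compile(
    r"\[WriteBackListener-(?P<host>[^\]]+)\] writeback failed because of "
    r"stale config, retrying attempts: (?P<attempts>\d+)"
)


def max_retry_attempts(log_text):
    """Return {shard_host: highest retry counter seen} for stale-config writebacks."""
    highest = {}
    for line in log_text.splitlines():
        match = RETRY_RE.search(line)
        if match:
            host = match.group("host")
            attempts = int(match.group("attempts"))
            highest[host] = max(attempts, highest.get(host, 0))
    return highest


retries = max_retry_attempts(SAMPLE_LOG)
for host, attempts in sorted(retries.items()):
    print(f"{host}: {attempts} retries")
```

      Running the same function over two log snapshots taken a known interval apart would give the retry rate per shard.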

      On the non-primary shards I see these messages flying by at a high rate:

      Fri Jun 15 18:31:17 [conn28000] no chunk for collection pb3.hourly_stats on shard shard0002
      Fri Jun 15 18:31:17 [conn28001] no chunk for collection pb3.hourly_stats on shard shard0002
      Fri Jun 15 18:31:17 [conn27998] no chunk for collection pb3.hourly_stats on shard shard0002
      Fri Jun 15 18:31:17 [conn27998] end connection xxx.xxx.xxx.xxx:48064
      Fri Jun 15 18:31:17 [initandlisten] connection accepted from xxx.xxx.xxx.xxx:51367 #28002
      Fri Jun 15 18:31:17 [conn27999] no chunk for collection pb3.hourly_stats on shard shard0002
      Fri Jun 15 18:31:17 [conn27999] end connection xxx.xxx.xxx.xxx:48065
      Fri Jun 15 18:31:17 [initandlisten] connection accepted from xxx.xxx.xxx.xxx:51368 #28003
      Fri Jun 15 18:31:17 [conn28002] no chunk for collection pb3.hourly_stats on shard shard0002
      Fri Jun 15 18:31:17 [conn28003] no chunk for collection pb3.hourly_stats on shard shard0002
      Fri Jun 15 18:31:17 [conn28001] no chunk for collection pb3.hourly_stats on shard shard0002
      Fri Jun 15 18:31:17 [conn28000] no chunk for collection pb3.hourly_stats on shard shard0002
      Fri Jun 15 18:31:17 [conn28000] end connection xxx.xxx.xxx.xxx:46821
      Fri Jun 15 18:31:17 [initandlisten] connection accepted from xxx.xxx.xxx.xxx:46825 #28004
      Fri Jun 15 18:31:17 [conn28002] no chunk for collection pb3.hourly_stats on shard shard0002
      Fri Jun 15 18:31:17 [conn28003] no chunk for collection pb3.hourly_stats on shard shard0002
      Fri Jun 15 18:31:17 [conn28004] no chunk for collection pb3.hourly_stats on shard shard0002
      Fri Jun 15 18:31:17 [conn28001] no chunk for collection pb3.hourly_stats on shard shard0002
      Fri Jun 15 18:31:17 [conn28001] end connection xxx.xxx.xxx.xxx:46823
      Fri Jun 15 18:31:17 [initandlisten] connection accepted from xxx.xxx.xxx.xxx:46826 #28005
      Fri Jun 15 18:31:17 [conn28002] no chunk for collection pb3.hourly_stats on shard shard0002
      Fri Jun 15 18:31:17 [conn28004] no chunk for collection pb3.hourly_stats on shard shard0002
      Fri Jun 15 18:31:17 [conn28003] no chunk for collection pb3.hourly_stats on shard shard0002

      And on the primary shard I see a lot of connections being opened and closed, but nothing else.

            Assignee: Unassigned
            Reporter: Zac Witte (zacwitte)
            Votes: 0
            Watchers: 2
