Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-2805

Problems on chunk split with 1.8.0 rc2

    • Type: Icon: Bug Bug
    • Resolution: Done
    • Priority: Icon: Major - P3 Major - P3
    • None
    • Affects Version/s: 1.8.0-rc2
    • Component/s: None
    • None
    • ALL
    • None
    • 3
    • None
    • None
    • None
    • None
    • None
    • None

      Fri Mar 18 18:05:05 [LockPinger] dist_lock pinged successfully for: OBE017.local:1300448993:1804289383
      Fri Mar 18 18:06:06 [mongosMain] connection accepted from 62.212.84.220:60590 #34
      Fri Mar 18 18:10:05 [LockPinger] dist_lock pinged successfully for: OBE017.local:1300448993:1804289383
      Fri Mar 18 18:15:05 [LockPinger] dist_lock pinged successfully for: OBE017.local:1300448993:1804289383
      Fri Mar 18 18:18:09 [conn24] autosplitted crawler.ad shard: ns:crawler.ad at: partitionB:partitionB/flash,gordon:27018 lastmod: 138|0 min:

      { _id: 662009843887 }

      max:

      { _id: 662009883156 }

      on:

      { _id: 10000068480361 }

      (splitThreshold 209715200)
      Fri Mar 18 18:18:09 [conn24] update failed b/c of StaleConfigException, retrying left:4 ns: crawler.ad query:

      { _id: 662009274178 }

      Fri Mar 18 18:18:11 [conn28] ns: crawler.ad ClusteredCursor::query ShardConnection had to change attempt: 0
      Fri Mar 18 18:18:11 [conn28] SyncClusterConnection connecting to [flash:27019]
      Fri Mar 18 18:18:11 [conn28] SyncClusterConnection connecting to [gordon:27019]
      Fri Mar 18 18:18:11 [conn28] SyncClusterConnection connecting to [mara:27019]
      Fri Mar 18 18:18:11 [conn28] SyncClusterConnection connecting to [flash:27019]
      Fri Mar 18 18:18:11 [conn28] SyncClusterConnection connecting to [gordon:27019]
      Fri Mar 18 18:18:11 [conn28] SyncClusterConnection connecting to [mara:27019]
      Fri Mar 18 18:18:11 [conn28] setShardVersion failed:

      { errmsg: "not master", ok: 0.0 }

      Fri Mar 18 18:18:11 [conn28] Assertion: 10429:setShardVersion failed!

      { "errmsg" : "not master", "ok" : 0 }

      0x522c19 0x69ee23 0x69e90b 0x69e90b 0x69e90b 0x5799e6 0x57a2f6 0x57781d 0x6680b6 0x67ae11 0x5817a2 0x6a26f0 0x30e920673d 0x30e8ad3f6d
      /opt/mongo/bin/mongos(_ZN5mongo11msgassertedEiPKc+0x129) [0x522c19]
      /opt/mongo/bin/mongos [0x69ee23]
      /opt/mongo/bin/mongos [0x69e90b]
      /opt/mongo/bin/mongos [0x69e90b]
      /opt/mongo/bin/mongos [0x69e90b]
      /opt/mongo/bin/mongos(_ZN5boost6detail8function17function_invoker4IPFbRN5mongo12DBClientBaseERKSsbiEbS5_S7_biE6invokeERNS1_15function_bufferES5_S7_bi+0x16) [0x5799e6]
      /opt/mongo/bin/mongos(_ZN5mongo17ClientConnections13checkVersionsERKSs+0x1c6) [0x57a2f6]
      /opt/mongo/bin/mongos(_ZN5mongo15ShardConnection25checkMyConnectionVersionsERKSs+0x3d) [0x57781d]
      /opt/mongo/bin/mongos(_ZN5mongo7Request7processEi+0x5e6) [0x6680b6]
      /opt/mongo/bin/mongos(_ZN5mongo21ShardedMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortE+0x141) [0x67ae11]
      /opt/mongo/bin/mongos(_ZN5mongo3pms9threadRunEPNS_13MessagingPortE+0x272) [0x5817a2]
      /opt/mongo/bin/mongos(thread_proxy+0x80) [0x6a26f0]
      /lib64/libpthread.so.0 [0x30e920673d]
      /lib64/libc.so.6(clone+0x6d) [0x30e8ad3f6d]
      Fri Mar 18 18:18:11 [conn28] AssertionException in process: setShardVersion failed!

      { "errmsg" : "not master", "ok" : 0 }

      Fri Mar 18 18:18:11 [conn28] ns: crawler.ad ClusteredCursor::query ShardConnection had to change attempt: 0
      Fri Mar 18 18:18:11 [conn28] SyncClusterConnection connecting to [flash:27019]
      Fri Mar 18 18:18:11 [conn28] SyncClusterConnection connecting to [gordon:27019]
      Fri Mar 18 18:18:11 [conn28] SyncClusterConnection connecting to [mara:27019]
      Fri Mar 18 18:18:11 [conn28] setShardVersion failed:

      { errmsg: "not master", ok: 0.0 }

      Fri Mar 18 18:18:11 [conn28] Assertion: 10429:setShardVersion failed!

      { "errmsg" : "not master", "ok" : 0 }

      0x522c19 0x69ee23 0x69e90b 0x69e90b 0x69e90b 0x5799e6 0x57a2f6 0x57781d 0x6680b6 0x67ae11 0x5817a2 0x6a26f0 0x30e920673d 0x30e8ad3f6d
      /opt/mongo/bin/mongos(_ZN5mongo11msgassertedEiPKc+0x129) [0x522c19]
      /opt/mongo/bin/mongos [0x69ee23]
      /opt/mongo/bin/mongos [0x69e90b]
      /opt/mongo/bin/mongos [0x69e90b]
      /opt/mongo/bin/mongos [0x69e90b]
      /opt/mongo/bin/mongos(_ZN5boost6detail8function17function_invoker4IPFbRN5mongo12DBClientBaseERKSsbiEbS5_S7_biE6invokeERNS1_15function_bufferES5_S7_bi+0x16) [0x5799e6]
      /opt/mongo/bin/mongos(_ZN5mongo17ClientConnections13checkVersionsERKSs+0x1c6) [0x57a2f6]
      /opt/mongo/bin/mongos(_ZN5mongo15ShardConnection25checkMyConnectionVersionsERKSs+0x3d) [0x57781d]
      /opt/mongo/bin/mongos(_ZN5mongo7Request7processEi+0x5e6) [0x6680b6]

      ... ad infinitum. Until restarted.

      On partitionB mongod log I've found this around the same time:

      ...
      Fri Mar 18 18:18:09 [conn1195] end connection 95.211.11.30:33282
      Fri Mar 18 18:18:09 [conn1200] end connection 95.211.11.30:33292
      Fri Mar 18 18:18:09 [conn1366] end connection 95.211.11.30:49865
      Fri Mar 18 18:18:09 [initandlisten] connection accepted from 95.211.11.30:35538 #1498
      Fri Mar 18 18:18:09 [conn1498] request split points lookup for chunk crawler.ad { : 662009843887 } -->> { : 10000068480361 }
      Fri Mar 18 18:18:09 [initandlisten] connection accepted from 94.75.192.38:33921 #1499
      Fri Mar 18 18:18:09 [conn1499] end connection 94.75.192.38:33921
      Fri Mar 18 18:18:09 [conn1498] max number of requested split points reached (2) before the end of chunk crawler.ad { : 662009843887 } -->> { : 10000068480361 }
      Fri Mar 18 18:18:09 [conn1498] warning: Finding the split vector for crawler.ad over

      { _id: 1.0 }

      keyCount: 39243 numSplits: 2 lookedAt: 0 took 117ms
      Fri Mar 18 18:18:09 [conn1498] query admin.$cmd ntoreturn:1 command: { splitVector: "crawler.ad", keyPattern:

      { _id: 1.0 }

      , min:

      { _id: 662009843887 }

      , max:

      { _id: 10000068480361 }

      , maxChunkSizeBytes: 209715200, maxSplitPoints: 2, maxChunkObjects: 250000 } reslen:111 117ms
      Fri Mar 18 18:18:09 [conn1498] SyncClusterConnection connecting to [flash:27019]
      Fri Mar 18 18:18:09 [conn1498] SyncClusterConnection connecting to [gordon:27019]
      Fri Mar 18 18:18:09 [conn1498] SyncClusterConnection connecting to [mara:27019]
      Fri Mar 18 18:18:09 [conn1498] updated set (partitionB) to: partitionB/flash,gordon:27018
      Fri Mar 18 18:18:09 [initandlisten] connection accepted from 95.211.11.30:35543 #1500
      Fri Mar 18 18:18:09 [initandlisten] connection accepted from 95.211.11.30:35544 #1501
      Fri Mar 18 18:18:09 [conn1498] received splitChunk request: { splitChunk: "crawler.ad", keyPattern:

      { _id: 1.0 }

      , min:

      { _id: 662009843887 }

      , max:

      { _id: 10000068480361 }

      , from: "partitionB/flash,gordon:27018", splitKeys: [

      { _id: 662009883156 }

      ], shardId: "crawler.ad-_id_662009843887", configdb: "flash:27019,gordon:27019,mara:27019" }
      Fri Mar 18 18:18:09 [LockPinger] creating dist lock ping thread for: flash:27019,gordon:27019,mara:27019
      Fri Mar 18 18:18:09 [LockPinger] SyncClusterConnection connecting to [flash:27019]
      Fri Mar 18 18:18:09 [LockPinger] SyncClusterConnection connecting to [gordon:27019]
      Fri Mar 18 18:18:09 [LockPinger] SyncClusterConnection connecting to [mara:27019]
      Fri Mar 18 18:18:09 [conn1498] splitChunk accepted at version 138|0
      Fri Mar 18 18:18:09 [conn1498] about to log metadata event: { _id: "OBE017.local-2011-03-18T17:18:09-1", server: "OBE017.local", clientAddr: "95.211.11.30:35538", time: new Date(1300468689635), what: "split", ns: "crawler.ad", details: { before: { min:

      { _id: 662009843887 }

      , max:

      { _id: 10000068480361 }

      , lastmod: Timestamp 138000|0 }, left: { min:

      { _id: 662009843887 }

      , max:

      { _id: 662009883156 }

      , lastmod: Timestamp 138000|2 }, right: { min:

      { _id: 662009883156 }

      , max:

      { _id: 10000068480361 }

      , lastmod: Timestamp 138000|3 } } }
      Fri Mar 18 18:18:09 [conn1498] query admin.$cmd ntoreturn:1 command: { splitChunk: "crawler.ad", keyPattern:

      { _id: 1.0 }

      , min:

      { _id: 662009843887 }

      , max:

      { _id: 10000068480361 }

      , from: "partitionB/flash,gordon:27018", splitKeys: [

      { _id: 662009883156 }

      ], shardId: "crawler.ad-_id_662009843887", configdb: "flash:27019,gordon:27019,mara:27019" } reslen:53 487ms
      Fri Mar 18 18:19:00 [conn1191] insert crawler.ad 390ms
      Fri Mar 18 18:19:24 [conn159] query admin.$cmd ntoreturn:1 command:

      { writebacklisten: ObjectId('4d249c0f2daa327869f816c7') }

      reslen:60 300048ms
      Fri Mar 18 18:19:24 [conn163] query admin.$cmd ntoreturn:1 command:

      { writebacklisten: ObjectId('4d249b1d816216524426d17b') }

      reslen:60 300048ms
      Fri Mar 18 18:19:56 [conn1183] query admin.$cmd ntoreturn:1 command:

      { writebacklisten: ObjectId('4d8346e06ab0f7d672b0b2f2') }

      reslen:60 300048ms
      Fri Mar 18 18:20:25 [initandlisten] connection accepted from 94.75.192.38:33966 #1502
      Fri Mar 18 18:20:25 [conn1502] end connection 94.75.192.38:33966
      Fri Mar 18 18:20:34 [initandlisten] connection accepted from 94.75.192.38:33980 #1503
      Fri Mar 18 18:20:34 [conn1503] end connection 94.75.192.38:33980
      Fri Mar 18 18:21:54 [conn1187] insert crawler.ad 278ms
      Fri Mar 18 18:22:30 [conn5] query admin.$cmd ntoreturn:1 command:

      { writebacklisten: ObjectId('4d7f606edff5ce55c0758754') }

      reslen:60 300047ms
      Fri Mar 18 18:22:39 [LockPinger] dist_lock pinged successfully for: OBE017.local:1300468689:1182095420
      Fri Mar 18 18:22:52 [conn1190] insert crawler.ad 154ms
      ...
      Then, for some time (I think until mongos restart, but not sure) keeps saying this mixed with normal logs:

      Fri Mar 18 18:46:45 [conn1529] request split points lookup for chunk crawler.ad { : 10662019454956 } -->> { : MaxKey }
      Fri Mar 18 18:46:47 [conn1529] request split points lookup for chunk crawler.ad { : 10662019454956 } -->> { : MaxKey }
      Fri Mar 18 18:46:47 [conn1529] request split points lookup for chunk crawler.ad { : 10662019454956 } -->> { : MaxKey }
      Fri Mar 18 18:46:49 [conn1529] request split points lookup for chunk crawler.ad { : 10662019454956 } -->> { : MaxKey }
      ...

      But looks like no error appears anywhere on mongod logs (No error neither on config server logs)

      I'll try to use 1.8.0 instead of RC2, let's see.

      Post you if any change...

            Assignee:
            eliot Eliot Horowitz (Inactive)
            Reporter:
            alvin Alvin Richards (Inactive)
            Votes:
            1 Vote for this issue
            Watchers:
            2 Start watching this issue

              Created:
              Updated:
              Resolved: