|
Hi Hendrik.
Apologies for the delay in following up on this issue. Your last response suggested that you had found a workaround using splitChunk from a single mongos, and we did not have any additional logs to review.
SERVER-7790, which you mentioned, does seem to be a similar issue, and the feature requests SERVER-4537 and SERVER-5160 may also be relevant for you to watch & upvote.
FYI, in MongoDB 2.4.8+ there is now caching for dbhash results, which should improve performance when using multiple mongos: https://jira.mongodb.org/browse/SERVER-11021. If you have a large config database and multiple mongos, it's possible that long-running dbhash commands could have contributed to slowness and spurious "config servers differ" warnings.
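If you want to verify config server consistency yourself, one option is to compare the dbhash of the config database on each config server directly. A minimal sketch in Python with pymongo (the hostnames below are placeholders for your three config servers):
from pymongo import MongoClient

# Placeholder hostnames -- substitute your three config servers.
CONFIG_SERVERS = ["cfg1.example.net:27019", "cfg2.example.net:27019", "cfg3.example.net:27019"]

hashes = {}
for host in CONFIG_SERVERS:
    # Connect directly to each config server and hash its "config" database.
    with MongoClient(host) as client:
        hashes[host] = client.config.command("dbHash")["md5"]

# All three md5 values should match; a mismatch is the condition mongos
# reports as "config servers differ".
if len(set(hashes.values())) > 1:
    print("config servers differ:", hashes)
else:
    print("config servers are in sync")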
I'm going to close this issue, but please feel free to reopen if you have new information to investigate.
Thanks,
Stephen
|
|
Hello,
Sorry for not replying earlier; I was on vacation.
1. The change we made was to limit the splitChunk commands to one
application. Previously we had multiple servers, each running an application
and a mongos. Each application created collections in a sharded database,
sharded those collections, and issued splitChunk commands to pre-split them
before doing mass inserts. After the change we had only one application
creating, sharding and splitting the collections (and putting them into a
collection pool). The actual commands executed did not change. I assume that
no longer running createCollection, shardCollection and splitChunk in
parallel is what fixed our problem with the segmentation fault. (A rough
sketch of the sequence follows after point 4.)
2. It is possible that the individual mongos instances did not all run
exactly the same version, but if so they were very close (mostly 2.4.3, with
one or two on 2.4.2).
3. The "config servers x and y differ" messages occured all over the place
(every other minute for hours). This problem reappeared even after we
manually re-synced the config servers (several times) and restarted the
mongos instances. This was another reason we limited the creating,
sharding and splitting of the collections to one application since this
also fixed the config server problem.
4. I am not sure whether we still have mongos logs from the time of the
segmentation fault, since that was quite some time ago, but I will look. It
will probably be hard to find the right mongos, since we had six or seven of
them and only one would have issued the problematic splitChunk command.
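For illustration, a rough Python/pymongo sketch of the create / shard / pre-split sequence that the single application now performs (host, database, collection names and split points are placeholders, not our real ones; the split admin command sent through mongos is what the shard executes as splitChunk):
from pymongo import MongoClient

# Connect to the single mongos that now prepares all collections.
# Host, database, collection names and split points are placeholders.
mongos = MongoClient("mongos-host:27017")
admin = mongos.admin
ns = "mydb.mycollection"

admin.command("enableSharding", "mydb")              # once per database
mongos["mydb"].create_collection("mycollection")     # explicit createCollection
admin.command("shardCollection", ns, key={"_id": 1})

# Pre-split before the mass inserts so the chunks already exist on the shards.
for point in [b"\x00\x01", b"\x00\x02", b"\x00\x03"]:
    admin.command("split", ns, middle={"_id": point})

# The prepared collection is then put into the collection pool.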
Best,
Hendrik
|
|
While experiencing problems with sharding (more info: https://groups.google.com/forum/#!topic/mongodb-user/BF06C0yqtV4), I'm also seeing this. Here's a snippet from my mongos log:
Thu Jul 11 16:26:17.903 [conn468] warning: splitChunk failed - cmd: { splitChunk: "trends.dimension.language", keyPattern: { _id: 1.0 }, min: { _id: BinData }, max: { _id: MaxKey }, from: "rs4", splitKeys: [ { _id: BinData } ], shardId: "trends.dimension.language-_id_BinData(0, 000120130711)", configdb: "mongobig-1-1:27019,mongobig-2-1:27019,mongobig-3-1:27019" } result: { who: { _id: "trends.dimension.language", process: "mongobig-4-1:27018:1373558874:1854596173", state: 1, ts: ObjectId('51dedca979a3c5840b68bbe1'), when: new Date(1373559977876), who: "mongobig-4-1:27018:1373558874:1854596173:conn133:1732459270", why: "split-{ _id: BinData }" }, ok: 0.0, errmsg: "the collection's metadata lock is taken" }
Thu Jul 11 16:26:18.530 [conn470] ChunkManager: time to load chunks for trends.dimension.language: 0ms sequenceNumber: 141 version: 2|3||51dedb10e68a130889d33a14 based on: 2|1||51dedb10e68a130889d33a14
Thu Jul 11 16:26:18.530 [conn470] autosplitted trends.dimension.language shard: ns:trends.dimension.languageshard: rs4:rs4/mongobig-4-1:27018lastmod: 2|1||000000000000000000000000min: { _id: BinData }max: { _id: MaxKey } on: { _id: BinData } (splitThreshold 471859) (migrate suggested)
Thu Jul 11 16:26:18.535 [conn470] moving chunk (auto): ns:trends.dimension.languageshard: rs4:rs4/mongobig-4-1:27018lastmod: 2|3||000000000000000000000000min: { _id: BinData }max: { _id: MaxKey } to: rs5:rs5/mongobig-5-1:27018
Thu Jul 11 16:26:18.535 [conn470] moving chunk ns: trends.dimension.language moving ( ns:trends.dimension.languageshard: rs4:rs4/mongobig-4-1:27018lastmod: 2|3||000000000000000000000000min: { _id: BinData }max: { _id: MaxKey }) rs4:rs4/mongobig-4-1:27018 -> rs5:rs5/mongobig-5-1:27018
Thu Jul 11 16:26:18.549 [conn464] warning: splitChunk failed - cmd: { splitChunk: "trends.dimension.language", keyPattern: { _id: 1.0 }, min: { _id: BinData }, max: { _id: MaxKey }, from: "rs4", splitKeys: [ { _id: BinData } ], shardId: "trends.dimension.language-_id_BinData(0, 000120130711)", configdb: "mongobig-1-1:27019,mongobig-2-1:27019,mongobig-3-1:27019" } result: { who: { _id: "trends.dimension.language", process: "mongobig-4-1:27018:1373558874:1854596173", state: 1, ts: ObjectId('51dedcaa79a3c5840b68bbe3'), when: new Date(1373559978507), who: "mongobig-4-1:27018:1373558874:1854596173:conn149:504692606", why: "split-{ _id: BinData }" }, ok: 0.0, errmsg: "the collection's metadata lock is taken" }
Thu Jul 11 16:26:18.549 [conn466] warning: splitChunk failed - cmd: { splitChunk: "trends.dimension.language", keyPattern: { _id: 1.0 }, min: { _id: BinData }, max: { _id: MaxKey }, from: "rs4", splitKeys: [ { _id: BinData } ], shardId: "trends.dimension.language-_id_BinData(0, 000120130711)", configdb: "mongobig-1-1:27019,mongobig-2-1:27019,mongobig-3-1:27019" } result: { who: { _id: "trends.dimension.language", process: "mongobig-4-1:27018:1373558874:1854596173", state: 1, ts: ObjectId('51dedcaa79a3c5840b68bbe3'), when: new Date(1373559978507), who: "mongobig-4-1:27018:1373558874:1854596173:conn149:504692606", why: "split-{ _id: BinData }" }, ok: 0.0, errmsg: "the collection's metadata lock is taken" }
Thu Jul 11 16:26:18.625 [conn470] moveChunk result: { who: { _id: "trends.dimension.language", process: "mongobig-4-1:27018:1373558874:1854596173", state: 1, ts: ObjectId('51dedcaa79a3c5840b68bbe3'), when: new Date(1373559978507), who: "mongobig-4-1:27018:1373558874:1854596173:conn149:504692606", why: "split-{ _id: BinData }" }, ok: 0.0, errmsg: "the collection metadata could not be locked with lock migrate-{ _id: BinData }" }
Thu Jul 11 16:26:18.625 [conn470] Assertion: 10412:moveAndCommit failed: { who: { _id: "trends.dimension.language", process: "mongobig-4-1:27018:1373558874:1854596173", state: 1, ts: ObjectId('51dedcaa79a3c5840b68bbe3'), when: new Date(1373559978507), who: "mongobig-4-1:27018:1373558874:1854596173:conn149:504692606", why: "split-{ _id: BinData }" }, ok: 0.0, errmsg: "the collection metadata could not be locked with lock migrate-{ _id: BinData }" }
0xa82161 0xa46e8b 0xa473cc 0x8c7b5e 0x9bc29c 0x9c30ca 0x991db1 0x669891 0xa6e8ce 0x7f5fc5b4fe9a 0x7f5fc4e62ccd
/usr/bin/mongos(_ZN5mongo15printStackTraceERSo+0x21) [0xa82161]
/usr/bin/mongos(_ZN5mongo11msgassertedEiPKc+0x9b) [0xa46e8b]
/usr/bin/mongos() [0xa473cc]
/usr/bin/mongos(_ZNK5mongo5Chunk13splitIfShouldEl+0x23ee) [0x8c7b5e]
/usr/bin/mongos(_ZN5mongo13ShardStrategy7_updateERKSsRKNS_7BSONObjES5_iRNS_7RequestERNS_9DbMessageEi+0x3bc) [0x9bc29c]
/usr/bin/mongos(_ZN5mongo13ShardStrategy7writeOpEiRNS_7RequestE+0x63a) [0x9c30ca]
/usr/bin/mongos(_ZN5mongo7Request7processEi+0xd1) [0x991db1]
/usr/bin/mongos(_ZN5mongo21ShardedMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortEPNS_9LastErrorE+0x71) [0x669891]
/usr/bin/mongos(_ZN5mongo17PortMessageServer17handleIncomingMsgEPv+0x42e) [0xa6e8ce]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x7e9a) [0x7f5fc5b4fe9a]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f5fc4e62ccd]
|