[SERVER-12161] mongos crash when splitting chunks Created: 19/Dec/13  Updated: 21/Dec/13  Resolved: 21/Dec/13

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 2.4.8
Fix Version/s: None

Type: Bug Priority: Critical - P2
Reporter: igor lasic Assignee: Unassigned
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Linux, mongos, shard size 128 MB, rebalancing


Operating System: ALL
Participants:

 Description   

A mongos thread exits with the message below.

Mon Nov 18 00:12:01.342 [Balancer] distributed lock 'balancer/cert02.colo:27017:1384274434:1804289383' acquired, ts : 5289a1a16dab44256a9e57af
Mon Nov 18 00:12:01.552 [Balancer] distributed lock 'balancer/cert02.colo:27017:1384274434:1804289383' unlocked.
Mon Nov 18 00:12:13.751 [Balancer] could not acquire lock 'balancer/cert02.colo:27017:1384274434:1804289383' (another update won)
Mon Nov 18 00:12:13.751 [Balancer] distributed lock 'balancer/cert02.colo:27017:1384274434:1804289383' was not acquired.
Mon Nov 18 00:12:25.346 [conn1577] end connection 10.84.150.103:40380 (27 connections now open)
Mon Nov 18 00:12:25.347 [mongosMain] connection accepted from 10.84.150.103:41699 #1582 (28 connections now open)
Mon Nov 18 00:12:26.096 [Balancer] distributed lock 'balancer/cert02.colo:27017:1384274434:1804289383' acquired, ts : 5289a1b96dab44256a9e57b1
Mon Nov 18 00:12:26.310 [Balancer] distributed lock 'balancer/cert02.colo:27017:1384274434:1804289383' unlocked.
Mon Nov 18 00:12:38.645 [Balancer] distributed lock 'balancer/cert02.colo:27017:1384274434:1804289383' acquired, ts : 5289a1c66dab44256a9e57b2
Mon Nov 18 00:12:38.817 [Balancer] distributed lock 'balancer/cert02.colo:27017:1384274434:1804289383' unlocked.
Mon Nov 18 00:12:47.295 [conn1580] warning: splitChunk failed - cmd: { splitChunk: "reportDetail", keyPattern: { reportSectionId: 1.0 }, min: { reportSectionId: ObjectId('5289a19084ae0181adc44b55') }, max: { reportSectionId: MaxKey }, from: "shard0002", splitKeys: [ { reportSectionId: ObjectId('5289a1ce84ae0181adc60afd') } ], shardId: "reportDetail-reportSectionId_ObjectId('5289a19084ae0181adc44b55')", configdb: "mongo-c01:27019,mongo-c02:27019,mongo-c03:27019" } result: { who: { _id: "ihm.reportDetail", process: "mongo-s03.colo:27017:1383798064:1279813172", state: 1, ts: ObjectId('5289a1cf608a604a9a4d9fc0'), when: new Date(1384751567010), who: "mongo-s03.colo:27017:1383798064:1279813172:conn4637:447277999", why: "split-{ reportSectionId: ObjectId('5289a19084ae0181adc44b55') }" }, ok: 0.0, errmsg: "the collection's metadata lock is taken" }
Mon Nov 18 00:12:47.872 [conn1574] ChunkManager: time to load chunks for reportDetail: 7ms sequenceNumber: 915 version: 2208|5||5277dd0001bc489f0c618b8d based on: 2208|3||5277dd0001bc489f0c618b8d
Mon Nov 18 00:12:47.877 [conn1574] autosplitted reportDetail shard: ns:ihm.reportDetailshard: shard0002:mongo-s03:27017lastmod: 2208|3||000000000000000000000000min: { reportSectionId: ObjectId('5289a19084ae0181adc44b55') }max: { reportSectionId: MaxKey } on: { reportSectionId: ObjectId('5289a1ce84ae0181adc60afd') } (splitThreshold 60397977) (migrate suggested)
Mon Nov 18 00:12:47.887 [conn1574] moving chunk (auto): ns:ihm.reportDetailshard: shard0002:mongo-s03:27017lastmod: 2208|5||000000000000000000000000min: { reportSectionId: ObjectId('5289a1ce84ae0181adc60afd') }max: { reportSectionId: MaxKey } to: shard0001:mongo-s02:27017
Mon Nov 18 00:12:47.887 [conn1574] moving chunk ns: reportDetail moving ( ns:ihm.reportDetailshard: shard0002:mongo-s03:27017lastmod: 2208|5||000000000000000000000000min: { reportSectionId: ObjectId('5289a1ce84ae0181adc60afd') }max: { reportSectionId: MaxKey }) shard0002:mongo-s03:27017 -> shard0001:mongo-s02:27017
Mon Nov 18 00:12:48.537 [conn1574] moveChunk result: { cause: { ok: 0.0, errmsg: "still waiting for a previous migrates data to get cleaned, can't accept new chunks, num threads: 9" }, ok: 0.0, errmsg: "moveChunk failed to engage TO-shard in the data transfer: still waiting for a previous migrates data to get cleaned, can't accept new chunks, num threads: 9" }
Mon Nov 18 00:12:48.581 [conn1574] Assertion: 10412:moveAndCommit failed: { cause: { ok: 0.0, errmsg: "still waiting for a previous migrates data to get cleaned, can't accept new chunks, num threads: 9" }, ok: 0.0, errmsg: "moveChunk failed to engage TO-shard in the data transfer: still waiting for a previous migrates data to get cleaned, can't accept new chunks, num threads: 9" }
0xa86db1 0xa4cbbb 0xa4d0fc 0x8c8bae 0x9c30ea 0x9c522c 0x994011 0x66a161 0xa7351e 0x3b73e07851 0x3b736e894d
 /usr/bin/mongos(_ZN5mongo15printStackTraceERSo+0x21) [0xa86db1]
 /usr/bin/mongos(_ZN5mongo11msgassertedEiPKc+0x9b) [0xa4cbbb]
 /usr/bin/mongos() [0xa4d0fc]
 /usr/bin/mongos(_ZNK5mongo5Chunk13splitIfShouldEl+0x23ee) [0x8c8bae]
 /usr/bin/mongos(_ZN5mongo13ShardStrategy7_insertERKSsRNS_9DbMessageEiRNS_7RequestE+0xdca) [0x9c30ea]
 /usr/bin/mongos(_ZN5mongo13ShardStrategy7writeOpEiRNS_7RequestE+0x53c) [0x9c522c]
 /usr/bin/mongos(_ZN5mongo7Request7processEi+0xd1) [0x994011]
 /usr/bin/mongos(_ZN5mongo21ShardedMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortEPNS_9LastErrorE+0x71) [0x66a161]
 /usr/bin/mongos(_ZN5mongo17PortMessageServer17handleIncomingMsgEPv+0x42e) [0xa7351e]
 /lib64/libpthread.so.0() [0x3b73e07851]
 /lib64/libc.so.6(clone+0x6d) [0x3b736e894d]
Mon Nov 18 00:12:49.050 [conn1574] warning: could not autosplit collection ihm.reportDetail :: caused by :: 10412 moveAndCommit failed: { cause: { ok: 0.0, errmsg: "still waiting for a previous migrates data to get cleaned, can't accept new chunks, num threads: 9" }, ok: 0.0, errmsg: "moveChunk failed to engage TO-shard in the data transfer: still waiting for a previous migrates data to get cleaned, can't accept new chunks, num threads: 9" }
Mon Nov 18 00:12:49.290 [conn1580] ChunkManager: time to load chunks for ihm.reportDetail: 7ms sequenceNumber: 916 version: 2208|7||5277dd0001bc489f0c618b8d based on: 2208|5||5277dd0001bc489f0c618b8d
Mon Nov 18 00:12:49.302 [conn1580] autosplitted ihm.reportDetail shard: ns:ihm.reportDetailshard: shard0002:mongo-s03:27017lastmod: 2208|4||000000000000000000000000min: { reportSectionId: ObjectId('5289a19084ae0181adc44b55') }max: { reportSectionId: ObjectId('5289a1ce84ae0181adc60afd') } on: { reportSectionId: ObjectId('5289a19184ae78734c136803') } (splitThreshold 67108864)
Mon Nov 18 00:12:51.182 [Balancer] distributed lock 'balancer/cert02.colo:27017:1384274434:1804289383' acquired, ts : 5289a1d26dab44256a9e57b3
Mon Nov 18 00:12:51.458 [Balancer] distributed lock 'balancer/cert02.colo:27017:1384274434:1804289383' unlocked.
Mon Nov 18 00:12:57.753 [Balancer] distributed lock 'balancer/cert02.colo:27017:1384274434:1804289383' acquired, ts : 5289a1d96dab44256a9e57b4
Mon Nov 18 00:12:57.990 [Balancer] distributed lock 'balancer/cert02.colo:27017:1384274434:1804289383' unlocked.
Mon Nov 18 00:13:04.310 [Balancer] distributed lock 'balancer/cert02.colo:27017:1384274434:1804289383' acquired, ts : 5289a1e06dab44256a9e57b5
Mon Nov 18 00:13:04.548 [Balancer] distributed lock 'balancer/cert02.colo:27017:1384274434:1804289383' unlocked.
Mon Nov 18 00:13:10.830 [Balancer] distributed lock 'balancer/cert02.colo:27017:1384274434:1804289383' acquired, ts : 5289a1e66dab44256a9e57b6
Mon Nov 18 00:13:11.034 [Balancer] distributed lock 'balancer/cert02.colo:27017:1384274434:1804289383' unlocked.
Mon Nov 18 00:13:17.385 [Balancer] distributed lock 'balancer/cert02.colo:27017:1384274434:1804289383' acquired, ts : 5289a1ed6dab44256a9e57b7
Mon Nov 18 00:13:17.630 [Balancer] distributed lock 'balancer/cert02.colo:27017:1384274434:1804289383' unlocked.
Mon Nov 18 00:13:23.978 [Balancer] distributed lock 'balancer/cert02.colo:27017:1384274434:1804289383' acquired, ts : 5289a1f36dab44256a9e57b8
Mon Nov 18 00:13:24.218 [Balancer] distributed lock 'balancer/cert02.colo:27017:1384274434:1804289383' unlocked.
Mon Nov 18 00:13:30.626 [Balancer] distributed lock 'balancer/cert02.colo:27017:1384274434:1804289383' acquired, ts : 5289a1fa6dab44256a9e57b9
Mon Nov 18 00:13:30.863 [Balancer] distributed lock 'balancer/cert02.colo:27017:1384274434:1804289383' unlocked.
Mon Nov 18 00:13:34.634 [LockPinger] cluster mongo-c01:27019,mongo-c02:27019,mongo-c03:27019 pinged successfully at Mon Nov 18 00:13:34 2013 by distributed lock pinger 'mongo-c01:27019,mongo-c02:27019,mongo-c03:27019/cert02.colo:27017:1384274434:1804289383', sleeping for 30000ms
Mon Nov 18 00:13:37.154 [Balancer] distributed lock 'balancer/cert02.colo:27017:1384274434:1804289383' acquired, ts : 5289a2006dab44256a9e57ba
Mon Nov 18 00:13:37.394 [Balancer] distributed lock 'balancer/cert02.colo:27017:1384274434:1804289383' unlocked.
Mon Nov 18 00:13:43.706 [Balancer] distributed lock 'balancer/cert02.colo:27017:1384274434:1804289383' acquired, ts : 5289a2076dab44256a9e57bb
Mon Nov 18 00:13:43.950 [Balancer] distributed lock 'balancer/cert02.colo:27017:1384274434:1804289383' unlocked.
Mon Nov 18 00:13:55.105 [conn1566] end connection 192.168.110.26:56523 (27 connections now open)
Mon Nov 18 00:13:55.117 [mongosMain] connection accepted from 192.168.110.26:56595 #1583 (28 connections now open)
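
The sequence in the log: the autosplit of ihm.reportDetail eventually succeeds, mongos then tries to move the new top chunk to shard0001, and the TO-shard refuses because it is still deleting data from earlier migrations, which surfaces as assertion 10412 in the connection thread that triggered the split. One way to see whether migrations are backing up like this is to inspect the migration entries the config servers record; a minimal sketch from a mongos shell (the regex filter and the ten-entry limit are illustrative choices, not taken from the report):

use config
// most recent chunk-migration changelog entries, newest first
db.changelog.find({ what: /moveChunk/ }).sort({ time: -1 }).limit(10).pretty()
// optionally pause balancing while the TO-shard catches up on its deletes
// sh.setBalancerState(false)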



 Comments   
Comment by igor lasic [ 20/Dec/13 ]

OK, so if it's transient then I am OK. Please close.

Comment by Eliot Horowitz (Inactive) [ 19/Dec/13 ]

I'm not seeing an actual crash here, just a transient assertion.
Does the process die at this point?
Were any errors returned to the client?
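
If the assertion did propagate, with 2.4-era write semantics it would reach the client only through getLastError on the same connection; one hypothetical way to check from the mongo shell against the same mongos (the dummy document below is made up for illustration):

use ihm
db.reportDetail.insert({ reportSectionId: ObjectId() })   // dummy document, for illustration only
printjson(db.runCommand({ getLastError: 1 }))             // a non-null "err" means the error reached the client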
