[SERVER-5055] dist_lock cleanup failed, invalid BSONObj 0xEEEEEEEE Created: 23/Feb/12  Updated: 09/Apr/13  Resolved: 22/Apr/12

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 1.8.3
Fix Version/s: None

Type: Question Priority: Major - P3
Reporter: Site Operations Assignee: Scott Hernandez (Inactive)
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Ubuntu


Participants:

 Description   

We're getting an error on all of our mongos processes for one of our DBs:

Thu Feb 23 10:06:02 [LockPinger] warning: dist_lock cleanup request from process: monitor01:1330020332:1804289383 to: mongoconfig01:27017,mongoconfig02:27017,mongoconfig03:27017 failed: Invalid BSONObj size: -286331154 (0xEEEEEEEE) first element: _id: "monitor01:1320796130:1804289383"

We see the same error in different forms on other app servers:

Thu Feb 23 10:18:05 [LockPinger] warning: dist_lock cleanup request from process: app01:1325044333:1804289383 to: mongoconfig01:27017,mongoconfig02:27017,mongoconfig03:27017 failed: Invalid BSONObj size: -286331154 (0xEEEEEEEE) first element: _id: "monitor01:1320796130:1804289383"

This looks like something is corrupt in the distributed locking, and "monitor01:1320796130:1804289383" is failing to update. If we've restarted the mongos on monitor01, is there a way to manually clean that up, or to have mongo automatically fix that?



 Comments   
Comment by Mikhail Kulakovskiy [ 09/Apr/13 ]

I had the same issue. I am running 2.4.1, 3 config servers, one was asserting with:

Tue Apr  9 15:46:22.134 [conn9] Assertion: 10334:BSONObj size: -286331154 (0xEEEEEEEE) is invalid. Size must be between 0 and 16793600(16MB) First element: _id: "PRFT254:27017:1364633171:6473960"
0xdc7f71 0xd8963b 0xd89b7c 0x6e8f49 0xa4c0bd 0xa4c9fb 0x805370 0xa72314 0x9eaf71 0x9f1046 0x6f0350 0xdb478e 0x7f0eef6e2e9a 0x7f0eee9f5cbd
 /usr/mongo/24/mongod(_ZN5mongo15printStackTraceERSo+0x21) [0xdc7f71]
 /usr/mongo/24/mongod(_ZN5mongo11msgassertedEiPKc+0x9b) [0xd8963b]
 /usr/mongo/24/mongod() [0xd89b7c]
 /usr/mongo/24/mongod(_ZNK5mongo7BSONObj14_assertInvalidEv+0x5a9) [0x6e8f49]
 /usr/mongo/24/mongod(_ZNK5mongo19CoveredIndexMatcher7matchesERKNS_7BSONObjERKNS_7DiskLocEPNS_12MatchDetailsEb+0x20d) [0xa4c0bd]
 /usr/mongo/24/mongod(_ZNK5mongo19CoveredIndexMatcher14matchesCurrentEPNS_6CursorEPNS_12MatchDetailsE+0xeb) [0xa4c9fb]
 /usr/mongo/24/mongod(_ZN5mongo11BtreeCursor14currentMatchesEPNS_12MatchDetailsE+0x100) [0x805370]
 /usr/mongo/24/mongod(_ZN5mongo13deleteObjectsEPKcNS_7BSONObjEbbbPNS_11RemoveSaverE+0x414) [0xa72314]
 /usr/mongo/24/mongod(_ZN5mongo14receivedDeleteERNS_7MessageERNS_5CurOpE+0x3a1) [0x9eaf71]
 /usr/mongo/24/mongod(_ZN5mongo16assembleResponseERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortE+0xae6) [0x9f1046]
 /usr/mongo/24/mongod(_ZN5mongo16MyMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortEPNS_9LastErrorE+0x90) [0x6f0350]
 /usr/mongo/24/mongod(_ZN5mongo17PortMessageServer17handleIncomingMsgEPv+0x42e) [0xdb478e]
 /lib/x86_64-linux-gnu/libpthread.so.0(+0x7e9a) [0x7f0eef6e2e9a]
 /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f0eee9f5cbd]
Tue Apr  9 15:46:22.141 [conn9] remove config.lockpings query: { _id: { $nin: [ "PRFT254:27017:1364839602:985250340", "PRFT296:27000:1364822766:1804289383", "PRFT316:27000:1364833444:1804289383", "PRFT316:27018:1364834119:1274150233", "mongo.ap:27000:1347446408:16807", "mongo2.ap:27000:1342801457:16807" ] }, ping: { $l
t: new Date(1365169581987) } } keyUpdates:0 exception: BSONObj size: -286331154 (0xEEEEEEEE) is invalid. Size must be between 0 and 16793600(16MB) First element: _id: "PRFT254:27017:1364633171:6473960" code:10334 locks(micros) w:6965 6ms

This was successfully resolved with running --repair on this config server.

Comment by Site Operations [ 10/Apr/12 ]

Unable to reproduce at this point, we have made significant changes to our production environment and I believe this system has been superseded by a new installation.

Comment by Ian Whalen (Inactive) [ 09/Apr/12 ]

Could you let us know whether you were able to perform the above operations and then attach the logs from that node if so?

Comment by Eliot Horowitz (Inactive) [ 26/Feb/12 ]

Is it just one config server asserting?

If so, I would:

  • shut it down
  • mongodump a good one
  • start on a different port
  • mongorestore
  • restart on correct port

Then please attach the logs from that node so we can take a look

Comment by Site Operations [ 24/Feb/12 ]

3 config servers, none crashed that I know of, at least not recently. Configs are not running with journaling, but shards are. Guess we should fix that.

Just saw a stacktrace with the most recent assert on the config server, pasted below:

Thu Feb 23 17:44:52 [conn56] Assertion: 10334:Invalid BSONObj size: -286331154 (0xEEEEEEEE) first element: _id: "monitor01:1320796130:1804289383"
0x55f39a 0x4ee2fe 0x74138c 0x652401 0x790fe6 0x799c46 0x79d2f1 0x7a0664 0x6526e1 0x65bdf6 0x647bfe 0x757a62 0x75d830 0x8ae1de 0x7fe9f22f39ca 0x7fe9f18a270d
/usr/bin/mongod(_ZN5mongo11msgassertedEiPKc+0x12a) [0x55f39a]
/usr/bin/mongod(_ZNK5mongo7BSONObj14_assertInvalidEv+0x46e) [0x4ee2fe]
/usr/bin/mongod(_ZN5mongo19CoveredIndexMatcher7matchesERKNS_7BSONObjERKNS_7DiskLocEPNS_12MatchDetailsEb+0x13c) [0x74138c]
/usr/bin/mongod(_ZN5mongo8DeleteOp4nextEv+0xa1) [0x652401]
/usr/bin/mongod(_ZN5mongo12QueryPlanSet6Runner6nextOpERNS_7QueryOpE+0x56) [0x790fe6]
/usr/bin/mongod(_ZN5mongo12QueryPlanSet6Runner3runEv+0x866) [0x799c46]
/usr/bin/mongod(_ZN5mongo12QueryPlanSet5runOpERNS_7QueryOpE+0x251) [0x79d2f1]
/usr/bin/mongod(_ZN5mongo16MultiPlanScanner9runOpOnceERNS_7QueryOpE+0x64) [0x7a0664]
/usr/bin/mongod(_ZN5mongo11MultiCursor10nextClauseEv+0x71) [0x6526e1]
/usr/bin/mongod(_ZN5mongo11MultiCursorC1EPKcRKNS_7BSONObjES5_N5boost10shared_ptrINS0_8CursorOpEEEb+0x1d6) [0x65bdf6]
/usr/bin/mongod(_ZN5mongo13deleteObjectsEPKcNS_7BSONObjEbbbPNS_11RemoveSaverE+0x27e) [0x647bfe]
/usr/bin/mongod(_ZN5mongo14receivedDeleteERNS_7MessageERNS_5CurOpE+0x382) [0x757a62]
/usr/bin/mongod(_ZN5mongo16assembleResponseERNS_7MessageERNS_10DbResponseERKNS_8SockAddrE+0x1890) [0x75d830]
/usr/bin/mongod(_ZN5mongo10connThreadEPNS_13MessagingPortE+0x21e) [0x8ae1de]
/lib/libpthread.so.0(+0x69ca) [0x7fe9f22f39ca]
/lib/libc.so.6(clone+0x6d) [0x7fe9f18a270d]
Thu Feb 23 17:44:52 [conn56] remove config.lockpings query: { _id:

{ $nin: [ "monitor01:1330020332:1804289383", "mongoshard02:1330034739:1138494367", "mongoshard04:1325099694:1241126981", "app01:1320189087:1804289383", "app01:1320189087:1804289383", "app02:1323302676:1804289383" ] }

, ping:

{ $lt: new Date(1329702292835) }

} exception 10334 Invalid BSONObj size: -286331154 (0xEEEEEEEE) first element: _id: "monitor01:1320796130:1804289383" 3ms
Thu Feb 23 17:44:52 [conn56] fsync from getlasterror
Thu Feb 23 17:44:52 [conn1027] Assertion: 10334:Invalid BSONObj size: -286331154 (0xEEEEEEEE) first element: _id: "monitor01:1320796130:1804289383"
0x55f39a 0x4ee2fe 0x74138c 0x652401 0x790fe6 0x799c46 0x79d2f1 0x7a0664 0x6526e1 0x65bdf6 0x647bfe 0x757a62 0x75d830 0x8ae1de 0x7fe9f22f39ca 0x7fe9f18a270d
/usr/bin/mongod(_ZN5mongo11msgassertedEiPKc+0x12a) [0x55f39a]
/usr/bin/mongod(_ZNK5mongo7BSONObj14_assertInvalidEv+0x46e) [0x4ee2fe]
/usr/bin/mongod(_ZN5mongo19CoveredIndexMatcher7matchesERKNS_7BSONObjERKNS_7DiskLocEPNS_12MatchDetailsEb+0x13c) [0x74138c]
/usr/bin/mongod(_ZN5mongo8DeleteOp4nextEv+0xa1) [0x652401]
/usr/bin/mongod(_ZN5mongo12QueryPlanSet6Runner6nextOpERNS_7QueryOpE+0x56) [0x790fe6]
/usr/bin/mongod(_ZN5mongo12QueryPlanSet6Runner3runEv+0x866) [0x799c46]
/usr/bin/mongod(_ZN5mongo12QueryPlanSet5runOpERNS_7QueryOpE+0x251) [0x79d2f1]
/usr/bin/mongod(_ZN5mongo16MultiPlanScanner9runOpOnceERNS_7QueryOpE+0x64) [0x7a0664]
/usr/bin/mongod(_ZN5mongo11MultiCursor10nextClauseEv+0x71) [0x6526e1]
/usr/bin/mongod(_ZN5mongo11MultiCursorC1EPKcRKNS_7BSONObjES5_N5boost10shared_ptrINS0_8CursorOpEEEb+0x1d6) [0x65bdf6]
/usr/bin/mongod(_ZN5mongo13deleteObjectsEPKcNS_7BSONObjEbbbPNS_11RemoveSaverE+0x27e) [0x647bfe]
/usr/bin/mongod(_ZN5mongo14receivedDeleteERNS_7MessageERNS_5CurOpE+0x382) [0x757a62]
/usr/bin/mongod(_ZN5mongo16assembleResponseERNS_7MessageERNS_10DbResponseERKNS_8SockAddrE+0x1890) [0x75d830]
/usr/bin/mongod(_ZN5mongo10connThreadEPNS_13MessagingPortE+0x21e) [0x8ae1de]
/lib/libpthread.so.0(+0x69ca) [0x7fe9f22f39ca]
/lib/libc.so.6(clone+0x6d) [0x7fe9f18a270d]

Comment by Eliot Horowitz (Inactive) [ 24/Feb/12 ]

How many config servers do you have?
Did any crash unexpectedly?
Are they running with journalling?

Generated at Thu Feb 08 03:07:46 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.