[SERVER-5055] dist_lock cleanup failed, invalid BSONObj 0xEEEEEEEE Created: 23/Feb/12 Updated: 09/Apr/13 Resolved: 22/Apr/12 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | 1.8.3 |
| Fix Version/s: | None |
| Type: | Question | Priority: | Major - P3 |
| Reporter: | Site Operations | Assignee: | Scott Hernandez (Inactive) |
| Resolution: | Done | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Environment: | Ubuntu |
| Participants: |
| Description |
|
We're getting an error on all of our mongos processes for one of our DBs:

Thu Feb 23 10:06:02 [LockPinger] warning: dist_lock cleanup request from process: monitor01:1330020332:1804289383 to: mongoconfig01:27017,mongoconfig02:27017,mongoconfig03:27017 failed: Invalid BSONObj size: -286331154 (0xEEEEEEEE) first element: _id: "monitor01:1320796130:1804289383"

We see the same error in different forms on other app servers:

Thu Feb 23 10:18:05 [LockPinger] warning: dist_lock cleanup request from process: app01:1325044333:1804289383 to: mongoconfig01:27017,mongoconfig02:27017,mongoconfig03:27017 failed: Invalid BSONObj size: -286331154 (0xEEEEEEEE) first element: _id: "monitor01:1320796130:1804289383"

This looks like something is corrupt in the distributed locking, and "monitor01:1320796130:1804289383" is failing to update. If we've restarted the mongos on monitor01, is there a way to manually clean that up, or to have mongo automatically fix that?
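For reference, the size in the message is what the four bytes 0xEE 0xEE 0xEE 0xEE give when read as the signed 32-bit little-endian length prefix that begins every BSON document. A minimal sketch of that arithmetic follows; the helper name `bson_length_prefix` is made up for illustration and is not a MongoDB or driver API.

```python
import struct


def bson_length_prefix(raw: bytes) -> int:
    """Return the leading int32 (signed, little-endian) of a BSON document.

    Hypothetical helper for illustration only.
    """
    if len(raw) < 4:
        raise ValueError("need at least 4 bytes to read a BSON length prefix")
    (size,) = struct.unpack_from("<i", raw, 0)
    return size


# Four 0xEE bytes where a document length should be.
corrupt = bytes([0xEE] * 4)
size = bson_length_prefix(corrupt)
print(size, hex(size & 0xFFFFFFFF))  # -286331154 0xeeeeeeee

# For comparison, an empty BSON document is 5 bytes:
# a length prefix of 5 followed by the 0x00 terminator.
empty_doc = b"\x05\x00\x00\x00\x00"
print(bson_length_prefix(empty_doc))  # 5
```

A negative length can never be valid, so the server rejects the object before reading any further. The repeated byte pattern suggests the stored record no longer holds a real document, which fits the suspicion of corruption above.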
| Comments |
| Comment by Mikhail Kulakovskiy [ 09/Apr/13 ] |
|
I had the same issue. I am running 2.4.1 with 3 config servers; one was asserting with:
This was resolved by running --repair on that config server.
| Comment by Site Operations [ 10/Apr/12 ] |
|
Unable to reproduce at this point. We have made significant changes to our production environment, and I believe this system has been superseded by a new installation.
| Comment by Ian Whalen (Inactive) [ 09/Apr/12 ] |
|
Could you let us know whether you were able to perform the above operations and then attach the logs from that node if so?
| Comment by Eliot Horowitz (Inactive) [ 26/Feb/12 ] |
|
Is it just one config server asserting? If so, I would:
Then please attach the logs from that node so we can take a look.
| Comment by Site Operations [ 24/Feb/12 ] |
|
3 config servers, none crashed that I know of, at least not recently. Configs are not running with journaling, but shards are. Guess we should fix that. Just saw a stacktrace with the most recent assert on the config server, pasted below:

Thu Feb 23 17:44:52 [conn56] Assertion: 10334:Invalid BSONObj size: -286331154 (0xEEEEEEEE) first element: _id: "monitor01:1320796130:1804289383" , ping: { $lt: new Date(1329702292835) } } exception 10334 Invalid BSONObj size: -286331154 (0xEEEEEEEE) first element: _id: "monitor01:1320796130:1804289383" 3ms
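The `new Date(1329702292835)` in the failing request is a Unix timestamp in milliseconds. A small sketch for reading it as a UTC time follows; `millis_to_utc` is a made-up helper name, not part of any MongoDB tool.

```python
from datetime import datetime, timedelta, timezone


def millis_to_utc(ms: int) -> datetime:
    """Convert Unix epoch milliseconds to an aware UTC datetime.

    Hypothetical helper; integer arithmetic avoids float rounding
    in the millisecond part.
    """
    seconds, millis = divmod(ms, 1000)
    base = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return base + timedelta(milliseconds=millis)


# Value copied from the `ping: { $lt: new Date(...) }` clause in the log line.
cutoff = millis_to_utc(1329702292835)
print(cutoff.isoformat())  # 2012-02-20T01:44:52.835000+00:00
```

The cutoff falls a few days before the Feb 23 log line (whose timezone is not stated), so the request is matching ping entries older than that point, and it is while handling that request that the server hits the invalid object.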
| Comment by Eliot Horowitz (Inactive) [ 24/Feb/12 ] |
|
How many config servers do you have?