[SERVER-7786] Crash - Assertion: 10331:EOO Before end of object Created: 28/Nov/12  Updated: 08/Mar/13  Resolved: 05/Dec/12

Status: Closed
Project: Core Server
Component/s: Stability
Affects Version/s: 2.2.1
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Ran Tavory Assignee: Unassigned
Resolution: Incomplete Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

uname -a
Linux mongo-08 3.2.0-33-virtual #52-Ubuntu SMP Thu Oct 18 16:48:37 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux


Operating System: ALL
Participants:

 Description   

Both primary and secondary crashed at the same time. No sharding.
They were under reasonable load I guess and the only thing that was going on that's not usual is the following command I was running from the shell.
This command did not immediately cause the crash, but it took about an hour until it did. And it's not reproducible (so I can't be sure that it's really related).

The commend:
rs_totango:PRIMARY> while (true) {db.Session.find(

{createdAt: null}

).limit(20).forEach(function(s){db.Session.remove({_id: s._id})}); if(Math.random()<0.2)

{print(db.Session.stats().size);}

sleep(1000);}

PRIMARY logs:
============
Wed Nov 28 08:25:22 [conn3539] end connection 10.140.9.180:38407 (210 connections now open)
Wed Nov 28 08:25:22 [initandlisten] connection accepted from 10.140.9.180:38408 #3540 (211 connections now open)
Wed Nov 28 08:25:22 [conn3540] recv(): message len 0 is too large0
Wed Nov 28 08:25:22 [conn3540] end connection 10.140.9.180:38408 (210 connections now open)
Wed Nov 28 08:25:22 [initandlisten] connection accepted from 10.140.9.180:38409 #3541 (211 connections now open)
Wed Nov 28 08:25:22 [conn3541] recv(): message len 0 is too large0
Wed Nov 28 08:25:22 [conn3541] end connection 10.140.9.180:38409 (210 connections now open)
Wed Nov 28 08:25:22 [initandlisten] connection accepted from 10.140.9.180:38410 #3542 (211 connections now open)
Wed Nov 28 08:25:22 [conn1634] remove totango_prod.Session query:

{ _id: ObjectId('5036bce3e4b04e02ff19524f') }

keyUpdates:0 locks(micros) w:182610 182ms
Wed Nov 28 08:25:22 [conn3542] recv(): message len 0 is too large0
Wed Nov 28 08:25:22 [conn3542] end connection 10.140.9.180:38410 (210 connections now open)
Wed Nov 28 08:25:22 [initandlisten] connection accepted from 10.140.9.180:38411 #3543 (211 connections now open)
Wed Nov 28 08:25:22 Invalid access at address: 0x7ee4b725a000 from thread: conn3543

Wed Nov 28 08:25:22 Got signal: 11 (Segmentation fault).

Wed Nov 28 08:25:23 Backtrace:
0xade6e1 0x5582d9 0x558862 0x7f1dedb20cb0 0x7f1dece98038 0x610678 0x93070b 0x93d90b 0x944669 0x944f72 0x9456c7 0x880697 0x8810c6 0x849cfa 0x84b258 0x84b41b 0x8d0464 0x65c6b2 0x6571f1 0x658e87
/usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x21) [0xade6e1]
/usr/bin/mongod(_ZN5mongo10abruptQuitEi+0x399) [0x5582d9]
/usr/bin/mongod(_ZN5mongo24abruptQuitWithAddrSignalEiP7siginfoPv+0x262) [0x558862]
/lib/x86_64-linux-gnu/libpthread.so.0(+0xfcb0) [0x7f1dedb20cb0]
/lib/x86_64-linux-gnu/libc.so.6(+0x15e038) [0x7f1dece98038]
/usr/bin/mongod(ZN5mongo20compareElementValuesERKNS_11BSONElementES2+0x748) [0x610678]
/usr/bin/mongod(_ZN5mongo10FieldRange9intersectERKS0_b+0x12b) [0x93070b]
/usr/bin/mongod(_ZN5mongo13FieldRangeSet19intersectMatchFieldEPKcRKNS_11BSONElementEbb+0x5b) [0x93d90b]
/usr/bin/mongod(_ZN5mongo13FieldRangeSet16handleMatchFieldERKNS_11BSONElementEb+0x1b9) [0x944669]
/usr/bin/mongod(_ZN5mongo13FieldRangeSet4initEb+0xa2) [0x944f72]
/usr/bin/mongod(_ZN5mongo13FieldRangeSetC1EPKcRKNS_7BSONObjEbb+0x147) [0x9456c7]
/usr/bin/mongod(ZN5mongo16MultiPlanScanner4initERKNS_7BSONObjES3_S3+0x87) [0x880697]
/usr/bin/mongod(ZN5mongo16MultiPlanScanner4makeEPKcRKNS_7BSONObjES5_RKN5boost10shared_ptrIKNS_11ParsedQueryEEES5_NS_18QueryPlanGenerator18RecordedPlanPolicyES5_S5+0x76) [0x8810c6]
/usr/bin/mongod(_ZN5mongo15CursorGenerator19setMultiPlanScannerEv+0xea) [0x849cfa]
/usr/bin/mongod(_ZN5mongo15CursorGenerator8generateEv+0x98) [0x84b258]
/usr/bin/mongod(_ZN5mongo25NamespaceDetailsTransient9getCursorEPKcRKNS_7BSONObjES5_RKNS_24QueryPlanSelectionPolicyEPbRKN5boost10shared_ptrIKNS_11ParsedQueryEEEbPNS_16QueryPlanSummaryE+0x4b) [0x84b41b]
/usr/bin/mongod(_ZN5mongo8runCountEPKcRKNS_7BSONObjERSsRi+0x494) [0x8d0464]
/usr/bin/mongod(_ZN5mongo8CmdCount3runERKSsRNS_7BSONObjEiRSsRNS_14BSONObjBuilderEb+0x62) [0x65c6b2]
/usr/bin/mongod(_ZN5mongo12_execCommandEPNS_7CommandERKSsRNS_7BSONObjEiRNS_14BSONObjBuilderEb+0x51) [0x6571f1]
/usr/bin/mongod(_ZN5mongo11execCommandEPNS_7CommandERNS_6ClientEiPKcRNS_7BSONObjERNS_14BSONObjBuilderEb+0xe27) [0x658e87]

SECONDARY logs:
===============
Wed Nov 28 08:25:31 [conn154] end connection 10.140.9.180:45949 (98 connections now open)
Wed Nov 28 08:25:31 [conn153] end connection 10.140.9.180:45948 (98 connections now open)
Wed Nov 28 08:25:31 [conn149] end connection 10.140.9.180:45944 (96 connections now open)
Wed Nov 28 08:25:31 [initandlisten] connection accepted from 10.140.9.180:45950 #155 (97 connections now open)
Wed Nov 28 08:25:31 [conn155] recv(): message len 0 is too large0
Wed Nov 28 08:25:31 [conn155] end connection 10.140.9.180:45950 (96 connections now open)
Wed Nov 28 08:25:31 [initandlisten] connection accepted from 10.140.9.180:45951 #156 (97 connections now open)
Wed Nov 28 08:25:31 [conn156] recv(): message len 4 is too large4
Wed Nov 28 08:25:31 [conn156] end connection 10.140.9.180:45951 (96 connections now open)
Wed Nov 28 08:25:31 [initandlisten] connection accepted from 10.140.9.180:45952 #157 (97 connections now open)
Wed Nov 28 08:25:32 [conn150] Assertion: 10331:EOO Before end of object
0xaf8c41 0xabedb9 0x574032 0x572a2b 0x65aaaf 0x7b217a 0x56fe42 0xae6ed1 0x7f7e2bbf2e9a 0x7f7e2af07dbd
/usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x21) [0xaf8c41]
/usr/bin/mongod(_ZN5mongo11msgassertedEiPKc+0x99) [0xabedb9]
/usr/bin/mongod() [0x574032]
/usr/bin/mongod(_ZNK5mongo11BSONElement8toStringERNS_17StringBuilderImplINS_16TrivialAllocatorEEEbbi+0x69b) [0x572a2b]
/usr/bin/mongod(_ZNK5mongo7OpDebug6reportERKNS_5CurOpE+0xccf) [0x65aaaf]
/usr/bin/mongod(_ZN5mongo16assembleResponseERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortE+0x43a) [0x7b217a]
/usr/bin/mongod(_ZN5mongo16MyMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortEPNS_9LastErrorE+0x82) [0x56fe42]
/usr/bin/mongod(_ZN5mongo3pms9threadRunEPNS_13MessagingPortE+0x411) [0xae6ed1]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x7e9a) [0x7f7e2bbf2e9a]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f7e2af07dbd]
Wed Nov 28 08:25:32 [conn150] AssertionException handling request, closing client connection: 10331 EOO Before end of object
Wed Nov 28 08:25:32 [conn151] Assertion: 10331:EOO Before end of object
0xaf8c41 0xabedb9 0x574032 0x572a2b 0x65aaaf 0x7b217a 0x56fe42 0xae6ed1 0x7f7e2bbf2e9a 0x7f7e2af07dbd
Wed Nov 28 08:25:32 Invalid access at address: 0x7f73f77bc000 from thread: conn152

Wed Nov 28 08:25:32 Got signal: 11 (Segmentation fault).

Wed Nov 28 08:25:32 Backtrace:
0xaf8c41 0x5586c9 0x558c52 0x7f7e2bbfacb0 0x7f7e2af7201e 0x5c2b38 0x8ef29b 0x8fc49b 0x9031f9 0x903b02 0x904257 0x8d8b77 0x8d95a6 0x8e1faa 0x8e3508 0x8e36cb 0x8291a4 0x6f0622 0x6e8321 0x6e9cb0
/usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x21) [0xaf8c41]
/usr/bin/mongod(_ZN5mongo10abruptQuitEi+0x399) [0x5586c9]
/usr/bin/mongod(_ZN5mongo24abruptQuitWithAddrSignalEiP7siginfoPv+0x262) [0x558c52]
/lib/x86_64-linux-gnu/libpthread.so.0(+0xfcb0) [0x7f7e2bbfacb0]
/lib/x86_64-linux-gnu/libc.so.6(+0x15e01e) [0x7f7e2af7201e]
/usr/bin/mongod(ZN5mongo20compareElementValuesERKNS_11BSONElementES2+0x778) [0x5c2b38]
/usr/bin/mongod(_ZN5mongo10FieldRange9intersectERKS0_b+0x12b) [0x8ef29b]
/usr/bin/mongod(_ZN5mongo13FieldRangeSet19intersectMatchFieldEPKcRKNS_11BSONElementEbb+0x5b) [0x8fc49b]
/usr/bin/mongod(_ZN5mongo13FieldRangeSet16handleMatchFieldERKNS_11BSONElementEb+0x1b9) [0x9031f9]
/usr/bin/mongod(_ZN5mongo13FieldRangeSet4initEb+0xa2) [0x903b02]
/usr/bin/mongod(_ZN5mongo13FieldRangeSetC1EPKcRKNS_7BSONObjEbb+0x147) [0x904257]
/usr/bin/mongod(ZN5mongo16MultiPlanScanner4initERKNS_7BSONObjES3_S3+0x87) [0x8d8b77]
/usr/bin/mongod(ZN5mongo16MultiPlanScanner4makeEPKcRKNS_7BSONObjES5_RKN5boost10shared_ptrIKNS_11ParsedQueryEEES5_NS_18QueryPlanGenerator18RecordedPlanPolicyES5_S5+0x76) [0x8d95a6]
/usr/bin/mongod(_ZN5mongo15CursorGenerator19setMultiPlanScannerEv+0xea) [0x8e1faa]
/usr/bin/mongod(_ZN5mongo15CursorGenerator8generateEv+0x98) [0x8e3508]
/usr/bin/mongod(_ZN5mongo25NamespaceDetailsTransient9getCursorEPKcRKNS_7BSONObjES5_RKNS_24QueryPlanSelectionPolicyEPbRKN5boost10shared_ptrIKNS_11ParsedQueryEEEbPNS_16QueryPlanSummaryE+0x4b) [0x8e36cb]
/usr/bin/mongod(_ZN5mongo8runCountEPKcRKNS_7BSONObjERSsRi+0x494) [0x8291a4]
/usr/bin/mongod(_ZN5mongo8CmdCount3runERKSsRNS_7BSONObjEiRSsRNS_14BSONObjBuilderEb+0x62) [0x6f0622]
/usr/bin/mongod(_ZN5mongo12_execCommandEPNS_7CommandERKSsRNS_7BSONObjEiRNS_14BSONObjBuilderEb+0x51) [0x6e8321]
/usr/bin/mongod(_ZN5mongo11execCommandEPNS_7CommandERNS_6ClientEiPKcRNS_7BSONObjERNS_14BSONObjBuilderEb+0xe70) [0x6e9cb0]

/usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x21) [0xaf8c41]
/usr/bin/mongod(_ZN5mongo11msgassertedEiPKc+0x99) [0xabedb9]
/usr/bin/mongod() [0x574032]
/usr/bin/mongod(_ZNK5mongo11BSONElement8toStringERNS_17StringBuilderImplINS_16TrivialAllocatorEEEbbi+0x69b) [0x572a2b]
/usr/bin/mongod(_ZNK5mongo7OpDebug6reportERKNS_5CurOpE+0xccf) [0x65aaaf]
/usr/bin/mongod(_ZN5mongo16assembleResponseERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortE+0x43a) [0x7b217a]
/usr/bin/mongod(_ZN5mongo16MyMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortEPNS_9LastErrorE+0x82) [0x56fe42]
/usr/bin/mongod(_ZN5mongo3pms9threadRunEPNS_13MessagingPortE+0x411) [0xae6ed1]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x7e9a) [0x7f7e2bbf2e9a]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f7e2af07dbd]



 Comments   
Comment by Eric Milkie [ 28/Nov/12 ]

The Java driver versions 2.9.0 and 2.9.1 definitely have this issue, so you should upgrade. 2.9.2 is only a bugfix release while 2.10.0 contains new features as well. Since you are only looking to fix the bug and not looking for new features, I would recommend upgrading to 2.9.2.
Based on the circumstances, I'm confident that it was the driver issue that caused the crash and not some rogue connection.

Comment by Ran Tavory [ 28/Nov/12 ]

I can't find anything relevant on the web server log, sorry. Either this was not reported as an error by the driver or that the log service was not configured properly but anyway, there's no info about this incident there

The java driver's version is 2.9.0. I'll upgrade (but do you think it's a reasonable explanation?). BTW, is there a reason to use 2.9.2 and not 2.10.0?

If there was a program that connected to the port and sent garbage which caused the db to crash - that's possible, yes, but it does not ring a bell, I mean, I'm not aware of any such program that might do that. But of course shit happens, so this is possible. Either way, I think it's a reasonable expectation that mongo would handle that gracefully, and you can think of it even as a security bug - if an attacker is able to crash the db by sending malformed input then...

Comment by Eric Milkie [ 28/Nov/12 ]

Also, can you check the version of the Java driver there? There was a concurrency issue with an older version that has been fixed in the latest version (2.9.2 or 2.10.0)

Comment by Eric Milkie [ 28/Nov/12 ]

The server doesn't do wire protocol checking unless you turn on --objcheck (this will negatively impact performance). There are some outstanding feature requests to sanitize the incoming data stream without performance impacts.
In situations like these, it is typically some program error (connecting on the wrong port) rather than a network issue. Do you have logs of what happened on the web server?

Comment by Ran Tavory [ 28/Nov/12 ]

no, 10.140.9.180 is one of the web servers. It users the mongo java
drivers.
So if the packets were corrupted, then I suppose it's a network issue
(we're on aws). But still I would not expect a crash...


/Ran
http://tavory.com

Comment by Eric Milkie [ 28/Nov/12 ]

The servers shut down because some new connections coming from 10.140.9.180 at that time did not speak proper MongoDB wire protocol. Was this the IP address of the computer where you were running the command from the shell?

Generated at Thu Feb 08 03:15:36 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.