[SERVER-1390] Server crashes when running PyMongo test suite Created: 09/Jul/10  Updated: 12/Jul/16  Resolved: 12/Jul/10

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 1.5.5

Type: Bug Priority: Critical - P2
Reporter: Michael Dirolf Assignee: Aaron Staple
Resolution: Done Votes: 1
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Operating System: ALL
Participants:

 Description   

This only happens with very recent server builds (from the last few days or so). The stack trace I get before the crash is this:

[initandlisten] Fri Jul 9 17:47:48 connection accepted from 127.0.0.1:53721 #6
[conn5] Fri Jul 9 17:47:48 CMD: drop pymongo_test.test
[conn1] Fri Jul 9 17:47:48 pymongo_test.test Assertion failure pos != -2 db/clientcursor.cpp 171
0x100030fce 0x100035f48 0x1000700b3 0x100174a6a 0x100013199 0x100174c17 0x1000a25e6 0x1000a2add 0x1000acb62 0x1000ad2c7 0x1000adf9c 0x10024e3bc 0x10016fe16 0x100236576 0x100238027 0x100303031 0x100637404 0x7fff879348b6 0x7fff87934769
0 mongod 0x0000000100030fce _ZN5mongo12sayDbContextEPKc + 174
1 mongod 0x0000000100035f48 _ZN5mongo8assertedEPKcS1_j + 344
2 mongod 0x00000001000700b3 _ZN5mongo12ClientCursorD2Ev + 467
3 mongod 0x0000000100174a6a _ZN5boost6detail17sp_counted_impl_pIN5mongo12ClientCursorEE7disposeEv + 26
4 mongod 0x0000000100013199 _ZN5boost6detail12shared_countD1Ev + 73
5 mongod 0x0000000100174c17 _ZN5mongo11UserQueryOp16recoverFromYieldEv + 199
6 mongod 0x00000001000a25e6 _ZN5mongo12QueryPlanSet6Runner16recoverFromYieldERNS_7QueryOpE + 70
7 mongod 0x00000001000a2add _ZN5mongo12QueryPlanSet6Runner8mayYieldERKSt6vectorIN5boost10shared_ptrINS_7QueryOpEEESaIS6_EE + 221
8 mongod 0x00000001000acb62 _ZN5mongo12QueryPlanSet6Runner3runEv + 1218
9 mongod 0x00000001000ad2c7 _ZN5mongo12QueryPlanSet5runOpERNS_7QueryOpE + 71
10 mongod 0x00000001000adf9c _ZN5mongo16MultiPlanScanner9runOpOnceERNS_7QueryOpE + 124
11 mongod 0x000000010024e3bc _ZN5mongo16MultiPlanScanner5runOpERNS_7QueryOpE + 28
12 mongod 0x000000010016fe16 _ZN5mongo8runQueryERNS_7MessageERNS_12QueryMessageERNS_5CurOpES1_ + 3318
13 mongod 0x0000000100236576 _ZN5mongoL13receivedQueryERNS_6ClientERNS_10DbResponseERNS_7MessageE + 230
14 mongod 0x0000000100238027 _ZN5mongo16assembleResponseERNS_7MessageERNS_10DbResponseERKNS_8SockAddrE + 4583
15 mongod 0x0000000100303031 _ZN5mongo10connThreadEv + 897
16 libboost_thread-mt.dylib 0x0000000100637404 thread_proxy + 132
17 libSystem.B.dylib 0x00007fff879348b6 _pthread_start + 331
18 libSystem.B.dylib 0x00007fff87934769 thread_start + 13
mongod(10199,0x100f10000) malloc: *** error for object 0x100b17470: pointer being freed was not allocated
*** set a breakpoint in malloc_error_break to debug
Fri Jul 9 17:47:48 Got signal: 6 (Abort trap).
Fri Jul 9 17:47:48 Backtrace:
0x1002ffbc5 0x7fff8795b80a 0x10086ba00 0x100300232

I showed this to Mathias, but he was hoping somebody else could take a look. I can consistently reproduce it on my machine just by running the PyMongo test suite. I narrowed it down to running these two tests in succession:

nosetests -v test.test_connection:TestConnection.test_network_timeout test.test_connection:TestConnection.test_tz_aware



 Comments   
Comment by Aaron Staple [ 12/Jul/10 ]

The problem was that ClientCursors were being stored in shared_ptr objects, which is unsafe because a ClientCursor can be freed by other code while a yield is in progress, leaving the shared_ptr holding a dangling pointer. This storage model was in place before my changes, but I think with the changes we are yielding more and this triggered the recent failures.

I added a ClientCursor pointer holder object and used that instead of shared_ptr.
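
The idea of a holder that tolerates the cursor being freed elsewhere can be sketched roughly as follows. This is an illustration only, not the code from the commit: `Cursor`, `CursorRegistry`, and `CursorHolder` are hypothetical names, and `std::unique_ptr` stands in for whatever ownership the server actually uses. The assumption is that a registry owns every cursor and the holder remembers only the cursor id, so it must look the cursor up again after a yield.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>

// Hypothetical stand-in for a cursor; not the real mongo::ClientCursor.
struct Cursor {
    explicit Cursor(long long cursorId) : id(cursorId) {}
    long long id;
};

// Hypothetical registry that owns every cursor. Any code may erase a cursor
// by id, which models a cursor being freed while its query has yielded.
class CursorRegistry {
public:
    Cursor* create() {
        const long long id = nextId_++;
        auto inserted = cursors_.emplace(id, std::make_unique<Cursor>(id));
        return inserted.first->second.get();
    }
    // Returns nullptr when the cursor no longer exists.
    Cursor* find(long long id) const {
        auto it = cursors_.find(id);
        return it == cursors_.end() ? nullptr : it->second.get();
    }
    // Safe to call for an id that was already erased; returns false then.
    bool erase(long long id) { return cursors_.erase(id) == 1; }
    std::size_t size() const { return cursors_.size(); }

private:
    long long nextId_ = 1;
    std::map<long long, std::unique_ptr<Cursor>> cursors_;
};

constexpr long long kNoCursor = -1;

// Holder that remembers the cursor id instead of owning the object.
// Cleanup goes through the registry, so a cursor that was already freed
// during a yield is never freed a second time.
class CursorHolder {
public:
    explicit CursorHolder(CursorRegistry& registry) : registry_(registry) {}
    CursorHolder(const CursorHolder&) = delete;
    CursorHolder& operator=(const CursorHolder&) = delete;
    ~CursorHolder() { reset(); }

    void reset(Cursor* c = nullptr) {
        if (c != nullptr && c->id == id_) return;       // same cursor, nothing to do
        if (id_ != kNoCursor) registry_.erase(id_);     // no-op if already freed
        id_ = (c != nullptr) ? c->id : kNoCursor;
    }

    // Re-resolve after a yield; nullptr means the cursor was freed meanwhile.
    Cursor* get() const {
        return id_ == kNoCursor ? nullptr : registry_.find(id_);
    }

private:
    CursorRegistry& registry_;
    long long id_ = kNoCursor;
};
```

With this shape, code that resumes after a yield calls `get()` and treats a null result as "cursor was killed", instead of dereferencing or destroying an object it no longer owns. A `shared_ptr` cannot express that, because it assumes it is the only party allowed to delete the object.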

Comment by auto [ 12/Jul/10 ]

Author: Aaron (astaple) <aaron@10gen.com>

Message: SERVER-1390 don't put ClientCursor in shared_ptr
http://github.com/mongodb/mongo/commit/c6f48e0cd33ba3f6eea5b54397dac89544cfda6d

Comment by Eliot Horowitz (Inactive) [ 10/Jul/10 ]

I think this is from your change.
If not - assign back to me.
