[SERVER-2797] SEGV at BtreeCursor9prettyKey Created: 19/Mar/11  Updated: 12/Jul/16  Resolved: 20/Mar/11

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 1.8.0
Fix Version/s: 1.8.1, 1.9.0

Type: Bug Priority: Critical - P2
Reporter: Alvin Richards (Inactive) Assignee: Eliot Horowitz (Inactive)
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

1.8.0 EC2


Issue Links:
Depends
Operating System: ALL
Participants:

 Description   

Problem:
As reported in logs

Sat Mar 19 04:21:44 [conn224] splitVector doing another cycle because of force, keyCount now: 1678740
Sat Mar 19 04:21:45 Got signal: 11 (Segmentation fault).

Sat Mar 19 04:21:45 Backtrace:
0x8a5359 0x7fad81c02af0 0x89053d 0x893356 0x7dc5f0 0x7ddb21 0x645565 0x64addc 0x757f15 0x75a440 0x8a617e 0x8b92d0 0x7fad827069ca 0x7fad81cb570d
/usr/bin/mongod(_ZN5mongo10abruptQuitEi+0x399) [0x8a5359]
/lib/libc.so.6(+0x33af0) [0x7fad81c02af0]
/usr/bin/mongod(_ZNK5mongo11BtreeCursor9prettyKeyERKNS_7BSONObjE+0x1d) [0x89053d]
/usr/bin/mongod(_ZN5mongo11SplitVector3runERKSsRNS_7BSONObjERSsRNS_14BSONObjBuilderEb+0x2996) [0x893356]
/usr/bin/mongod(_ZN5mongo11execCommandEPNS_7CommandERNS_6ClientEiPKcRNS_7BSONObjERNS_14BSONObjBuilderEb+0x9a0) [0x7dc5f0]
/usr/bin/mongod(_ZN5mongo12_runCommandsEPKcRNS_7BSONObjERNS_10BufBuilderERNS_14BSONObjBuilderEbi+0x831) [0x7ddb21]
/usr/bin/mongod(_ZN5mongo11runCommandsEPKcRNS_7BSONObjERNS_5CurOpERNS_10BufBuilderERNS_14BSONObjBuilderEbi+0x35) [0x645565]
/usr/bin/mongod(_ZN5mongo8runQueryERNS_7MessageERNS_12QueryMessageERNS_5CurOpES1_+0x31ac) [0x64addc]
/usr/bin/mongod() [0x757f15]
/usr/bin/mongod(_ZN5mongo16assembleResponseERNS_7MessageERNS_10DbResponseERKNS_8SockAddrE+0x5b0) [0x75a440]
/usr/bin/mongod(_ZN5mongo10connThreadEPNS_13MessagingPortE+0x21e) [0x8a617e]
/usr/bin/mongod(thread_proxy+0x80) [0x8b92d0]
/lib/libpthread.so.0(+0x69ca) [0x7fad827069ca]
/lib/libc.so.6(clone+0x6d) [0x7fad81cb570d]

Sat Mar 19 04:21:45 dbexit:
Sat Mar 19 04:21:45 [conn224] shutdown: going to close listening sockets...
Sat Mar 19 04:21:45 [conn224] closing listening socket: 6
Sat Mar 19 04:21:45 [conn224] closing listening socket: 7
Sat Mar 19 04:21:45 [conn224] closing listening socket: 8
Sat Mar 19 04:21:45 [conn224] closing listening socket: 9
Sat Mar 19 04:21:45 [conn224] removing socket file: /tmp/mongodb-27017.sock
Sat Mar 19 04:21:45 [conn224] removing socket file: /tmp/mongodb-28017.sock
Sat Mar 19 04:21:45 [conn224] shutdown: going to flush diaglog...
Sat Mar 19 04:21:45 [conn224] shutdown: going to close sockets...
Sat Mar 19 04:21:45 [conn224] shutdown: waiting for fs preallocator...
Sat Mar 19 04:21:45 [conn224] shutdown: closing all files...
Sat Mar 19 04:21:45 [conn292] end connection 10.120.9.214:45651
Sat Mar 19 04:21:45 [conn293] end connection 10.120.9.214:45652
Sat Mar 19 04:21:46 [conn4] end connection 10.122.99.3:40790
Sat Mar 19 04:21:46 [conn3] end connection 10.120.119.58:57884
52/78 66%
Sat Mar 19 04:21:47 [conn270] end connection 10.253.171.65:37712
71/78 91%
Sat Mar 19 04:21:49 closeAllFiles() finished
Sat Mar 19 04:21:49 [conn224] shutdown: removing fs lock...
Sat Mar 19 04:21:49 dbexit: really exiting now
Sat Mar 19 04:21:49 ERROR: Client::~Client _context should be null but is not; client:conn
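
The frames in the backtrace above carry mangled C++ names. As a reading aid, here is a minimal Python sketch that splits a frame line into its parts and decodes only the leading namespace/class/method portion of a symbol. The helper names `parse_frame` and `qualified_name` are hypothetical and not part of mongod; argument types are not decoded, and a full demangler such as `c++filt` is the better tool when available.

```python
import re

# Matches frame lines such as:
#   /usr/bin/mongod(_ZN5mongo10abruptQuitEi+0x399) [0x8a5359]
# The symbol and the offset may each be missing, e.g. "/usr/bin/mongod() [0x757f15]".
FRAME_RE = re.compile(
    r"^(?P<binary>\S+?)\((?P<symbol>[^+)]*)(?:\+(?P<offset>0x[0-9a-f]+))?\)"
    r"\s+\[(?P<addr>0x[0-9a-f]+)\]$"
)


def parse_frame(line):
    """Return a dict with binary, symbol, offset and addr, or None if no match."""
    m = FRAME_RE.match(line.strip())
    return m.groupdict() if m else None


def qualified_name(symbol):
    """Decode the nested-name prefix of an Itanium C++ ABI symbol.

    Handles only the simple form _ZN / _ZNK followed by length-prefixed
    identifiers; returns None for anything else (for example plain C names).
    """
    m = re.match(r"_ZNK?", symbol)
    if m is None:
        return None
    pos = m.end()
    parts = []
    while pos < len(symbol) and symbol[pos].isdigit():
        end = pos
        while end < len(symbol) and symbol[end].isdigit():
            end += 1
        length = int(symbol[pos:end])          # identifier length prefix
        parts.append(symbol[end:end + length])  # the identifier itself
        pos = end + length
    return "::".join(parts) if parts else None


frame = parse_frame(
    "/usr/bin/mongod(_ZNK5mongo11BtreeCursor9prettyKeyERKNS_7BSONObjE+0x1d) [0x89053d]"
)
print(qualified_name(frame["symbol"]), frame["offset"])
```

Applied to the third and fourth frames, this gives `mongo::BtreeCursor::prettyKey` called from `mongo::SplitVector::run`, which matches the splitVector log line printed just before the signal.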

Reproduce:
Not clear.

Workaround:
Start up the DB; it will rejoin the set.



 Comments   
Comment by auto [ 19/Mar/11 ]

Author: Eliot Horowitz (erh) <eliot@10gen.com>

Message: SERVER-2797 fix BTreeCursor handling and ensure chunks aren't over sized
https://github.com/mongodb/mongo/commit/bf7de7316593927f0e4a8c6f600af7d3c1224981

Comment by auto [ 19/Mar/11 ]

Author: Eliot Horowitz (erh) <eliot@10gen.com>

Message: SERVER-2797 fix BTreeCursor handling and ensure chunks aren't over sized
https://github.com/mongodb/mongo/commit/7e593365957dac139531b0ede79d9f0df217f584

Comment by charso [ 19/Mar/11 ]

Hit this issue a second time on the primary of a different shard. This could become a serious problem for me if it keeps happening. Here's log output:

Sat Mar 19 08:29:22 [conn499] moveChunk request accepted at version 67|1
Sat Mar 19 08:29:39 [LockPinger] dist_lock pinged successfully for: ip-10-126-165-104:1300505683:8780230
Sat Mar 19 08:29:41 [conn499] warning: can't move chunk of size (aprox) 273557193 because maximum size allowed to move is 209715200 ns: production.sessions { _id: BinData } -> { _id: BinData }
Sat Mar 19 08:29:41 [conn499] about to log metadata event: { _id: "ip-10-126-165-104-2011-03-19T08:29:41-31", server: "ip-10-126-165-104", clientAddr: "10.253.134.167:43480", time: new Date(1300523381553), what: "moveChunk.from", ns: "production.sessions", details: { min: { _id: BinData }, max: { _id: BinData }, step1: 0, step2: 44, note: "aborted" } }
Sat Mar 19 08:29:41 [conn499] query admin.$cmd ntoreturn:1 command: { moveChunk: "production.sessions", from: "ProdShard1/ip-10-126-165-104.ec2.internal:27017,ip-10-127-86...", to: "ProdShard4/ip-10-126-141-193.ec2.internal:27017,ip-10-125-58...", min: { _id: BinData }, max: { _id: BinData }, maxChunkSizeBytes: 209715200, shardId: "production.sessions-_id_BinData(0, 95002FF87B7CCE622E14DEAE...", configdb: "ip-10-122-177-96.ec2.internal:27017,domU-12-31-38-01-C5-56.compute-1.i..." } reslen:116 19594ms
Sat Mar 19 08:29:41 [conn499] request split points lookup for chunk production.sessions { : BinData } -->> { : BinData }
Sat Mar 19 08:29:41 [conn499] splitVector doing another cycle because of force, keyCount now: 342803
Sat Mar 19 08:29:42 Got signal: 11 (Segmentation fault).

Sat Mar 19 08:29:42 Backtrace:
0x8a5359 0x7fa550a5baf0 0x89053d 0x893356 0x7dc5f0 0x7ddb21 0x645565 0x64addc 0x757f15 0x75a440 0x8a617e 0x8b92d0 0x7fa55155f9ca 0x7fa550b0e70d
/usr/bin/mongod(_ZN5mongo10abruptQuitEi+0x399) [0x8a5359]
/lib/libc.so.6(+0x33af0) [0x7fa550a5baf0]
/usr/bin/mongod(_ZNK5mongo11BtreeCursor9prettyKeyERKNS_7BSONObjE+0x1d) [0x89053d]
/usr/bin/mongod(_ZN5mongo11SplitVector3runERKSsRNS_7BSONObjERSsRNS_14BSONObjBuilderEb+0x2996) [0x893356]
/usr/bin/mongod(_ZN5mongo11execCommandEPNS_7CommandERNS_6ClientEiPKcRNS_7BSONObjERNS_14BSONObjBuilderEb+0x9a0) [0x7dc5f0]
/usr/bin/mongod(_ZN5mongo12_runCommandsEPKcRNS_7BSONObjERNS_10BufBuilderERNS_14BSONObjBuilderEbi+0x831) [0x7ddb21]
/usr/bin/mongod(_ZN5mongo11runCommandsEPKcRNS_7BSONObjERNS_5CurOpERNS_10BufBuilderERNS_14BSONObjBuilderEbi+0x35) [0x645565]
/usr/bin/mongod(_ZN5mongo8runQueryERNS_7MessageERNS_12QueryMessageERNS_5CurOpES1_+0x31ac) [0x64addc]
/usr/bin/mongod() [0x757f15]
/usr/bin/mongod(_ZN5mongo16assembleResponseERNS_7MessageERNS_10DbResponseERKNS_8SockAddrE+0x5b0) [0x75a440]
/usr/bin/mongod(_ZN5mongo10connThreadEPNS_13MessagingPortE+0x21e) [0x8a617e]
/usr/bin/mongod(thread_proxy+0x80) [0x8b92d0]
/lib/libpthread.so.0(+0x69ca) [0x7fa55155f9ca]
/lib/libc.so.6(clone+0x6d) [0x7fa550b0e70d]

Server then runs through dbexit in a similar fashion to previous log paste.
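
For reference, the moveChunk warning in the log above compares the approximate chunk size against `maxChunkSizeBytes`. The short Python sketch below reproduces only that arithmetic from the two logged numbers; the function name `exceeds_move_limit` is hypothetical, and this is not a copy of mongod's own check.

```python
MIB = 2 ** 20  # bytes per mebibyte


def exceeds_move_limit(chunk_bytes, max_chunk_bytes):
    """Return True when the chunk is larger than the size allowed to move."""
    return chunk_bytes > max_chunk_bytes


# Values copied from the [conn499] warning line.
chunk_bytes = 273557193
max_chunk_bytes = 209715200

print(exceeds_move_limit(chunk_bytes, max_chunk_bytes))
print(max_chunk_bytes // MIB, "MiB limit")
print(chunk_bytes - max_chunk_bytes, "bytes over the limit")
```

The limit is exactly 200 MiB and the chunk is about 261 MiB, so the move is aborted and the server goes on to request split points, which is the splitVector call that then crashes.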
