[SERVER-10433] Segfault with lots of updates (sharded) Created: 05/Aug/13  Updated: 05/Aug/13  Resolved: 05/Aug/13

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 2.2.2
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Paul Ryan Assignee: Tad Marshall
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

SmartOS with a config server, a mongos server and three shards


Issue Links:
Duplicate
duplicates SERVER-8795 remapPrivateView: Solaris mmap() is n... Closed
Operating System: Solaris
Steps To Reproduce:
  1. Set up a sharded cluster with millions of small documents (turn off load balancing)
  2. Run a few million updates against this sharded cluster

Expected: Servers to stay up and accept updates
Actual: The segfault from the description occurs on one of the shards

Participants:

 Description   

Running a large number of updates against a large set of very small documents, on the configuration described under Environment, results in a segfault. The segfault occurs consistently, but the shard affected is random.

Sun Aug  4 02:58:03 Invalid access at address: 0xfffffd7dceee0000 from thread: conn57
 
Sun Aug  4 02:58:03 Got signal: 11 (Segmentation Fault).
 
Sun Aug  4 02:58:03 Backtrace:
0xb331b8 0x7bd48b 0x7bd695 0xfffffd7fff1d7686 0xfffffd7fff1ca37c 0x9ff980 0x92d546 0x93183b 0x7cead0 0xb2539a 0xfffffd7fd97b364c 0xfffffd7fff1d72f4 0xfffffd7fff1d75c0 
 /opt/local/bin/mongod'_ZN5mongo15printStackTraceERSo+0x28 [0xb331b8]
 /opt/local/bin/mongod'_ZN5mongo10abruptQuitEi+0x11b [0x7bd48b]
 /opt/local/bin/mongod'_ZN5mongo24abruptQuitWithAddrSignalEiP7siginfoPv+0x125 [0x7bd695]
 /lib/amd64/libc.so.1'__sighndlr+0x6 [0xfffffd7fff1d7686]
 /lib/amd64/libc.so.1'call_user_handler+0x2a4 [0xfffffd7fff1ca37c]
 /opt/local/bin/mongod'_ZNK5mongo6Record5touchEb+0x0 [0x9ff980]
 /opt/local/bin/mongod'_ZN5mongo14receivedUpdateERNS_7MessageERNS_5CurOpE+0x4e6 [0x92d546]
 /opt/local/bin/mongod'_ZN5mongo16assembleResponseERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortE+0xe9b [0x93183b]
 /opt/local/bin/mongod'_ZN5mongo16MyMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortEPNS_9LastErrorE+0x90 [0x7cead0]
 /opt/local/bin/mongod'_ZN5mongo3pms9threadRunEPNS_13MessagingPortE+0x32a [0xb2539a]
 /opt/local/lib/libboost_thread.so.1.49.0'thread_proxy+0x6c [0xfffffd7fd97b364c]
 /lib/amd64/libc.so.1'_thrp_setup+0xbc [0xfffffd7fff1d72f4]
 /lib/amd64/libc.so.1'_lwp_start+0x0 [0xfffffd7fff1d75c0]
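
The mangled C++ names in the frames above can be made easier to read with a small script. The following is a minimal sketch, not a full demangler: it assumes the frame layout shown in this trace (module'symbol+0xoffset [0xaddress]) and decodes only the length-prefixed nested-name part of an Itanium-ABI symbol, ignoring argument types, templates and substitutions. The names `parse_frame` and `qualified_name` are hypothetical helpers introduced for illustration.

```python
import re

# Frame layout assumed from the trace above:
#   <module>'<symbol>+0x<offset> [0x<address>]
FRAME_RE = re.compile(
    r"^\s*(?P<module>[^'\s]+)'(?P<symbol>\S+)\+0x(?P<offset>[0-9a-f]+)"
    r"\s+\[0x(?P<address>[0-9a-f]+)\]\s*$"
)
LENGTH_RE = re.compile(r"\d+")


def qualified_name(symbol):
    """Decode the nested-name prefix of an Itanium-mangled symbol.

    Handles only _ZN... / _ZNK... names built from <length><identifier>
    components; anything else is returned unchanged.
    """
    prefix = re.match(r"_ZNK?", symbol)
    if prefix is None:
        return symbol
    pos = prefix.end()
    parts = []
    while True:
        length = LENGTH_RE.match(symbol, pos)
        if length is None:
            break
        start = length.end()
        end = start + int(length.group())
        parts.append(symbol[start:end])
        pos = end
    return "::".join(parts) if parts else symbol


def parse_frame(line):
    """Return a dict describing one backtrace frame, or None if no match."""
    match = FRAME_RE.match(line)
    if match is None:
        return None
    return {
        "module": match.group("module"),
        "function": qualified_name(match.group("symbol")),
        "offset": int(match.group("offset"), 16),
        "address": int(match.group("address"), 16),
    }


if __name__ == "__main__":
    frame = parse_frame(
        " /opt/local/bin/mongod'_ZNK5mongo6Record5touchEb+0x0 [0x9ff980]"
    )
    print(frame["function"])
```

Applied to the frame immediately below the libc signal-handler frames, this yields the function that was executing when the signal arrived.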



 Comments   
Comment by Tad Marshall [ 05/Aug/13 ]

Hi Paul,

I think this is SERVER-8795, which is fixed in version 2.5.1 and will be in the forthcoming 2.4.6-rc0.

The clue is the call to 'touch' on line 12 in your stack trace. When the private view of the memory-mapped file is remapped, there is a brief window of time when accesses to that memory are not valid. Adding locks around the remapping prevents accesses during this time, avoiding the segfault.
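
The idea of guarding the remap with a lock can be illustrated with a toy model. This is only a sketch of the general technique, not the actual SERVER-8795 patch: `PrivateView`, `remap`, `touch` and `run_demo` are hypothetical names, a Python `bytearray` stands in for the memory-mapped view, and setting the buffer to `None` stands in for the brief window in which the mapping is invalid.

```python
import threading


class PrivateView:
    """Toy stand-in for a memory-mapped private view (hypothetical)."""

    def __init__(self, size):
        self._size = size
        self._lock = threading.Lock()
        self._buf = bytearray(size)

    def remap(self):
        # Holding the lock for the whole remap means no reader can
        # observe the window in which the view is invalid.
        with self._lock:
            self._buf = None                    # view is briefly invalid
            self._buf = bytearray(self._size)   # fresh view in place

    def touch(self, offset):
        # Readers take the same lock, so they see either the old view
        # or the new one, never the invalid state in between.
        with self._lock:
            return self._buf[offset]


def run_demo(iterations=2000):
    """Run one remapping thread against one reading thread.

    Returns the number of invalid accesses the reader observed.
    """
    view = PrivateView(4096)
    errors = []

    def remapper():
        for _ in range(iterations):
            view.remap()

    def toucher():
        for _ in range(iterations):
            try:
                view.touch(0)
            except TypeError as exc:  # raised if the view were None
                errors.append(exc)

    threads = [
        threading.Thread(target=remapper),
        threading.Thread(target=toucher),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(errors)


if __name__ == "__main__":
    print("invalid accesses:", run_demo())
```

Without the lock in `touch`, the reader could run between the two assignments in `remap` and fail, which is the toy equivalent of the invalid access reported in the trace.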

Tad
