[SERVER-12528] SIGTERM can cause an fassert if we're actively replicating
Created: 29/Jan/14  Updated: 06/Dec/22  Resolved: 27/Aug/16

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: None

Type: Task
Priority: Major - P3
Reporter: Joanna Cheng
Assignee: Backlog - Replication Team
Resolution: Duplicate
Votes: 0
Labels: datarepl3.2
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

CentOS release 6.4 64bit; Openstack virtual server/KVM guest
MongoDB 2.4.8


Issue Links:
Depends
Duplicate
duplicates SERVER-25071 Ensure replication batch finishes bef... Closed
Assigned Teams:
Replication
Backwards Compatibility: Fully Compatible

 Description   

Our init scripts currently send a SIGTERM when stopping mongod.

If mongod is actively replicating when the SIGTERM arrives, a repl writer worker is interrupted in the middle of applying an operation and hits a fatal assertion, producing a stack trace like the following (on 2.4):

Thu Jan 23 03:45:08.862 [signalProcessingThread] got signal 15 (Terminated), will terminate after current cmd ends
Thu Jan 23 03:45:08.863 [repl writer worker 1] ERROR: writer worker caught exception: interrupted at shutdown on: { ts: Timestamp 1390448708000|2, h: 107290241850708099, v: 2, op: "i", ns: "XXX.YYY", o: { _id: ObjectId('...'), urn: "ZZZ", dateUpdated: new Date(1390448708000) } }
Thu Jan 23 03:45:08.863 [repl writer worker 1]   Fatal Assertion 16360
0xde05e1 0xda03d3 0xc28f3c 0xdadf21 0xe28e69 0x3f79007851 0x3f78ce890d 
 /usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x21) [0xde05e1]
 /usr/bin/mongod(_ZN5mongo13fassertFailedEi+0xa3) [0xda03d3]
 /usr/bin/mongod(_ZN5mongo7replset14multiSyncApplyERKSt6vectorINS_7BSONObjESaIS2_EEPNS0_8SyncTailE+0x12c) [0xc28f3c]
 /usr/bin/mongod(_ZN5mongo10threadpool6Worker4loopEv+0x281) [0xdadf21]
 /usr/bin/mongod() [0xe28e69]
 /lib64/libpthread.so.0() [0x3f79007851]
 /lib64/libc.so.6(clone+0x6d) [0x3f78ce890d]
Thu Jan 23 03:45:08.870 [repl writer worker 1] 
 
***aborting after fassert() failure

An fassert is not a graceful way to shut mongod down: for example, it requires journal recovery on restart and may not clear the lock file, which would interfere with the subsequent startup. "service restart" should be graceful, and we provide the init script that uses SIGTERM to implement it, so this seems like a bug on our side.



 Comments   
Comment by Scott Hernandez (Inactive) [ 27/Aug/16 ]

We no longer interrupt replication writers during shutdown; see SERVER-25071 (and linked issues).
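
As a rough illustration of the idea only (this is not MongoDB code), the Python sketch below shows a worker that checks for a shutdown request only at batch boundaries, so a batch that is already in progress always finishes before the worker exits. The names BatchApplier, request_shutdown, and run_demo are hypothetical and exist only for this example; request_shutdown stands in for what a SIGTERM handler might do.

```python
import threading


class BatchApplier:
    """Hypothetical stand-in for a replication writer (not MongoDB code)."""

    def __init__(self, batches, apply_op):
        self._batches = list(batches)
        self._apply_op = apply_op
        # Thread-safe flag; a signal-handling thread would only set it.
        self._shutdown = threading.Event()

    def request_shutdown(self):
        # Assumed analogue of a SIGTERM handler: record the request, nothing more.
        self._shutdown.set()

    def run(self):
        for batch in self._batches:
            # The flag is consulted only between batches, never inside one.
            if self._shutdown.is_set():
                return
            for op in batch:
                self._apply_op(op)


def run_demo():
    applied = []

    def apply_op(op):
        applied.append(op)
        if op == 2:
            # Simulate a shutdown request arriving mid-batch.
            applier.request_shutdown()

    applier = BatchApplier([[1, 2, 3], [4, 5]], apply_op)
    applier.run()
    return applied
```

In run_demo the shutdown request arrives while the first batch is being applied; that batch still completes and the second batch is never started, so no operation is abandoned halfway.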

Comment by Eric Milkie [ 19/Sep/14 ]

We no longer write a stack trace to the log, so this issue is less urgent to fix. It would still be desirable to make the shutdown cleaner.
