[SERVER-3008] CursorTimeoutTask failure during mongos shutdown Created: 27/Apr/11  Updated: 06/Sep/11  Resolved: 06/Sep/11

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 1.8.1
Fix Version/s: None

Type: Bug
Priority: Major - P3
Reporter: Andrew R
Assignee: Unassigned
Resolution: Duplicate
Votes: 0
Labels: mongos
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Ubuntu 10.04


Operating System: Linux
Participants:

 Description   

I sent the TERM signal to mongos to shut it down gracefully, and I got the following stack trace in my mongos log:

Wed Apr 27 00:46:52 [mongosMain] dbexit: received signal 15 rc:0 received signal 15
Received signal 6
Backtrace: 0x8167d37 0xb7764400 0xb7764416 0xb74cf851 0xb74d2d42 0xb750711d 0xb7511321 0xb7512b78 0xb7515c9d 0xb771d741 0x82c9711 0x8137649 0x81396be 0x82fe2cd 0xb7608985 0xb757513e
/usr/local/bin/mongos(_ZN5mongo17printStackAndExitEi+0x77)[0x8167d37]
[0xb7764400]
[0xb7764416]
/lib/tls/i686/nosegneg/libc.so.6(gsignal+0x51)[0xb74cf851]
/lib/tls/i686/nosegneg/libc.so.6(abort+0x182)[0xb74d2d42]
/lib/tls/i686/nosegneg/libc.so.6(+0x6211d)[0xb750711d]
/lib/tls/i686/nosegneg/libc.so.6(+0x6c321)[0xb7511321]
/lib/tls/i686/nosegneg/libc.so.6(+0x6db78)[0xb7512b78]
/lib/tls/i686/nosegneg/libc.so.6(cfree+0x6d)[0xb7515c9d]
/usr/lib/libstdc++.so.6(_ZdlPv+0x21)[0xb771d741]
/usr/local/bin/mongos(_ZN5mongo17CursorTimeoutTaskD0Ev+0x61)[0x82c9711]
/usr/local/bin/mongos(_ZN5mongo13BackgroundJob7jobBodyEN5boost10shared_ptrINS0_9JobStatusEEE+0x1e9)[0x8137649]
/usr/local/bin/mongos(_ZN5boost6detail11thread_dataINS_3_bi6bind_tIvNS_4_mfi3mf1IvN5mongo13BackgroundJobENS_10shared_ptrINS7_9JobStatusEEEEENS2_5list2INS2_5valueIPS7_EENSD_ISA_EEEEEEE3runEv+0x7e)[0x81396be]
/usr/local/bin/mongos(thread_proxy+0x7d)[0x82fe2cd]
/lib/tls/i686/nosegneg/libpthread.so.0(+0x5985)[0xb7608985]
/lib/tls/i686/nosegneg/libc.so.6(clone+0x5e)[0xb757513e]
===
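
The mangled C++ symbols in the trace above are easier to read once demangled. The following is a minimal sketch, assuming a GCC or Clang toolchain that provides `abi::__cxa_demangle` in `<cxxabi.h>`; the helper name `demangle` is hypothetical and is not part of mongos.

```cpp
#include <cxxabi.h>

#include <cstdlib>
#include <memory>
#include <string>

// Hypothetical helper: returns the demangled form of an Itanium-ABI symbol,
// or the input unchanged if it cannot be demangled.
std::string demangle(const std::string& mangled) {
    int status = 0;
    // __cxa_demangle allocates the result with malloc, so release it with free.
    std::unique_ptr<char, decltype(&std::free)> result(
        abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status),
        &std::free);
    if (status != 0 || !result) {
        return mangled;
    }
    return std::string(result.get());
}
```

For example, `demangle("_ZN5mongo17CursorTimeoutTaskD0Ev")` yields `mongo::CursorTimeoutTask::~CursorTimeoutTask()`. The `D0` encoding marks the deleting destructor, which is consistent with the `_ZdlPv` frame (`operator delete(void*)`) directly above it and the abort raised from inside `cfree`.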

Mongos did not shut down; the process is still running after this error. Sending another TERM has no effect other than this line in the log:

Wed Apr 27 00:59:23 [websvr] dbexit: received signal 15 rc:0 received signal 15



 Comments   
Comment by Eliot Horowitz (Inactive) [ 06/Sep/11 ]

See SERVER-3082

Comment by Andrew R [ 28/Apr/11 ]

Another repeat of the cursor timeout stack:
http://pastie.org/private/cdxhq1bpzbjlpqomyv66a

It's trapping signal 6 (http://en.wikipedia.org/wiki/SIGABRT). Given the nature of SIGABRT, mongos should terminate after handling the signal, but it remains running.
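
To illustrate the expectation that a process should still die after handling SIGABRT, here is a minimal sketch of a handler that logs and then lets the default action terminate the process. It assumes a POSIX system; `abortHandler`, `installAbortHandler`, and `terminatingSignalOfAbortingChild` are hypothetical names for this example and do not reflect how mongos actually installs its handlers.

```cpp
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#include <cstdlib>

// Hypothetical handler: report the signal using only async-signal-safe calls,
// then restore the default disposition and re-raise so the process terminates.
extern "C" void abortHandler(int sig) {
    const char msg[] = "Received fatal signal, terminating\n";
    ssize_t ignored = ::write(STDERR_FILENO, msg, sizeof(msg) - 1);
    (void)ignored;
    ::signal(sig, SIG_DFL);
    ::raise(sig);
}

// Installs abortHandler for SIGABRT.
void installAbortHandler() {
    struct sigaction sa;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sa.sa_handler = abortHandler;
    sigaction(SIGABRT, &sa, nullptr);
}

// Forks a child that installs the handler and calls abort().
// Returns the signal that terminated the child, or -1 on any other outcome.
int terminatingSignalOfAbortingChild() {
    pid_t pid = fork();
    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        installAbortHandler();
        std::abort();  // never returns
    }
    int status = 0;
    if (waitpid(pid, &status, 0) != pid) {
        return -1;
    }
    return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

With a handler of this shape, the parent observes the child as killed by SIGABRT rather than left running, which is the behavior I would expect from mongos here.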

Comment by Andrew R [ 27/Apr/11 ]

I also found this error on shutdown:
http://pastie.org/private/wx7qrtg7ogkv9ufgb15ha

The stack is a little different but close enough that it could be related.

Comment by Andrew R [ 27/Apr/11 ]

I've seen this several times on different instances. Most mongos processes stop fine, but about a quarter of them hit this when I stop them all (I run about 8).

All I see in the logs before this is many lines like this:
Wed Apr 27 00:46:19 [LockPinger] dist_lock pinged successfully for: host1:1303422971:1804289383

Generated at Thu Feb 08 03:01:48 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.