[SERVER-25247] MongoDB primary node crashes after SIGINT (ctrl+c) Created: 25/Jul/16  Updated: 07/Dec/16  Resolved: 25/Aug/16

Status: Closed
Project: Core Server
Component/s: WiredTiger
Affects Version/s: 3.2.6, 3.2.7, 3.2.8
Fix Version/s: None

Type: Bug Priority: Critical - P2
Reporter: Jimmy [X] Assignee: David Hows
Resolution: Duplicate Votes: 0
Labels: crash
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Ubuntu 15.10, AMD64


Attachments: Disassembly (HTML), SegvAnalysis (HTML), Stacktrace (HTML), StacktraceTop (HTML), ThreadStacktrace (HTML), mdiag.json.zip (Zip Archive), mongo_log_at_the_time_of_crash (HTML)
Issue Links:
Duplicate
duplicates WT-2838 Don't free session handles on close i... Closed
Operating System: ALL
Linked BF Score: 0

 Description   

After sending SIGINT to the primary node of the replica set (3-node dev setup: primary, secondary, arbiter), mongod crashed with SIGSEGV in __wt_split_stash_discard_all().

The log from the primary at the time of the crash is attached.

SegvAnalysis from Apport:

Segfault happened at: 0x19e0bf0 <__wt_split_stash_discard_all+48>:	cmpq   $0x0,(%rbx)
PC (0x019e0bf0) ok
source "$0x0" ok
destination "(%rbx)" (0x7fba414a9357) not located in a known VMA region (needed writable region)!
Stack memory exhausted (SP below stack segment)

Stacktrace:

#0  0x00000000019e0bf0 in __wt_split_stash_discard_all ()
No symbol table info available.
#1  0x0000000001a078af in __wt_connection_close ()
No symbol table info available.
#2  0x00000000019fd1e0 in __conn_close ()
No symbol table info available.
#3  0x0000000001085db3 in mongo::WiredTigerKVEngine::cleanShutdown() ()
No symbol table info available.
#4  0x0000000000facdd8 in mongo::ServiceContextMongoD::shutdownGlobalStorageEngineCleanly() ()
No symbol table info available.
#5  0x0000000000cd1210 in mongo::exitCleanly(mongo::ExitCode) ()
No symbol table info available.
#6  0x000000000131b8c1 in mongo::(anonymous namespace)::signalProcessingThread() ()
No symbol table info available.
#7  0x0000000001b37830 in execute_native_thread_routine ()
No symbol table info available.
#8  0x00007fc9564e06aa in start_thread (arg=0x7fc9558c2700) at pthread_create.c:333
        __res = <optimized out>
        pd = 0x7fc9558c2700
        now = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140502700402432, 5874792978760040640, 0, 140734797417775, 140502700403136, 140734797418552, -5902567582192140096, -5902572547200155456}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
        pagesize_m1 = <optimized out>
        sp = <optimized out>
        freesize = <optimized out>
        __PRETTY_FUNCTION__ = "start_thread"
#9  0x00007fc95621613d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
No locals.

Thread stacktrace:

.
Thread 15 (Thread 0x7fc94c0af700 (LWP 2551)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
No locals.
#1  0x0000000001b363dc in std::condition_variable::wait(std::unique_lock<std::mutex>&) ()
No symbol table info available.
#2  0x00000000012b1b06 in mongo::ThreadPool::_consumeTasks() ()
No symbol table info available.
#3  0x00000000012b22b0 in mongo::ThreadPool::_workerThreadBody(mongo::ThreadPool*, std::string const&) ()
No symbol table info available.
#4  0x0000000001b37830 in execute_native_thread_routine ()
No symbol table info available.
#5  0x00007fc9564e06aa in start_thread (arg=0x7fc94c0af700) at pthread_create.c:333
        __res = <optimized out>
        pd = 0x7fc94c0af700
        now = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140502540941056, 5874792978760040640, 0, 140502582899919, 140502540941760, 34713112, -5902551138372976448, -5902572547200155456}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
        pagesize_m1 = <optimized out>
        sp = <optimized out>
        freesize = <optimized out>
        __PRETTY_FUNCTION__ = "start_thread"
#6  0x00007fc95621613d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
No locals.
.
Thread 14 (Thread 0x7fc94b0ad700 (LWP 2553)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
No locals.
#1  0x0000000001b363dc in std::condition_variable::wait(std::unique_lock<std::mutex>&) ()
No symbol table info available.
#2  0x00000000012b1b06 in mongo::ThreadPool::_consumeTasks() ()
No symbol table info available.
#3  0x00000000012b22b0 in mongo::ThreadPool::_workerThreadBody(mongo::ThreadPool*, std::string const&) ()
No symbol table info available.
#4  0x0000000001b37830 in execute_native_thread_routine ()
No symbol table info available.
#5  0x00007fc9564e06aa in start_thread (arg=0x7fc94b0ad700) at pthread_create.c:333
        __res = <optimized out>
        pd = 0x7fc94b0ad700
        now = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140502524155648, 5874792978760040640, 0, 140502582899919, 140502524156352, 34713112, -5902562134562996032, -5902572547200155456}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
        pagesize_m1 = <optimized out>
        sp = <optimized out>
        freesize = <optimized out>
        __PRETTY_FUNCTION__ = "start_thread"
#6  0x00007fc95621613d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
No locals.
.
Thread 13 (Thread 0x7fc94f8b6700 (LWP 2544)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
No locals.
#1  0x0000000001b363dc in std::condition_variable::wait(std::unique_lock<std::mutex>&) ()
No symbol table info available.
#2  0x0000000001273c48 in mongo::DeadlineMonitor<mongo::mozjs::MozJSImplScope>::deadlineMonitorThread() ()
No symbol table info available.
#3  0x0000000001b37830 in execute_native_thread_routine ()
No symbol table info available.
#4  0x00007fc9564e06aa in start_thread (arg=0x7fc94f8b6700) at pthread_create.c:333
        __res = <optimized out>
        pd = 0x7fc94f8b6700
        now = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140502599689984, 5874792978760040640, 0, 140734797414975, 140502599690688, 140734797417680, -5902554450329632576, -5902572547200155456}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
        pagesize_m1 = <optimized out>
        sp = <optimized out>
        freesize = <optimized out>
        __PRETTY_FUNCTION__ = "start_thread"
#5  0x00007fc95621613d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
No locals.
.
Thread 12 (Thread 0x7fc9518ba700 (LWP 2540)):
#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
No locals.
#1  0x0000000000bb77f9 in mongo::CondVarLockGrantNotification::wait(unsigned int) ()
No symbol table info available.
#2  0x0000000000bbb2c7 in mongo::LockerImpl<false>::lockComplete(mongo::ResourceId, mongo::LockMode, unsigned int, bool) ()
No symbol table info available.
#3  0x0000000000bb7e9c in mongo::LockerImpl<false>::lockGlobal(mongo::LockMode, unsigned int) ()
No symbol table info available.
#4  0x0000000000bb7b6d in mongo::LockerImpl<false>::restoreLockState(mongo::Locker::LockSnapshot const&) ()
No symbol table info available.
#5  0x0000000001092240 in mongo::WiredTigerRecordStore::yieldAndAwaitOplogDeletionRequest(mongo::OperationContext*) ()
No symbol table info available.
#6  0x00000000010981a3 in mongo::(anonymous namespace)::WiredTigerRecordStoreThread::_deleteExcessDocuments() ()
No symbol table info available.
#7  0x0000000001098718 in mongo::(anonymous namespace)::WiredTigerRecordStoreThread::run() ()
No symbol table info available.
#8  0x00000000012ab0d0 in mongo::BackgroundJob::jobBody() ()
No symbol table info available.
#9  0x0000000001b37830 in execute_native_thread_routine ()
No symbol table info available.
#10 0x00007fc9564e06aa in start_thread (arg=0x7fc9518ba700) at pthread_create.c:333
        __res = <optimized out>
        pd = 0x7fc9518ba700
        now = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140502633260800, 5874792978760040640, 0, 140734797411743, 140502633261504, 140734797412240, -5902576434119737152, -5902572547200155456}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
        pagesize_m1 = <optimized out>
        sp = <optimized out>
        freesize = <optimized out>
        __PRETTY_FUNCTION__ = "start_thread"
#11 0x00007fc95621613d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
No locals.
.
Thread 11 (Thread 0x7fc94d8b2700 (LWP 2548)):
#0  0x00007fc9561d9e7d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
No locals.
#1  0x00007fc9561d9d14 in __sleep (seconds=0) at ../sysdeps/unix/sysv/linux/sleep.c:138
        ts = {tv_sec = 8, tv_nsec = 470999445}
        set = {__val = {65536, 0 <repeats 15 times>}}
        oset = {__val = {8405507, 140502566115200, 140502566115024, 67575832, 140827780, 140827776, 0, 0, 0, 69083232, 67951712, 140501265154048, 129257520, 129258768, 129257520, 2}}
        result = <optimized out>
#2  0x0000000001b377c9 in std::this_thread::__sleep_for(std::chrono::duration<long, std::ratio<1l, 1l> >, std::chrono::duration<long, std::ratio<1l, 1000000000l> >) ()
No symbol table info available.
#3  0x0000000001326aee in mongo::sleepsecs(int) ()
No symbol table info available.
#4  0x00000000010a917b in mongo::TTLMonitor::run() ()
No symbol table info available.
#5  0x00000000012ab0d0 in mongo::BackgroundJob::jobBody() ()
No symbol table info available.
#6  0x0000000001b37830 in execute_native_thread_routine ()
No symbol table info available.
#7  0x00007fc9564e06aa in start_thread (arg=0x7fc94d8b2700) at pthread_create.c:333
        __res = <optimized out>
        pd = 0x7fc94d8b2700
        now = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140502566119168, 5874792978760040640, 0, 140734797414959, 140502566119872, 140734797417680, -5902550050135637824, -5902572547200155456}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
        pagesize_m1 = <optimized out>
        sp = <optimized out>
        freesize = <optimized out>
        __PRETTY_FUNCTION__ = "start_thread"
#8  0x00007fc95621613d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
No locals.
.
Thread 10 (Thread 0x7fc94c8b0700 (LWP 2550)):
#0  0x00007fc9561d9e7d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
No locals.
#1  0x00007fc9561d9d14 in __sleep (seconds=0) at ../sysdeps/unix/sysv/linux/sleep.c:138
        ts = {tv_sec = 2, tv_nsec = 813136127}
        set = {__val = {65536, 0 <repeats 15 times>}}
        oset = {__val = {8405507, 65122808, 140502549329567, 140502549329632, 140502549329600, 10190813, 140502549329632, 14, 65122560, 0, 140502549329664, 140502549329904, 140502549329856, 11499820, 67783952, 140502549329632}}
        result = <optimized out>
#2  0x0000000001b377c9 in std::this_thread::__sleep_for(std::chrono::duration<long, std::ratio<1l, 1l> >, std::chrono::duration<long, std::ratio<1l, 1000000000l> >) ()
No symbol table info available.
#3  0x0000000001326aee in mongo::sleepsecs(int) ()
No symbol table info available.
#4  0x0000000000b14fd8 in mongo::ClientCursorMonitor::run() ()
No symbol table info available.
#5  0x00000000012ab0d0 in mongo::BackgroundJob::jobBody() ()
No symbol table info available.
#6  0x0000000001b37830 in execute_native_thread_routine ()
No symbol table info available.
#7  0x00007fc9564e06aa in start_thread (arg=0x7fc94c8b0700) at pthread_create.c:333
        __res = <optimized out>
        pd = 0x7fc94c8b0700
        now = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140502549333760, 5874792978760040640, 0, 140734797414959, 140502549334464, 140734797417680, -5902552237347733312, -5902572547200155456}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
        pagesize_m1 = <optimized out>
        sp = <optimized out>
        freesize = <optimized out>
        __PRETTY_FUNCTION__ = "start_thread"
#8  0x00007fc95621613d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
No locals.
.
Thread 9 (Thread 0x7fc94b8ae700 (LWP 2552)):
#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
No locals.
#1  0x00000000012abb9e in mongo::(anonymous namespace)::PeriodicTaskRunner::run() ()
No symbol table info available.
#2  0x00000000012ab0d0 in mongo::BackgroundJob::jobBody() ()
No symbol table info available.
#3  0x0000000001b37830 in execute_native_thread_routine ()
No symbol table info available.
#4  0x00007fc9564e06aa in start_thread (arg=0x7fc94b8ae700) at pthread_create.c:333
        __res = <optimized out>
        pd = 0x7fc94b8ae700
        now = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140502532548352, 5874792978760040640, 0, 140734797414911, 140502532549056, 140734797417680, -5902563233537752896, -5902572547200155456}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
        pagesize_m1 = <optimized out>
        sp = <optimized out>
        freesize = <optimized out>
        __PRETTY_FUNCTION__ = "start_thread"
#5  0x00007fc95621613d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
No locals.
.
Thread 8 (Thread 0x7fc94a8ac700 (LWP 2554)):
#0  0x00007fc9561d9e7d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
No locals.
#1  0x00007fc9561d9d14 in __sleep (seconds=0) at ../sysdeps/unix/sysv/linux/sleep.c:138
        ts = {tv_sec = 8, tv_nsec = 147663717}
        set = {__val = {65536, 0 <repeats 15 times>}}
        oset = {__val = {8405507, 0 <repeats 15 times>}}
        result = <optimized out>
#2  0x0000000001b377c9 in std::this_thread::__sleep_for(std::chrono::duration<long, std::ratio<1l, 1l> >, std::chrono::duration<long, std::ratio<1l, 1000000000l> >) ()
No symbol table info available.
#3  0x0000000001326aee in mongo::sleepsecs(int) ()
No symbol table info available.
#4  0x00000000012bf49e in mongo::HostnameCanonicalizationWorker::_doWork() ()
No symbol table info available.
#5  0x0000000001b37830 in execute_native_thread_routine ()
No symbol table info available.
#6  0x00007fc9564e06aa in start_thread (arg=0x7fc94a8ac700) at pthread_create.c:333
        __res = <optimized out>
        pd = 0x7fc94a8ac700
        now = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140502515762944, 5874792978760040640, 0, 140734797414943, 140502515763648, 140734797417680, -5902565429339782976, -5902572547200155456}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
        pagesize_m1 = <optimized out>
        sp = <optimized out>
        freesize = <optimized out>
        __PRETTY_FUNCTION__ = "start_thread"
#7  0x00007fc95621613d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
No locals.
.
Thread 7 (Thread 0x7fc94f0b5700 (LWP 2545)):
#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
No locals.
#1  0x0000000000e56b43 in mongo::RangeDeleter::doWork() ()
No symbol table info available.
#2  0x0000000001b37830 in execute_native_thread_routine ()
No symbol table info available.
#3  0x00007fc9564e06aa in start_thread (arg=0x7fc94f0b5700) at pthread_create.c:333
        __res = <optimized out>
        pd = 0x7fc94f0b5700
        now = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140502591297280, 5874792978760040640, 0, 140734797415039, 140502591297984, 140734797417680, -5902553351354875712, -5902572547200155456}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
        pagesize_m1 = <optimized out>
        sp = <optimized out>
        freesize = <optimized out>
        __PRETTY_FUNCTION__ = "start_thread"
#4  0x00007fc95621613d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
No locals.
.
Thread 6 (Thread 0x7fc94d0b1700 (LWP 2549)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
No locals.
#1  0x0000000001b363dc in std::condition_variable::wait(std::unique_lock<std::mutex>&) ()
No symbol table info available.
#2  0x00000000012b1b06 in mongo::ThreadPool::_consumeTasks() ()
No symbol table info available.
#3  0x00000000012b22b0 in mongo::ThreadPool::_workerThreadBody(mongo::ThreadPool*, std::string const&) ()
No symbol table info available.
#4  0x0000000001b37830 in execute_native_thread_routine ()
No symbol table info available.
#5  0x00007fc9564e06aa in start_thread (arg=0x7fc94d0b1700) at pthread_create.c:333
        __res = <optimized out>
        pd = 0x7fc94d0b1700
        now = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140502557726464, 5874792978760040640, 0, 140502582899919, 140502557727168, 34713112, -5902548951160880960, -5902572547200155456}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
        pagesize_m1 = <optimized out>
        sp = <optimized out>
        freesize = <optimized out>
        __PRETTY_FUNCTION__ = "start_thread"
#6  0x00007fc95621613d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
No locals.
.
Thread 5 (Thread 0x7fc946e99700 (LWP 2564)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
No locals.
#1  0x0000000001338af0 in asio::detail::scheduler::do_run_one(asio::detail::scoped_lock<asio::detail::posix_mutex>&, asio::detail::scheduler_thread_info&, std::error_code const&) ()
No symbol table info available.
#2  0x0000000001338da1 in asio::detail::scheduler::run(std::error_code&) ()
No symbol table info available.
#3  0x000000000133cf3f in asio::io_service::run() ()
No symbol table info available.
#4  0x00000000013306e0 in asio_detail_posix_thread_function ()
No symbol table info available.
#5  0x00007fc9564e06aa in start_thread (arg=0x7fc946e99700) at pthread_create.c:333
        __res = <optimized out>
        pd = 0x7fc946e99700
        now = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140502454867712, 5874792978760040640, 0, 140502574506895, 140502454868416, 140502574507232, -5902538209447673664, -5902572547200155456}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
        pagesize_m1 = <optimized out>
        sp = <optimized out>
        freesize = <optimized out>
        __PRETTY_FUNCTION__ = "start_thread"
#6  0x00007fc95621613d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
No locals.
.
Thread 4 (Thread 0x7fc9578d0cc0 (LWP 2530)):
#0  __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
No locals.
#1  0x00007fc9564e2cfd in __GI___pthread_mutex_lock (mutex=0x20b4340 <mongo::shutdownLock>) at ../nptl/pthread_mutex_lock.c:80
        __PRETTY_FUNCTION__ = "__pthread_mutex_lock"
        type = 0
        id = <optimized out>
#2  0x0000000000cd0eb5 in mongo::exitCleanly(mongo::ExitCode) ()
No symbol table info available.
#3  0x000000000096e054 in main ()
No symbol table info available.
.
Thread 3 (Thread 0x7fc934270700 (LWP 24993)):
#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
No locals.
#1  0x0000000000ae5e77 in mongo::CappedInsertNotifier::_wait(std::unique_lock<std::mutex>&, unsigned long, std::chrono::duration<long, std::ratio<1l, 1000000l> >) const ()
No symbol table info available.
#2  0x0000000000ae8699 in mongo::CappedInsertNotifier::wait(unsigned long, std::chrono::duration<long, std::ratio<1l, 1000000l> >) const ()
No symbol table info available.
#3  0x0000000000b5389f in mongo::GetMoreCmd::run(mongo::OperationContext*, std::string const&, mongo::BSONObj&, int, std::string&, mongo::BSONObjBuilder&) ()
No symbol table info available.
#4  0x0000000000bc7f93 in mongo::Command::run(mongo::OperationContext*, mongo::rpc::RequestInterface const&, mongo::rpc::ReplyBuilderInterface*) ()
No symbol table info available.
#5  0x0000000000bc8e24 in mongo::Command::execCommand(mongo::OperationContext*, mongo::Command*, mongo::rpc::RequestInterface const&, mongo::rpc::ReplyBuilderInterface*) ()
No symbol table info available.
#6  0x0000000000b24ce0 in mongo::runCommands(mongo::OperationContext*, mongo::rpc::RequestInterface const&, mongo::rpc::ReplyBuilderInterface*) ()
No symbol table info available.
#7  0x0000000000cd6a15 in mongo::assembleResponse(mongo::OperationContext*, mongo::Message&, mongo::DbResponse&, mongo::HostAndPort const&) ()
No symbol table info available.
#8  0x00000000009b937c in mongo::MyMessageHandler::process(mongo::Message&, mongo::AbstractMessagingPort*) ()
No symbol table info available.
#9  0x00000000012c9645 in mongo::PortMessageServer::handleIncomingMsg(void*) ()
No symbol table info available.
#10 0x00007fc9564e06aa in start_thread (arg=0x7fc934270700) at pthread_create.c:333
        __res = <optimized out>
        pd = 0x7fc934270700
        now = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140502140126976, 5874792978760040640, 0, 140734797414223, 140502140127680, 8388608, -5902779458297560896, -5902572547200155456}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
        pagesize_m1 = <optimized out>
        sp = <optimized out>
        freesize = <optimized out>
        __PRETTY_FUNCTION__ = "start_thread"
#11 0x00007fc95621613d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
No locals.
.
Thread 2 (Thread 0x7fc935575700 (LWP 2672)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
No locals.
#1  0x0000000001338af0 in asio::detail::scheduler::do_run_one(asio::detail::scoped_lock<asio::detail::posix_mutex>&, asio::detail::scheduler_thread_info&, std::error_code const&) ()
No symbol table info available.
#2  0x0000000001338da1 in asio::detail::scheduler::run(std::error_code&) ()
No symbol table info available.
#3  0x000000000133cf3f in asio::io_service::run() ()
No symbol table info available.
#4  0x00000000013306e0 in asio_detail_posix_thread_function ()
No symbol table info available.
#5  0x00007fc9564e06aa in start_thread (arg=0x7fc935575700) at pthread_create.c:333
        __res = <optimized out>
        pd = 0x7fc935575700
        now = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140502160070400, 5874792978760040640, 0, 140502471648143, 8388608, 124960064, -5902776861452959552, -5902572547200155456}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
        pagesize_m1 = <optimized out>
        sp = <optimized out>
        freesize = <optimized out>
        __PRETTY_FUNCTION__ = "start_thread"
#6  0x00007fc95621613d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
No locals.
.
Thread 1 (Thread 0x7fc9558c2700 (LWP 2531)):
#0  0x00000000019e0bf0 in __wt_split_stash_discard_all ()
No symbol table info available.
#1  0x0000000001a078af in __wt_connection_close ()
No symbol table info available.
#2  0x00000000019fd1e0 in __conn_close ()
No symbol table info available.
#3  0x0000000001085db3 in mongo::WiredTigerKVEngine::cleanShutdown() ()
No symbol table info available.
#4  0x0000000000facdd8 in mongo::ServiceContextMongoD::shutdownGlobalStorageEngineCleanly() ()
No symbol table info available.
#5  0x0000000000cd1210 in mongo::exitCleanly(mongo::ExitCode) ()
No symbol table info available.
#6  0x000000000131b8c1 in mongo::(anonymous namespace)::signalProcessingThread() ()
No symbol table info available.
#7  0x0000000001b37830 in execute_native_thread_routine ()
No symbol table info available.
#8  0x00007fc9564e06aa in start_thread (arg=0x7fc9558c2700) at pthread_create.c:333
        __res = <optimized out>
        pd = 0x7fc9558c2700
        now = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140502700402432, 5874792978760040640, 0, 140734797417775, 140502700403136, 140734797418552, -5902567582192140096, -5902572547200155456}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
        pagesize_m1 = <optimized out>
        sp = <optimized out>
        freesize = <optimized out>
        __PRETTY_FUNCTION__ = "start_thread"
#9  0x00007fc95621613d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
No locals.

Disassembly:

=> 0x19e0bf0 <__wt_split_stash_discard_all+48>:	cmpq   $0x0,(%rbx)
   0x19e0bf4 <__wt_split_stash_discard_all+52>:	je     0x19e0c01 <__wt_split_stash_discard_all+65>
   0x19e0bf6 <__wt_split_stash_discard_all+54>:	mov    %rbx,%rsi
   0x19e0bf9 <__wt_split_stash_discard_all+57>:	mov    %r14,%rdi
   0x19e0bfc <__wt_split_stash_discard_all+60>:	callq  0x1a47e50 <__wt_free_int>
   0x19e0c01 <__wt_split_stash_discard_all+65>:	add    $0x1,%r12
   0x19e0c05 <__wt_split_stash_discard_all+69>:	add    $0x18,%rbx
   0x19e0c09 <__wt_split_stash_discard_all+73>:	cmp    %r12,0x2e8(%r13)
   0x19e0c10 <__wt_split_stash_discard_all+80>:	ja     0x19e0bf0 <__wt_split_stash_discard_all+48>
   0x19e0c12 <__wt_split_stash_discard_all+82>:	mov    0x2e0(%r13),%rbx
   0x19e0c19 <__wt_split_stash_discard_all+89>:	test   %rbx,%rbx
   0x19e0c1c <__wt_split_stash_discard_all+92>:	je     0x19e0c2d <__wt_split_stash_discard_all+109>
   0x19e0c1e <__wt_split_stash_discard_all+94>:	lea    0x2e0(%r13),%rsi
   0x19e0c25 <__wt_split_stash_discard_all+101>:	mov    %r14,%rdi
   0x19e0c28 <__wt_split_stash_discard_all+104>:	callq  0x1a47e50 <__wt_free_int>
   0x19e0c2d <__wt_split_stash_discard_all+109>:	pop    %rbx

One additional note: the crashed primary had been a secondary until a few hours before the crash, but transitioned to primary automatically (the ex-primary didn't crash; it just stepped down on its own).



 Comments   
Comment by Jimmy [X] [ 25/Aug/16 ]

alexander.gorrod:
primary:

mongod --dbpath ../mongo_store --replSet rs100


secondary:

mongod --dbpath store --replSet rs100 --port 27018


arbiter:

mongod --dbpath arbiter --replSet rs100 --port 30000

rs.config():

/* 1 */
{
    "_id" : "rs100",
    "version" : 5,
    "protocolVersion" : NumberLong(1),
    "members" : [ 
        {
            "_id" : 0,
            "host" : "localhost:27017",
            "arbiterOnly" : false,
            "buildIndexes" : true,
            "hidden" : false,
            "priority" : 1.0,
            "tags" : {},
            "slaveDelay" : NumberLong(0),
            "votes" : 1
        }, 
        {
            "_id" : 1,
            "host" : "localhost:27018",
            "arbiterOnly" : false,
            "buildIndexes" : true,
            "hidden" : false,
            "priority" : 1.0,
            "tags" : {},
            "slaveDelay" : NumberLong(0),
            "votes" : 1
        }, 
        {
            "_id" : 2,
            "host" : "localhost:30000",
            "arbiterOnly" : true,
            "buildIndexes" : true,
            "hidden" : false,
            "priority" : 1.0,
            "tags" : {},
            "slaveDelay" : NumberLong(0),
            "votes" : 1
        }
    ],
    "settings" : {
        "chainingAllowed" : true,
        "heartbeatIntervalMillis" : 2000,
        "heartbeatTimeoutSecs" : 10,
        "electionTimeoutMillis" : 10000,
        "getLastErrorModes" : {},
        "getLastErrorDefaults" : {
            "w" : 1,
            "wtimeout" : 0
        }
    }
}

Comment by David Hows [ 25/Aug/16 ]

Flagging as a duplicate of WT-2838

Comment by Alexander Gorrod [ 15/Aug/16 ]

We have committed a workaround for this issue in WT-2838, which will be available in MongoDB 3.4.

KarrotKake: We would appreciate it if you could provide the additional information requested; it would help us isolate and fix the root cause.

Comment by David Hows [ 08/Aug/16 ]

Thanks, Jimmy.

What configuration options are you using on the MongoDB instance?

Were you able to run the mdiag script?

Comment by Jimmy [X] [ 08/Aug/16 ]

Added diagnostic info.

1. Installed from http://repo.mongodb.org/apt/ubuntu trusty/mongodb-org/3.2 multiverse
2. All 3 members are running in the same VM. Reproduction is not reliable, but usually a node crashes after running for a couple of days (with occasional reads/writes).

Comment by David Hows [ 08/Aug/16 ]

KarrotKake, I've been trying to reproduce this locally and have had no luck: I'm running a 3-node replica set and killing members with SIGINT, and have seen no instance of this issue triggering. Can we take a step back and try to gather some more specifics to see if we can nail down a cause?

Can you please provide me with the following:

  1. The exact version of MongoDB, including where you downloaded or installed it from
  2. The full exact process you use to reproduce this issue, including any commands running on the system at the time and how regularly you can reproduce the issue

Please run mdiag.sh on the same primary using the following command:

sudo bash mdiag.sh SUPPORT-1191

This command will run for about 5 minutes and create a diagnostic file /tmp/mdiag-$HOSTNAME.txt. Please attach the generated file to this ticket.

Note: Feel free to look at mdiag before running it. The mdiag script gathers a variety of detailed, low-level system information about the host it is run on. This information relates to both the hardware and software setup of the machine, and we often find it helps us to diagnose a wide range of problems with MongoDB deployments. If you wish to redact any of the information in the output of the mdiag script, feel free to do so before attaching it to Jira.

Hopefully between all of the above we can isolate potential causes for this issue and get them resolved.

Thanks,
David

Comment by Jimmy [X] [ 05/Aug/16 ]

I hit the same issue again, but this time both the primary and secondary nodes crashed after receiving SIGINT, in the same function (__wt_split_stash_discard_all()) and at the same instruction.

Log:

2016-08-05T11:59:39.209+0300 I CONTROL  [signalProcessingThread] got signal 2 (Interrupt), will terminate after current cmd ends
2016-08-05T11:59:39.209+0300 I FTDC     [signalProcessingThread] Shutting down full-time diagnostic data capture
2016-08-05T11:59:39.226+0300 I REPL     [signalProcessingThread] Stopping replication applier threads
2016-08-05T11:59:39.788+0300 I STORAGE  [conn134] got request after shutdown()
2016-08-05T11:59:43.791+0300 W EXECUTOR [rsBackgroundSync] killCursors command task failed: CallbackCanceled: Callback canceled
2016-08-05T11:59:43.793+0300 I CONTROL  [signalProcessingThread] now exiting
2016-08-05T11:59:43.793+0300 I NETWORK  [signalProcessingThread] shutdown: going to close listening sockets...
2016-08-05T11:59:43.793+0300 I NETWORK  [signalProcessingThread] closing listening socket: 5
2016-08-05T11:59:43.793+0300 I NETWORK  [signalProcessingThread] closing listening socket: 6
2016-08-05T11:59:43.793+0300 I NETWORK  [signalProcessingThread] removing socket file: /tmp/mongodb-27018.sock
2016-08-05T11:59:43.794+0300 I NETWORK  [signalProcessingThread] shutdown: going to flush diaglog...
2016-08-05T11:59:43.794+0300 I NETWORK  [signalProcessingThread] shutdown: going to close sockets...
2016-08-05T11:59:43.794+0300 I STORAGE  [signalProcessingThread] WiredTigerKVEngine shutting down
2016-08-05T11:59:44.908+0300 F -        [signalProcessingThread] Invalid access at address: 0x8
2016-08-05T11:59:45.064+0300 F -        [signalProcessingThread] Got signal: 11 (Segmentation fault).
 
 0x131ce72 0x131bfc9 0x131c348 0x7f09d0d73d10 0x19e0bf0 0x1a078af 0x19fd1e0 0x1085db3 0xfacdd8 0xcd1210 0x131b8c1 0x1b37830 0x7f09d0d6a6aa 0x7f09d0aa013d
----- BEGIN BACKTRACE -----
{"backtrace":[{"b":"400000","o":"F1CE72","s":"_ZN5mongo15printStackTraceERSo"},{"b":"400000","o":"F1BFC9"},{"b":"400000","o":"F1C348"},{"b":"7F09D0D63000","o":"10D10"},{"b":"400000","o":"15E0BF0","s":"__wt_split_stash_discard_all"},{"b":"400000","o":"16078AF","s":"__wt_connection_close"},{"b":"400000","o":"15FD1E0"},{"b":"400000","o":"C85DB3","s":"_ZN5mongo18WiredTigerKVEngine13cleanShutdownEv"},{"b":"400000","o":"BACDD8","s":"_ZN5mongo20ServiceContextMongoD34shutdownGlobalStorageEngineCleanlyEv"},{"b":"400000","o":"8D1210","s":"_ZN5mongo11exitCleanlyENS_8ExitCodeE"},{"b":"400000","o":"F1B8C1"},{"b":"400000","o":"1737830","s":"execute_native_thread_routine"},{"b":"7F09D0D63000","o":"76AA"},{"b":"7F09D0999000","o":"10713D","s":"clone"}],"processInfo":{ "mongodbVersion" : "3.2.8", "gitVersion" : "ed70e33130c977bda0024c125b56d159573dbaf0", "compiledModules" : [], "uname" : { "sysname" : "Linux", "release" : "4.2.0-42-generic", "version" : "#49-Ubuntu SMP Tue Jun 28 21:26:26 UTC 2016", "machine" : "x86_64" }, "somap" : [ { "elfType" : 2, "b" : "400000", "buildId" : "A53FF676E1D627BD1D9B1BF524DEFA13B667EE83" }, { "b" : "7FFF5FB19000", "elfType" : 3, "buildId" : "78C36CE0C0D6CDEC7AFE1B82972EB34801592987" }, { "b" : "7F09D1CF0000", "path" : "/lib/x86_64-linux-gnu/libssl.so.1.0.0", "elfType" : 3, "buildId" : "96D927A52B6A405C147AC4D3F8A6F14CC31316BA" }, { "b" : "7F09D18AC000", "path" : "/lib/x86_64-linux-gnu/libcrypto.so.1.0.0", "elfType" : 3, "buildId" : "039AD0290D6DDCD62FFAAFF6D241FD313938E654" }, { "b" : "7F09D16A4000", "path" : "/lib/x86_64-linux-gnu/librt.so.1", "elfType" : 3, "buildId" : "0370F7BC9F3A530FBB3D7918E67713E9BFF68FD8" }, { "b" : "7F09D14A0000", "path" : "/lib/x86_64-linux-gnu/libdl.so.2", "elfType" : 3, "buildId" : "7B3B05F668FF51BFFDF2B2B560934813C083A948" }, { "b" : "7F09D1198000", "path" : "/lib/x86_64-linux-gnu/libm.so.6", "elfType" : 3, "buildId" : "10EED81FF44190C88FCD4D807248BE110352D5FC" }, { "b" : "7F09D0F81000", "path" : "/lib/x86_64-linux-gnu/libgcc_s.so.1", "elfType" : 3, "buildId" : "0C3C07EE15CFA81346847A679E8444B876D9CC58" }, { "b" : "7F09D0D63000", "path" : "/lib/x86_64-linux-gnu/libpthread.so.0", "elfType" : 3, "buildId" : "41D72FB9BBC5E6FCE5654DC0CF23BC614782B0DA" }, { "b" : "7F09D0999000", "path" : "/lib/x86_64-linux-gnu/libc.so.6", "elfType" : 3, "buildId" : "5CCD94C4E3483DF05BE240FF1FB8A3F53794CC6F" }, { "b" : "7F09D1F59000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "AFE4833057694750DE5F6F5D713F7CB6CC4F195A" } ] }}
 mongod(_ZN5mongo15printStackTraceERSo+0x32) [0x131ce72]
 mongod(+0xF1BFC9) [0x131bfc9]
 mongod(+0xF1C348) [0x131c348]
 libpthread.so.0(+0x10D10) [0x7f09d0d73d10]
 mongod(__wt_split_stash_discard_all+0x30) [0x19e0bf0]
 mongod(__wt_connection_close+0x3AF) [0x1a078af]
 mongod(+0x15FD1E0) [0x19fd1e0]
 mongod(_ZN5mongo18WiredTigerKVEngine13cleanShutdownEv+0x143) [0x1085db3]
 mongod(_ZN5mongo20ServiceContextMongoD34shutdownGlobalStorageEngineCleanlyEv+0x28) [0xfacdd8]
 mongod(_ZN5mongo11exitCleanlyENS_8ExitCodeE+0x390) [0xcd1210]
 mongod(+0xF1B8C1) [0x131b8c1]
 mongod(execute_native_thread_routine+0x20) [0x1b37830]
 libpthread.so.0(+0x76AA) [0x7f09d0d6a6aa]
 libc.so.6(clone+0x6D) [0x7f09d0aa013d]
-----  END BACKTRACE  -----
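
A quick note on reading the inline backtrace: each frame gives a module base ("b") and an offset ("o") in hex, and the absolute address is simply base + offset. For the faulting frame, 0x400000 + 0x15E0BF0 = 0x19E0BF0, which matches both the PC in the SegvAnalysis attachment and the symbolized line above (__wt_split_stash_discard_all+0x30 [0x19e0bf0]). A trivial C check of that arithmetic:

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    /* Frame {"b":"400000","o":"15E0BF0","s":"__wt_split_stash_discard_all"} */
    uint64_t base = 0x400000, offset = 0x15E0BF0;

    printf("0x%" PRIx64 "\n", base + offset);  /* prints 0x19e0bf0 */
    return 0;
}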

Comment by Jimmy [X] [ 25/Jul/16 ]

Yes, the node restarted successfully. No ill effects have been observed so far; however, it's hard to say what impact this may have, and it seems to be a potentially serious issue. The crash was observed on several previous versions as well.

Here's the log immediately after restart:

mongod --dbpath store --replSet rs100 --port 27018
2016-07-25T10:11:06.706+0300 I CONTROL  [initandlisten] MongoDB starting : pid=25925 port=27018 dbpath=store 64-bit host=<erased>
2016-07-25T10:11:06.706+0300 I CONTROL  [initandlisten] db version v3.2.8
2016-07-25T10:11:06.706+0300 I CONTROL  [initandlisten] git version: ed70e33130c977bda0024c125b56d159573dbaf0
2016-07-25T10:11:06.706+0300 I CONTROL  [initandlisten] OpenSSL version: OpenSSL 1.0.2d 9 Jul 2015
2016-07-25T10:11:06.706+0300 I CONTROL  [initandlisten] allocator: tcmalloc
2016-07-25T10:11:06.706+0300 I CONTROL  [initandlisten] modules: none
2016-07-25T10:11:06.706+0300 I CONTROL  [initandlisten] build environment:
2016-07-25T10:11:06.706+0300 I CONTROL  [initandlisten]     distmod: ubuntu1404
2016-07-25T10:11:06.706+0300 I CONTROL  [initandlisten]     distarch: x86_64
2016-07-25T10:11:06.706+0300 I CONTROL  [initandlisten]     target_arch: x86_64
2016-07-25T10:11:06.706+0300 I CONTROL  [initandlisten] options: { net: { port: 27018 }, replication: { replSet: "rs100" }, storage: { dbPath: "store" } }
2016-07-25T10:11:06.765+0300 I -        [initandlisten] Detected data files in store created by the 'wiredTiger' storage engine, so setting the active storage engine to 'wiredTiger'.
2016-07-25T10:11:06.765+0300 W -        [initandlisten] Detected unclean shutdown - store/mongod.lock is not empty.
2016-07-25T10:11:06.765+0300 W STORAGE  [initandlisten] Recovering data from the last clean checkpoint.
2016-07-25T10:11:06.766+0300 I STORAGE  [initandlisten] wiredtiger_open config: create,cache_size=1G,session_max=20000,eviction=(threads_max=4),config_base=false,statistics=(fast),log=(enabled=true,archive=true,path=journal,compressor=snappy),file_manager=(close_idle_time=100000),checkpoint=(wait=60,log_size=2GB),statistics_log=(wait=0),
2016-07-25T10:11:07.318+0300 I STORAGE  [initandlisten] Starting WiredTigerRecordStoreThread local.oplog.rs
2016-07-25T10:11:07.318+0300 I STORAGE  [initandlisten] The size storer reports that the oplog contains 731896 records totaling to 308568235 bytes
2016-07-25T10:11:07.326+0300 I STORAGE  [initandlisten] Sampling from the oplog between Jun 14 13:22:21:1 and Jul 25 09:35:41:2 to determine where to place markers for truncation
2016-07-25T10:11:07.326+0300 I STORAGE  [initandlisten] Taking 26 samples and assuming that each section of oplog contains approximately 278886 records totaling to 117578673 bytes
2016-07-25T10:11:07.586+0300 I STORAGE  [initandlisten] Placing a marker at optime Jul  2 16:36:24:4e9
2016-07-25T10:11:07.586+0300 I STORAGE  [initandlisten] Placing a marker at optime Jul  2 16:43:22:2f8
2016-07-25T10:11:07.670+0300 I CONTROL  [initandlisten] 
2016-07-25T10:11:07.670+0300 I CONTROL  [initandlisten] ** WARNING: /sys/kernel/mm/transparent_hugepage/enabled is 'always'.
2016-07-25T10:11:07.670+0300 I CONTROL  [initandlisten] **        We suggest setting it to 'never'
2016-07-25T10:11:07.670+0300 I CONTROL  [initandlisten] 
2016-07-25T10:11:07.670+0300 I CONTROL  [initandlisten] ** WARNING: /sys/kernel/mm/transparent_hugepage/defrag is 'always'.
2016-07-25T10:11:07.670+0300 I CONTROL  [initandlisten] **        We suggest setting it to 'never'
2016-07-25T10:11:07.670+0300 I CONTROL  [initandlisten] 
2016-07-25T10:11:07.750+0300 I FTDC     [initandlisten] Initializing full-time diagnostic data capture with directory 'store/diagnostic.data'
2016-07-25T10:11:07.750+0300 I NETWORK  [HostnameCanonicalizationWorker] Starting hostname canonicalization worker
2016-07-25T10:11:07.767+0300 I NETWORK  [initandlisten] waiting for connections on port 27018
2016-07-25T10:11:07.779+0300 W NETWORK  [ReplicationExecutor] Failed to connect to 127.0.0.1:27017, reason: errno:111 Connection refused
2016-07-25T10:11:07.781+0300 I REPL     [ReplicationExecutor] New replica set config in use: { _id: "rs100", version: 5, protocolVersion: 1, members: [ { _id: 0, host: "localhost:27017", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: 0, votes: 1 }, { _id: 1, host: "localhost:27018", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: 0, votes: 1 }, { _id: 2, host: "localhost:30000", arbiterOnly: true, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: 0, votes: 1 } ], settings: { chainingAllowed: true, heartbeatIntervalMillis: 2000, heartbeatTimeoutSecs: 10, electionTimeoutMillis: 10000, getLastErrorModes: {}, getLastErrorDefaults: { w: 1, wtimeout: 0 } } }
2016-07-25T10:11:07.781+0300 I REPL     [ReplicationExecutor] This node is localhost:27018 in the config
2016-07-25T10:11:07.781+0300 I REPL     [ReplicationExecutor] transition to STARTUP2
2016-07-25T10:11:07.781+0300 I REPL     [ReplicationExecutor] Starting replication applier threads
2016-07-25T10:11:07.781+0300 I REPL     [ReplicationExecutor] transition to RECOVERING
2016-07-25T10:11:07.782+0300 I REPL     [ReplicationExecutor] Error in heartbeat request to localhost:27017; HostUnreachable: Connection refused
2016-07-25T10:11:07.783+0300 I REPL     [ReplicationExecutor] Error in heartbeat request to localhost:27017; HostUnreachable: Connection refused
2016-07-25T10:11:07.783+0300 I REPL     [ReplicationExecutor] Error in heartbeat request to localhost:27017; HostUnreachable: Connection refused
2016-07-25T10:11:07.784+0300 I ASIO     [NetworkInterfaceASIO-Replication-0] Successfully connected to localhost:30000
2016-07-25T10:11:07.784+0300 I REPL     [ReplicationExecutor] Member localhost:30000 is now in state ARBITER
2016-07-25T10:11:07.784+0300 I REPL     [ReplicationExecutor] transition to SECONDARY

Comment by Ramon Fernandez Marina [ 25/Jul/16 ]

Thanks for the detailed report, KarrotKake. If I understand correctly, the behavior you describe is currently possible but has no ill effects (I'm working on getting more details).

After the SIGSEGV, were you able to restart this node normally? What's the impact of this issue on your deployment?

Thanks,
Ramón.

Comment by Jimmy [X] [ 25/Jul/16 ]

BTW: the crash happens only after the server has been running for some time (a couple of days or so).
