[SERVER-63532] Certain operations are taking more than 10 seconds to interrupt Created: 10/Feb/22  Updated: 06/Dec/22

Status: Backlog
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Daniel Morilha (Inactive) Assignee: Backlog - Service Architecture
Resolution: Unresolved Votes: 1
Labels: sa-remove-fv-backlog-22
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
Assigned Teams:
Service Arch
Operating System: ALL
Participants:
Linked BF Score: 48

 Description   

While analyzing tickets SERVER-62402, SERVER-63198 and BF-24143, I found that I can consistently reproduce a scenario where a few operations hang at mongo::CappedInsertNotifier::waitUntil by running jstests/replsets/force_shutdown_primary.js and setting an overall shutdown timeout shorter than 30 seconds.

Here's a stacktrace from one of these threads:

(gdb) t 76
[Switching to thread 76 (Thread 0x7f0d29391700 (LWP 29864))]
#0  0x00007f0d7c915065 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
(gdb) bt
#0  0x00007f0d7c915065 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
#1  0x00007f0d7e1d81e5 in __gthread_cond_timedwait (__cond=0x7f0d4a7b1030, __mutex=0x7f0d5279f0b0, __abs_timeout=0x7f0d2938bb20) at /opt/mongodbtoolchain/revisions/97dc5840fc91c99e296fb3406abb8707f4c2ccc3/stow/gcc-v3.Kbm/lib/gcc/x86_64-mongodb-linux/8.5.0/../../../../include/c++/8.5.0/x86_64-mongodb-linux/bits/gthr-default.h:871
#2  0x00007f0d7e204476 in std::condition_variable::__wait_until_impl<std::chrono::duration<long, std::ratio<1l, 1000000000l> > > (this=0x7f0d4a7b1030, __lock=..., __atime=...) at /opt/mongodbtoolchain/revisions/97dc5840fc91c99e296fb3406abb8707f4c2ccc3/stow/gcc-v3.Kbm/lib/gcc/x86_64-mongodb-linux/8.5.0/../../../../include/c++/8.5.0/condition_variable:178
#3  0x00007f0d7e2043c5 in std::condition_variable::wait_until<std::chrono::duration<long, std::ratio<1l, 1000000000l> > > (this=0x7f0d4a7b1030, __lock=..., __atime=...) at /opt/mongodbtoolchain/revisions/97dc5840fc91c99e296fb3406abb8707f4c2ccc3/stow/gcc-v3.Kbm/lib/gcc/x86_64-mongodb-linux/8.5.0/../../../../include/c++/8.5.0/condition_variable:106
#4  0x00007f0d7b3d69cc in std::_V2::condition_variable_any::wait_until<std::unique_lock<mongo::latch_detail::Latch>, std::chrono::_V2::system_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > > (this=0x7f0d4a7b1030, __lock=..., __atime=...) at /opt/mongodbtoolchain/revisions/97dc5840fc91c99e296fb3406abb8707f4c2ccc3/stow/gcc-v3.Kbm/lib/gcc/x86_64-mongodb-linux/8.5.0/../../../../include/c++/8.5.0/condition_variable:286
#5  0x00007f0d648b28ff in mongo::CappedInsertNotifier::waitUntil (this=0x7f0d4a7b1030, prevVersion=23, deadline=...) at src/mongo/db/catalog/collection.cpp:53



 Comments   
Comment by Daniel Morilha (Inactive) [ 25/Feb/22 ]

Hi louis.williams, thanks for adding updates to this ticket. I am working on a strategy to make this problem easier to diagnose through evergreen and regular BFs, and I will most certainly come back to this and the proposed patch in the near future.

Comment by Louis Williams [ 25/Feb/22 ]

There is an existing patch for this change available here. The only reason we never committed it is that it didn't actually fix the issue described in SERVER-60335. CC josef.ahmad

Comment by Louis Williams [ 11/Feb/22 ]

Just a thought: we really ought to audit our codebase for usages of condition variables that don't use waitForConditionOrInterrupt.
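
As a rough illustration of why such waits matter, here is a minimal sketch using only the C++ standard library. It is not the server's code: InterruptibleNotifier, WaitResult, interrupt() and demoInterruptedWait() are hypothetical names invented for this example. The idea is that the wait predicate also checks an interrupt flag, so a waiter with a distant deadline wakes as soon as it is interrupted instead of sleeping until the deadline.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical stand-in for a notifier; NOT mongo::CappedInsertNotifier.
class InterruptibleNotifier {
public:
    enum class WaitResult { kNotified, kTimedOut, kInterrupted };

    // Blocks until the version moves past prevVersion, the deadline passes,
    // or interrupt() is called, whichever happens first.
    WaitResult waitUntil(unsigned long prevVersion,
                         std::chrono::steady_clock::time_point deadline) {
        std::unique_lock<std::mutex> lk(_mutex);
        // The predicate includes the interrupt flag, so an interrupt ends the wait.
        _cv.wait_until(lk, deadline, [&] {
            return _interrupted || _version != prevVersion;
        });
        if (_interrupted) return WaitResult::kInterrupted;
        if (_version != prevVersion) return WaitResult::kNotified;
        return WaitResult::kTimedOut;
    }

    // Signals waiters that new data is available.
    void notifyAll() {
        {
            std::lock_guard<std::mutex> lk(_mutex);
            ++_version;
        }
        _cv.notify_all();
    }

    // Assumed "kill" path: sets the flag under the mutex, then wakes all waiters.
    void interrupt() {
        {
            std::lock_guard<std::mutex> lk(_mutex);
            _interrupted = true;
        }
        _cv.notify_all();
    }

private:
    std::mutex _mutex;
    std::condition_variable _cv;
    unsigned long _version = 0;
    bool _interrupted = false;
};

// Usage sketch: a waiter with a 30 second deadline is interrupted after ~50 ms.
InterruptibleNotifier::WaitResult demoInterruptedWait() {
    InterruptibleNotifier notifier;
    std::thread killer([&] {
        std::this_thread::sleep_for(std::chrono::milliseconds(50));
        notifier.interrupt();
    });
    auto result = notifier.waitUntil(
        0, std::chrono::steady_clock::now() + std::chrono::seconds(30));
    killer.join();
    return result;
}
```

A plain timed wait without the flag would keep the thread blocked until its deadline or the next notification, which matches the symptom in the stack trace above. In the server itself, the equivalent would presumably be to route the wait through waitForConditionOrInterrupt rather than a hand-rolled flag.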
