[SERVER-46826] Instantiate the JournalFlusher thread for ephemeral engines and when non-durable (nojournal=true)
Created: 12/Mar/20  Updated: 29/Oct/23  Resolved: 25/Aug/20

Status: Closed
Project: Core Server
Component/s: Storage
Affects Version/s: None
Fix Version/s: 4.7.0, 4.4.5

Type: Improvement
Priority: Major - P3
Reporter: Dianna Hohensee (Inactive)
Assignee: Dianna Hohensee (Inactive)
Resolution: Fixed
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
depends on SERVER-48650 Unit tests' ServiceContext's NetworkI... Closed
depends on SERVER-45847 Pull the JournalFlusher out of the st... Closed
is depended on by SERVER-48149 Move callers of waitUntilDurable onto... Closed
Problem/Incident
Backwards Compatibility: Fully Compatible
Backport Requested: v4.4
Sprint: Execution Team 2020-03-23, Execution Team 2020-04-06, Execution Team 2020-04-20, Execution Team 2020-05-04, Execution Team 2020-05-18, Execution Team 2020-06-01, Execution Team 2020-06-15, Execution Team 2020-06-29, Execution Team 2020-09-07
Participants:
Linked BF Score: 38

 Description   

Always create the JournalFlusher, regardless of ephemeral or non-durable storage engine settings, because using it to wait for write concern should be a performance gain for all settings and also simplifies the logic.

Today we skip creating the JournalFlusher thread for both ephemeral and non-durable (nojournal=true) storage engine settings.
https://github.com/mongodb/mongo/blob/db3a17bbfe2e265722ed88df961e79f3e1a68067/src/mongo/db/storage/wiredtiger/wiredtiger_kv_engine.cpp#L989-L993

Originally, the JournalFlusher was purely a periodically running task. However, recently, that changed. The JournalFlusher is now a periodic task that can also be pinged for an immediate run.
(Old in v4.2)
https://github.com/mongodb/mongo/blob/e329dd322df4a226b143031c99b5f943d3a9be4a/src/mongo/db/storage/wiredtiger/wiredtiger_kv_engine.cpp#L251
(New in v4.4)
https://github.com/mongodb/mongo/blob/db3a17bbfe2e265722ed88df961e79f3e1a68067/src/mongo/db/storage/wiredtiger/wiredtiger_kv_engine.cpp#L305-L307
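
A minimal sketch of the "periodic task that can also be pinged" pattern, for illustration only. It is not the JournalFlusher implementation; the class name PeriodicFlusher, its members, and the counter-based handshake are all assumptions made for this example. The thread sleeps for one interval but wakes early when a caller requests a flush, and the caller blocks until a flush that started after its request has finished.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstdint>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical stand-in, not MongoDB's JournalFlusher: a background thread that
// flushes once per interval and can also be woken for an immediate flush.
class PeriodicFlusher {
public:
    PeriodicFlusher(std::chrono::milliseconds interval, std::function<void()> flushFn)
        : _interval(interval), _flushFn(std::move(flushFn)) {
        assert(_flushFn);  // an empty callback would throw on the flusher thread
        _thread = std::thread([this] { run(); });
    }

    ~PeriodicFlusher() {
        {
            std::lock_guard<std::mutex> lk(_mutex);
            _shutdown = true;
        }
        _wakeCv.notify_one();
        _thread.join();
    }

    // Request an immediate flush and block until a flush round that began
    // after this request has completed.
    void waitForFlush() {
        std::unique_lock<std::mutex> lk(_mutex);
        const std::uint64_t target = ++_requested;
        _wakeCv.notify_one();
        _doneCv.wait(lk, [&] { return _completed >= target; });
    }

private:
    void run() {
        std::unique_lock<std::mutex> lk(_mutex);
        while (!_shutdown) {
            // Sleep for one interval, or wake early on a request or shutdown.
            _wakeCv.wait_for(lk, _interval, [this] {
                return _shutdown || _requested > _completed;
            });
            if (_shutdown) {
                break;
            }
            // Requests counted here were all made before this round starts.
            const std::uint64_t round = _requested;
            lk.unlock();
            _flushFn();
            lk.lock();
            _completed = round;
            _doneCv.notify_all();
        }
    }

    const std::chrono::milliseconds _interval;
    std::function<void()> _flushFn;
    std::mutex _mutex;
    std::condition_variable _wakeCv;
    std::condition_variable _doneCv;
    bool _shutdown = false;
    std::uint64_t _requested = 0;
    std::uint64_t _completed = 0;
    std::thread _thread;
};
```

With a long interval, flushes in this sketch happen only when waitForFlush() is called, which resembles an on-request-only mode. The sketch assumes no caller is still waiting when the object is destroyed.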

Moving the write_concern.cpp waitUntilDurable calls onto the JournalFlusher thread via waitForJournalFlush has been a big performance win. Currently, we use logic in waitForJournalFlush to skip the JournalFlusher thread when it doesn’t exist and instead call waitUntilDurable directly.
https://github.com/mongodb/mongo/blob/db3a17bbfe2e265722ed88df961e79f3e1a68067/src/mongo/db/storage/wiredtiger/wiredtiger_kv_engine.cpp#L2377-L2383
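
The fallback described above could be sketched roughly as follows. This is an illustration of the branching only, not the code at the linked lines; EngineSketch and FlusherStub are hypothetical types, and the two callbacks stand in for the real flusher and for waitUntilDurable.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <utility>

// Hypothetical stand-in for the flusher thread's interface.
struct FlusherStub {
    std::function<void()> waitForFlush;
};

// Hypothetical engine wrapper; none of these types are MongoDB's.
class EngineSketch {
public:
    EngineSketch(std::unique_ptr<FlusherStub> flusher, std::function<void()> waitUntilDurable)
        : _flusher(std::move(flusher)), _waitUntilDurable(std::move(waitUntilDurable)) {
        assert(_waitUntilDurable);
    }

    // Use the flusher thread when it exists; otherwise flush directly.
    void waitForJournalFlush() {
        if (_flusher) {
            _flusher->waitForFlush();
            return;
        }
        _waitUntilDurable();
    }

private:
    std::unique_ptr<FlusherStub> _flusher;  // null when no flusher thread was created
    std::function<void()> _waitUntilDurable;
};
```

If the flusher always exists, as this ticket proposes, the null check and the direct call both disappear and waitForJournalFlush has a single path.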

When ephemeral or !durable, I think we would want to override the periodic behavior of the JournalFlusher thread and make it run only when requested. Taking checkpoints so frequently for non-durable (nojournal) might slow down the system, and ephemeral only updates the JournalListener (durable timestamp).
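
The proposal in this paragraph amounts to a small decision on the engine settings. The sketch below uses hypothetical names (EngineSettings, FlushSchedule, proposedSchedule) and encodes only what this description proposes; the commit messages in the comments record a different final choice for inMemory, which keeps running periodically.

```cpp
// Hypothetical names throughout; this encodes the description's proposal only.
enum class FlushSchedule { kPeriodic, kOnRequestOnly };

struct EngineSettings {
    bool ephemeral;  // e.g. an in-memory engine
    bool durable;    // false when nojournal=true
};

// Run only on request when ephemeral or non-durable; otherwise run periodically.
constexpr FlushSchedule proposedSchedule(const EngineSettings& s) {
    return (s.ephemeral || !s.durable) ? FlushSchedule::kOnRequestOnly
                                       : FlushSchedule::kPeriodic;
}
```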



 Comments   
Comment by Githook User [ 23/Feb/21 ]

Author: Dianna Hohensee <dianna.hohensee@mongodb.com> (username: DiannaHohensee)

Message: SERVER-46826 Instantiate the JournalFlusher thread for inMemory and non-durable (nojournal=true) storage engines.

Non-durable engines will not run the JournalFlusher logic periodically, but only upon request,
because checkpoints are costly. inMemory will run the JournalFlusher periodically because it
updates the repl timestamps faster.

(cherry picked from commit 4609f3ebfb178f37153bc04678176af722b0d304)
Branch: v4.4
https://github.com/mongodb/mongo/commit/314c8600e876562fc7b913611d05eaed10da1610

Comment by Githook User [ 25/Aug/20 ]

Author: Dianna Hohensee <dianna.hohensee@mongodb.com> (username: DiannaHohensee)

Message: SERVER-46826 Instantiate the JournalFlusher thread for inMemory and non-durable (nojournal=true) storage engines.

Non-durable engines will not run the JournalFlusher logic periodically, but only upon request,
because checkpoints are costly. inMemory will run the JournalFlusher periodically because it
updates the repl timestamps faster.
Branch: master
https://github.com/mongodb/mongo/commit/4609f3ebfb178f37153bc04678176af722b0d304

Comment by Dianna Hohensee (Inactive) [ 24/Aug/20 ]

Code review url: https://mongodbcr.appspot.com/662400001/

Comment by Dianna Hohensee (Inactive) [ 08/Jun/20 ]

Well, that idea completely doesn't work. Per this code, the NetworkInterfaceMock unique_ptr ownership is given to the ThreadPoolTaskExecutor, whose ownership is in turn given to the ReplicationCoordinatorImpl instance. Given this kind of ownership, there's no way to extend the lifetime of the NetworkInterfaceMock past the ReplCoordTest lifetime...
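
A reduced sketch of that ownership chain, with hypothetical stand-in types (MockNet, Executor, Coordinator) rather than the real classes. Because each unique_ptr is moved into its owner, the mock is destroyed as soon as the outermost owner goes away, so any thread that still reaches it afterwards (as the JournalFlusher does through the clock source in the trace in the earlier comment) touches a dead object.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical stand-ins for the three classes in the ownership chain.
struct MockNet {
    explicit MockNet(bool& alive) : _alive(alive) { _alive = true; }
    ~MockNet() { _alive = false; }
    bool& _alive;  // external flag so the test can observe destruction
};

struct Executor {
    explicit Executor(std::unique_ptr<MockNet> net) : _net(std::move(net)) {}
    std::unique_ptr<MockNet> _net;
};

struct Coordinator {
    explicit Coordinator(std::unique_ptr<Executor> exec) : _exec(std::move(exec)) {}
    std::unique_ptr<Executor> _exec;
};

// Returns whether the mock is still alive after its outermost owner is gone.
inline bool mockOutlivesOwner() {
    bool alive = false;
    {
        Coordinator coord(std::make_unique<Executor>(std::make_unique<MockNet>(alive)));
        assert(alive);  // the mock exists while the coordinator does
    }  // coord is destroyed here, and the executor and mock go with it
    return alive;
}
```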

Comment by Dianna Hohensee (Inactive) [ 08/Jun/20 ]

Ran into an interesting problem:

[cpp_unit_test:db_repl_coordinator_test] 2020-06-05T20:47:57.003+0000 | 2020-06-05T20:47:57.003Z I  STORAGE  22320   [main] "Shutting down journal flusher thread"
[cpp_unit_test:db_repl_coordinator_test] 2020-06-05T20:47:57.004+0000 src/mongo/executor/network_interface_mock.h:567:22: runtime error: member call on address 0x61600038ee80 which does not point to an object of type 'mongo::executor::NetworkInterfaceMock'
[cpp_unit_test:db_repl_coordinator_test] 2020-06-05T20:47:57.004+0000 0x61600038ee80: note: object has a possibly invalid vptr: abs(offset to top) too big
[cpp_unit_test:db_repl_coordinator_test] 2020-06-05T20:47:57.004+0000  a4 00 00 48  8b 00 00 49 4d 56 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00
[cpp_unit_test:db_repl_coordinator_test] 2020-06-05T20:47:57.004+0000               ^~~~~~~~~~~~~~~~~~~~~~~
[cpp_unit_test:db_repl_coordinator_test] 2020-06-05T20:47:57.004+0000               possibly invalid vptr
[cpp_unit_test:db_repl_coordinator_test] 2020-06-05T20:48:06.642+0000     #0 0x564d4e99e90a in mongo::executor::NetworkInterfaceMockClockSource::now() /data/mci/c67a30324c4da5f0e5f82678065dd299/src/src/mongo/executor/network_interface_mock.h:567:22
[cpp_unit_test:db_repl_coordinator_test] 2020-06-05T20:48:06.652+0000     #1 0x564d512227b1 in mongo::DiagnosticInfo::capture(mongo::StringData const&, mongo::DiagnosticInfo::Options) /data/mci/c67a30324c4da5f0e5f82678065dd299/src/src/mongo/util/diagnostic_info.cpp:285:73
[cpp_unit_test:db_repl_coordinator_test] 2020-06-05T20:48:06.652+0000     #2 0x564d512250a9 in mongo::(anonymous namespace)::_mongoInitializerFunction_DiagnosticInfo(mongo::InitializerContext*)::DiagnosticListener::onContendedLock(mongo::latch_detail::Identity const&) /data/mci/c67a30324c4da5f0e5f82678065dd299/src/src/mongo/util/diagnostic_info.cpp:192:43
[cpp_unit_test:db_repl_coordinator_test] 2020-06-05T20:48:06.653+0000     #3 0x564d532adfe6 in mongo::latch_detail::Mutex::_onContendedLock() /data/mci/c67a30324c4da5f0e5f82678065dd299/src/src/mongo/platform/mutex.cpp:89:19
[cpp_unit_test:db_repl_coordinator_test] 2020-06-05T20:48:06.653+0000     #4 0x564d532ad29e in mongo::latch_detail::Mutex::lock() /data/mci/c67a30324c4da5f0e5f82678065dd299/src/src/mongo/platform/mutex.cpp:55:5
[cpp_unit_test:db_repl_coordinator_test] 2020-06-05T20:48:06.678+0000     #5 0x564d4e36c544 in std::unique_lock<mongo::latch_detail::Latch>::lock() /opt/mongodbtoolchain/revisions/e5348beb43e147b74a40f4ca5fb05a330ea646cf/stow/gcc-v3.U0D/lib/gcc/x86_64-mongodb-linux/8.2.0/../../../../include/c++/8.2.0/bits/std_mutex.h:267:17
[cpp_unit_test:db_repl_coordinator_test] 2020-06-05T20:48:06.678+0000     #6 0x564d4e36beba in std::unique_lock<mongo::latch_detail::Latch>::unique_lock(mongo::latch_detail::Latch&) /opt/mongodbtoolchain/revisions/e5348beb43e147b74a40f4ca5fb05a330ea646cf/stow/gcc-v3.U0D/lib/gcc/x86_64-mongodb-linux/8.2.0/../../../../include/c++/8.2.0/bits/std_mutex.h:197:2
[cpp_unit_test:db_repl_coordinator_test] 2020-06-05T20:48:06.682+0000     #7 0x564d4fe2d1de in mongo::JournalFlusher::run() /data/mci/c67a30324c4da5f0e5f82678065dd299/src/src/mongo/db/storage/control/journal_flusher.cpp:127:34
[cpp_unit_test:db_repl_coordinator_test] 2020-06-05T20:48:06.685+0000     #8 0x564d52b09f7d in mongo::BackgroundJob::jobBody() /data/mci/c67a30324c4da5f0e5f82678065dd299/src/src/mongo/util/background.cpp:160:5
[cpp_unit_test:db_repl_coordinator_test] 2020-06-05T20:48:06.685+0000     #9 0x564d52b12375 in std::__invoke_result<mongo::BackgroundJob::go()::$_0>::type std::__invoke<mongo::BackgroundJob::go()::$_0>(mongo::BackgroundJob::go()::$_0&&) /opt/mongodbtoolchain/revisions/e5348beb43e147b74a40f4ca5fb05a330ea646cf/stow/gcc-v3.U0D/lib/gcc/x86_64-mongodb-linux/8.2.0/../../../../include/c++/8.2.0/bits/invoke.h:95:14
[cpp_unit_test:db_repl_coordinator_test] 2020-06-05T20:48:06.685+0000     #10 0x564d52b12375 in decltype(auto) std::__apply_impl<mongo::BackgroundJob::go()::$_0, std::tuple<> >(mongo::BackgroundJob::go()::$_0&&, std::tuple<>&&, std::integer_sequence<unsigned long>) /opt/mongodbtoolchain/revisions/e5348beb43e147b74a40f4ca5fb05a330ea646cf/stow/gcc-v3.U0D/lib/gcc/x86_64-mongodb-linux/8.2.0/../../../../include/c++/8.2.0/tuple:1678
[cpp_unit_test:db_repl_coordinator_test] 2020-06-05T20:48:06.685+0000     #11 0x564d52b12375 in decltype(auto) std::apply<mongo::BackgroundJob::go()::$_0, std::tuple<> >(mongo::BackgroundJob::go()::$_0&&, std::tuple<>&&) /opt/mongodbtoolchain/revisions/e5348beb43e147b74a40f4ca5fb05a330ea646cf/stow/gcc-v3.U0D/lib/gcc/x86_64-mongodb-linux/8.2.0/../../../../include/c++/8.2.0/tuple:1687
[cpp_unit_test:db_repl_coordinator_test] 2020-06-05T20:48:06.685+0000     #12 0x564d52b12375 in mongo::stdx::thread::thread<mongo::BackgroundJob::go()::$_0, 0>(mongo::BackgroundJob::go()::$_0)::'lambda'()::operator()() /data/mci/c67a30324c4da5f0e5f82678065dd299/src/src/mongo/stdx/thread.h:186
[cpp_unit_test:db_repl_coordinator_test] 2020-06-05T20:48:06.685+0000     #13 0x564d52b12375 in mongo::BackgroundJob::go()::$_0 std::__invoke_impl<void, mongo::stdx::thread::thread<mongo::BackgroundJob::go()::$_0, 0>(mongo::BackgroundJob::go()::$_0)::'lambda'()>(std::__invoke_other, mongo::stdx::thread::thread<mongo::BackgroundJob::go()::$_0, 0>(mongo::BackgroundJob::go()::$_0)::'lambda'()&&) /opt/mongodbtoolchain/revisions/e5348beb43e147b74a40f4ca5fb05a330ea646cf/stow/gcc-v3.U0D/lib/gcc/x86_64-mongodb-linux/8.2.0/../../../../include/c++/8.2.0/bits/invoke.h:60
[cpp_unit_test:db_repl_coordinator_test] 2020-06-05T20:48:06.686+0000     #14 0x564d535a52ee in execute_native_thread_routine /data/mci/ff40e18d703ff1f6e8f8ecf75ad7202f/toolchain-builder/tmp/build-gcc-v3.sh-NJq/build/x86_64-mongodb-linux/libstdc++-v3/src/c++11/../../../../../src/combined/libstdc++-v3/src/c++11/thread.cc:80:18
[cpp_unit_test:db_repl_coordinator_test] 2020-06-05T20:48:06.686+0000     #15 0x7f8e0e26b6da in start_thread (/lib/x86_64-linux-gnu/libpthread.so.0+0x76da)
[cpp_unit_test:db_repl_coordinator_test] 2020-06-05T20:48:09.525+0000     #16 0x7f8e0dd7c88e in clone /build/glibc-OTsEL5/glibc-2.27/misc/../sysdeps/unix/sysv/linux/x86_64/clone.S:95
[cpp_unit_test:db_repl_coordinator_test] 2020-06-05T20:48:09.525+0000 
[cpp_unit_test:db_repl_coordinator_test] 2020-06-05T20:48:09.525+0000 SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior src/mongo/executor/network_interface_mock.h:567:22 in

It seems like I'd have to set up a NetworkInterfaceMock in unit tests running the JournalFlusher. Then higher level test fixtures can use that NetworkInterfaceMock, and the NetworkInterfaceMock will remain valid when the JournalFlusher needs to access it through the clock source.

Generated at Thu Feb 08 05:12:32 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.