[SERVER-36782] WT erring because commit timestamp is older than stable timestamp Created: 21/Aug/18  Updated: 29/Oct/23  Resolved: 29/Nov/18

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: 4.1.6

Type: Bug Priority: Major - P3
Reporter: Dianna Hohensee (Inactive) Assignee: Pavithra Vetriselvan
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
depends on SERVER-35811 Pin stable timestamp behind the oldes... Closed
Related
related to SERVER-36023 Try to enable replica set transaction... Closed
related to SERVER-37943 Enable replica set transactions for t... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: Repl 2018-10-22, Repl 2018-11-05, Repl 2018-11-19, Repl 2018-12-03
Participants:

 Description   

This backtrace is from a failure in prepare_requires_fcv42.js that is reproducible using my patch in SERVER-36023 (activating transactions testing with the inMemory storage engine) – see that ticket for repro specifics. The failure happens before FCV is downgraded, so it's not a version compatibility issue, despite the test. The patch for SERVER-36023 generally produces this error in lots of different tests.

It was suggested that SERVER-35811 should fix this failure, but there are concerns that this isn't an active known failure – i.e. that it should not be failing in master right now. So I'm filing this ticket, so that it can either be confirmed that we known about it, or confirmed we need to fix it more urgently than SERVER-35811.

There were similar test failures with index builds, that were introduced and then fixed during v4.0 dev by SERVER-34172. It is thought that perhaps the prepare transactions path is causing some similar trouble, as it was not used/tested much previously, but this hasn't been explored.

[ReplicaSetFixture:job0:primary] 2018-08-20T14:42:07.208-0400 E STORAGE  [conn13] WiredTiger error (22) [1534790527:208265][1346:0x7fe7391de700], WT_SESSION.timestamp_transaction: __wt_timestamp_validate, 631: commit timestamp 5B7B0B7F00000001 older than stable timestamp 5b7b0b7f00000002: Invalid argument Raw: [1534790527:208265][1346:0x7fe7391de700], WT_SESSION.timestamp_transaction: __wt_timestamp_validate, 631: commit timestamp 5B7B0B7F00000001 older than stable timestamp 5b7b0b7f00000002: Invalid argument
[ReplicaSetFixture:job0:primary] 2018-08-20T14:42:07.208-0400 F -        [conn13] Invariant failure: s->timestamp_transaction(s, conf.c_str()) resulted in status BadValue: 22: Invalid argument at src/mongo/db/storage/wiredtiger/wiredtiger_recovery_unit.cpp 256
[ReplicaSetFixture:job0:primary] 2018-08-20T14:42:07.208-0400 F -        [conn13]
[ReplicaSetFixture:job0:primary]
[ReplicaSetFixture:job0:primary] ***aborting after invariant() failure

cleaning up the backtrace to this

 /home/dianna/mongo-copy/src/mongo/util/stacktrace_posix.cpp:172:30: mongo::printStackTrace(std::ostream&)
 /home/dianna/mongo-copy/src/mongo/util/signal_handlers_synchronous.cpp:180:5: mongo::(anonymous namespace)::printSignalAndBacktrace(int)
 /home/dianna/mongo-copy/src/mongo/util/signal_handlers_synchronous.cpp:237:5: mongo::(anonymous namespace)::abruptQuit(int)
 ??:0:0: ??
 /build/glibc-qbmteM/glibc-2.21/signal/../sysdeps/unix/sysv/linux/raise.c:55:0: gsignal
 /build/glibc-qbmteM/glibc-2.21/stdlib/abort.c:89:0: abort
 /home/dianna/mongo-copy/src/mongo/util/assert_util.cpp:126:5: mongo::invariantOKFailed(char const*, mongo::Status const&, char const*, unsigned int)
 /home/dianna/mongo/src/mongo/db/storage/wiredtiger/wiredtiger_recovery_unit.cpp:256:13: mongo::WiredTigerRecoveryUnit::_txnClose(bool)
 /home/dianna/mongo/src/mongo/db/storage/wiredtiger/wiredtiger_recovery_unit.cpp:89:13: mongo::WiredTigerRecoveryUnit::_commit()
 /home/dianna/mongo/src/mongo/db/storage/wiredtiger/wiredtiger_recovery_unit.cpp:165:5: mongo::WiredTigerRecoveryUnit::commitUnitOfWork()
 /home/dianna/mongo-copy/src/mongo/db/storage/write_unit_of_work.cpp:96:33: mongo::WriteUnitOfWork::commit()
 /home/dianna/mongo-copy/src/mongo/db/transaction_participant.cpp:693:34: mongo::TransactionParticipant::_commitTransaction(std::unique_lock<std::mutex>, mongo::OperationContext*)
 /home/dianna/mongo-copy/src/mongo/db/transaction_participant.cpp:681:5: mongo::TransactionParticipant::commitPreparedTransaction(mongo::OperationContext*, mongo::Timestamp)
 /home/dianna/mongo-copy/src/mongo/db/commands/txn_cmds.cpp:101:29: mongo::(anonymous namespace)::CmdCommitTxn::run(mongo::OperationContext*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, mongo::BSONObj const&, mongo::BSONObjBuilder&)
 /home/dianna/mongo/src/mongo/db/commands.cpp:436:29: mongo::BasicCommand::Invocation::run(mongo::OperationContext*, mongo::rpc::ReplyBuilderInterface*)
 /home/dianna/mongo-copy/src/mongo/db/service_entry_point_common.cpp:459:17: mongo::(anonymous namespace)::invokeInTransaction(mongo::OperationContext*, mongo::CommandInvocation*, mongo::rpc::ReplyBuilderInterface*)
 /home/dianna/mongo-copy/src/mongo/db/service_entry_point_common.cpp:526:13: mongo::(anonymous namespace)::runCommandImpl(mongo::OperationContext*, mongo::CommandInvocation*, mongo::OpMsgRequest const&, mongo::rpc::ReplyBuilderInterface*, mongo::LogicalTime, mongo::ServiceEntryPointCommon::Hooks const&, mongo::BSONObjBuilder*, boost::optional<mongo::OperationSessionInfoFromClient> const&)
 /home/dianna/mongo-copy/src/mongo/db/service_entry_point_common.cpp:873:18: mongo::(anonymous namespace)::execCommandDatabase(mongo::OperationContext*, mongo::Command*, mongo::OpMsgRequest const&, mongo::rpc::ReplyBuilderInterface*, mongo::ServiceEntryPointCommon::Hooks const&)
 /home/dianna/mongo-copy/src/mongo/db/service_entry_point_common.cpp:1021:13: mongo::(anonymous namespace)::receivedCommands(mongo::OperationContext*, mongo::Message const&, mongo::ServiceEntryPointCommon::Hooks const&)::$_6::operator()() const
 /home/dianna/mongo-copy/src/mongo/db/service_entry_point_common.cpp:969:5: mongo::(anonymous namespace)::receivedCommands(mongo::OperationContext*, mongo::Message const&, mongo::ServiceEntryPointCommon::Hooks const&)
 /home/dianna/mongo-copy/src/mongo/db/service_entry_point_common.cpp:1310:22: mongo::ServiceEntryPointCommon::handleRequest(mongo::OperationContext/**
*, mongo::Message const&, mongo::ServiceEntryPointCommon::Hooks const&)
 /home/dianna/mongo/src/mongo/db/service_entry_point_mongod.cpp:123:12: mongo::ServiceEntryPointMongod::handleRequest(mongo::OperationContext*, mongo::Message const&)
 /home/dianna/mongo-copy/src/mongo/transport/service_state_machine.cpp:379:35: mongo::ServiceStateMachine::_processMessage(mongo::ServiceStateMachine::ThreadGuard)
 /home/dianna/mongo-copy/src/mongo/transport/service_state_machine.cpp:440:17: mongo::ServiceStateMachine::_runNextInGuard(mongo::ServiceStateMachine::ThreadGuard)
 /home/dianna/mongo-copy/src/mongo/transport/service_state_machine.cpp:480:14: mongo::ServiceStateMachine::_scheduleNextWithGuard(mongo::ServiceStateMachine::ThreadGuard, mongo::transport::ServiceExecutor::ScheduleFlags, mongo::transport::ServiceExecutorTaskName, mongo::ServiceStateMachine::Ownership)::$_0::operator()() const
 /opt/mongodbtoolchain/v2/bin/../lib/gcc/x86_64-mongodb-linux/5.4.0/../../../../include/c++/5.4.0/functional:1871:2: std::_Function_handler<void (), mongo::ServiceStateMachine::_scheduleNextWithGuard(mongo::ServiceStateMachine::ThreadGuard, mongo::transport::ServiceExecutor::ScheduleFlags, mongo::transport::ServiceExecutorTaskName, mongo::ServiceStateMachine::Ownership)::$_0>::_M_invoke(std::_Any_data const&)
 /opt/mongodbtoolchain/v2/bin/../lib/gcc/x86_64-mongodb-linux/5.4.0/../../../../include/c++/5.4.0/functional:2267:14: std::function<void ()>::operator()() const
 /home/dianna/mongo-copy/src/mongo/transport/service_executor_synchronous.cpp:115:13: mongo::transport::ServiceExecutorSynchronous::schedule(std::function<void ()>, mongo::transport::ServiceExecutor::ScheduleFlags, mongo::transport::ServiceExecutorTaskName)
 /home/dianna/mongo-copy/src/mongo/transport/service_state_machine.cpp:484:48: mongo::ServiceStateMachine::_scheduleNextWithGuard(mongo::ServiceStateMachine::ThreadGuard, mongo::transport::ServiceExecutor::ScheduleFlags, mongo::transport::ServiceExecutorTaskName, mongo::ServiceStateMachine::Ownership)
 /home/dianna/mongo-copy/src/mongo/transport/service_state_machine.cpp:303:16: mongo::ServiceStateMachine::_sourceCallback(mongo::Status)
 /home/dianna/mongo-copy/src/mongo/transport/service_state_machine.cpp:259:9: mongo::ServiceStateMachine::_sourceMessage(mongo::ServiceStateMachine::ThreadGuard)::$_2::operator()(mongo::StatusWith<mongo::Message>) const
 /home/dianna/mongo-copy/src/mongo/util/future.h:112:12: auto mongo::future_details::call<mongo::ServiceStateMachine::_sourceMessage(mongo::ServiceStateMachine::ThreadGuard)::$_2&, mongo::Message>(mongo::ServiceStateMachine::_sourceMessage(mongo::ServiceStateMachine::ThreadGuard)::$_2&, mongo::Message&&)
 /home/dianna/mongo-copy/src/mongo/util/future.h:818:28: void mongo::future_details::Future<mongo::Message>::getAsync<mongo::ServiceStateMachine::_sourceMessage(mongo::ServiceStateMachine::ThreadGuard)::$_2>(mongo::ServiceStateMachine::_sourceMessage(mongo::ServiceStateMachine::ThreadGuard)::$_2&&) &&::{lambda(mongo::Message&&)#1}::operator()(mongo::Message&&) const
 /home/dianna/mongo-copy/src/mongo/util/future.h:1122:20: auto mongo::future_details::Future<mongo::Message>::generalImpl<void mongo::future_details::Future<mongo::Message>::getAsync<mongo::ServiceStateMachine::_sourceMessage(mongo::ServiceStateMachine::ThreadGuard)::$_2>(mongo::ServiceStateMachine::_sourceMessage(mongo::ServiceStateMachine::ThreadGuard)::$_2&&) &&::{lambda(mongo::Message&&)#1}, void mongo::future_details::Future<mongo::Message>::getAsync<mongo::ServiceStateMachine::_sourceMessage(mongo::ServiceStateMachine::ThreadGuard)::$_2>(mongo::ServiceStateMachine::_sourceMessage(mongo::ServiceStateMachine::ThreadGuard)::$_2&&) &&::{lambda(mongo::Status&&)#1}, void mongo::future_details::Future<mongo::Message>::getAsync<mongo::ServiceStateMachine::_sourceMessage(mongo::ServiceStateMachine::ThreadGuard)::$_2>(mongo::ServiceStateMachine::_sourceMessage(mongo::ServiceStateMachine::ThreadGuard)::$_2&&) &&::{lambda()#1}>(mongo::ServiceStateMachine::_sourceMessage(mongo::ServiceStateMachine::ThreadGuard)::$_2&&, void mongo::future_details::Future<mongo::Message>::getAsync<mongo::ServiceStateMachine::_sourceMessage(mongo::ServiceStateMachine::ThreadGuard)::$_2>(mongo::ServiceStateMachine::_sourceMessage(mongo::ServiceStateMachine::ThreadGuard)::$_2&&) &&::{lambda(mongo::Status&&)#1}&&, void mongo::future_details::Future<mongo::Message>::getAsync<mongo::ServiceStateMachine::_sourceMessage(mongo::ServiceStateMachine::ThreadGuard)::$_2>(mongo::ServiceStateMachine::_sourceMessage(mongo::ServiceStateMachine::ThreadGuard)::$_2&&) &&::{lambda()#1}&&)
 /home/dianna/mongo-copy/src/mongo/util/future.h:816:16: void mongo::future_details::Future<mongo::Message>::getAsync<mongo::ServiceStateMachine::_sourceMessage(mongo::ServiceStateMachine::ThreadGuard)::$_2>(mongo::ServiceStateMachine::_sourceMessage(mongo::ServiceStateMachine::ThreadGuard)::$_2&&) &&
 /home/dianna/mongo-copy/src/mongo/transport/service_state_machine.cpp:254:21: mongo::ServiceStateMachine::_sourceMessage(mongo::ServiceStateMachine::ThreadGuard)
 /home/dianna/mongo-copy/src/mongo/transport/service_state_machine.cpp:437:17: mongo::ServiceStateMachine::_runNextInGuard(mongo::ServiceStateMachine::ThreadGuard)
 /home/dianna/mongo-copy/src/mongo/transport/service_state_machine.cpp:480:14: mongo::ServiceStateMachine::_scheduleNextWithGuard(mongo::ServiceStateMachine::ThreadGuard, mongo::transport::ServiceExecutor::ScheduleFlags, mongo::transport::ServiceExecutorTaskName, mongo::ServiceStateMachine::Ownership)::$_0::operator()() const
 /opt/mongodbtoolchain/v2/bin/../lib/gcc/x86_64-mongodb-linux/5.4.0/../../../../include/c++/5.4.0/functional:1871:2: std::_Function_handler<void (), mongo::ServiceStateMachine::_scheduleNextWithGuard(mongo::ServiceStateMachine::ThreadGuard, mongo::transport::ServiceExecutor::ScheduleFlags, mongo::transport::ServiceExecutorTaskName, mongo::ServiceStateMachine::Ownership)::$_0>::_M_invoke(std::_Any_data const&)
 /opt/mongodbtoolchain/v2/bin/../lib/gcc/x86_64-mongodb-linux/5.4.0/../../../../include/c++/5.4.0/functional:2267:14: std::function<void ()>::operator()() const
 /home/dianna/mongo-copy/src/mongo/transport/service_executor_synchronous.cpp:132:13: mongo::transport::ServiceExecutorSynchronous::schedule(std::function<void ()>, mongo::transport::ServiceExecutor::ScheduleFlags, mongo::transport::ServiceExecutorTaskName)::$_1::operator()() const
 /opt/mongodbtoolchain/v2/bin/../lib/gcc/x86_64-mongodb-linux/5.4.0/../../../../include/c++/5.4.0/functional:1871:2: std::_Function_handler<void (), mongo::transport::ServiceExecutorSynchronous::schedule(std::function<void ()>, mongo::transport::ServiceExecutor::ScheduleFlags, mongo::transport::ServiceExecutorTaskName)::$_1>::_M_invoke(std::_Any_data const&)
 /opt/mongodbtoolchain/v2/bin/../lib/gcc/x86_64-mongodb-linux/5.4.0/../../../../include/c++/5.4.0/functional:2267:14: std::function<void ()>::operator()() const
 /home/dianna/mongo-copy/src/mongo/transport/service_entry_point_utils.cpp:55:5: mongo::(anonymous namespace)::runFunc(void*)
 /build/glibc-qbmteM/glibc-2.21/nptl/pthread_create.c:333:0: start_thread
 /build/glibc-qbmteM/glibc-2.21/misc/../sysdeps/unix/sysv/linux/x86_64/clone.S:109:0: clone
dianna@delphi:~/mongo-copy$ vim jstests/core/txns/prepare_requires_fcv42.js



 Comments   
Comment by Pavithra Vetriselvan [ 29/Nov/18 ]

I've taken a preliminary look at SERVER-37943 and don't see the same symptoms that SERVER-35811 was supposed to fix. Instead of explicit invariant failures, there seems to be a different deadlock in awaitReplication() even though the stable checkpoint advances.

Since we will investigate SERVER-37943 separately, I am closing this ticket because the immediate symptom of the commit timestamp being older than the stable timestamp has been fixed by SERVER-35811.

Comment by Dianna Hohensee (Inactive) [ 06/Nov/18 ]

I have closed SERVER-36023 with the conclusion that replica set transactions cannot be enabled on the inMemory WT storage engine at this time.

I created SERVER-37943 with the outcome, and as a task to actually enable it. I put it on the replication backlog, as the problems lie in the replication layer, and the original agreement was for storage to try to enable it after PM-1119, not fix unknown new problems.

I have not looked into whether the new failures (deadlocks) are related to the old failures (invariants). SERVER-37943 has the new failures.

Comment by Judah Schvimer [ 01/Nov/18 ]

pavithra.vetriselvan, can you please confirm with dianna.hohensee that SERVER-35811 fixed this problem?

Comment by Judah Schvimer [ 21/Aug/18 ]

I'm going to leave this open until SERVER-36023 confirms that there is no problem after SERVER-35811. It is strange that SERVER-35811 hasn't caused problems until in-memory was turned on, however. Confirming that the prepareTimestamp, commitTimestamp, and stableTimestamp all look like what we'd expect could be useful.

Comment by Judah Schvimer [ 21/Aug/18 ]

Given that this is failing on committing a transaction at a timestamp in the past (the prepareTimestamp) I'm fairly confident that SERVER-35811 will fix this. Currently the stable timestamp is able to advance in front of active prepareTimestamps, so when we commit we can commit behind the stable timestamp. This behavior is expected to be fixed in SERVER-35811 by pinning the stable timestamp to the oldest prepareTimestamp.

Generated at Thu Feb 08 04:44:04 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.