[SERVER-21366] Long-running transactions in MigrateStatus::apply Created: 09/Nov/15  Updated: 06/Dec/22  Resolved: 15/Dec/17

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 3.0.7
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Bruce Lucas (Inactive) Assignee: [DO NOT USE] Backlog - Sharding Team
Resolution: Won't Fix Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Text File ShorterScopedXact.patch    
Issue Links:
Related
related to SERVER-21367 Log long-running WT transactions in t... Closed
related to SERVER-21382 Sharding migration transfers all docu... Closed
Assigned Teams:
Sharding
Sprint: Sharding D (12/11/15)

 Description   

In a sharded scenario involving chunk moves, inserts, and deletes (via TTL), we observe the WT "range of pinned ids" statistic growing, indicating long-running transactions. I added the following patch to flag any transaction taking longer than 100 ms:

diff --git a/src/mongo/db/storage/wiredtiger/wiredtiger_recovery_unit.cpp b/src/mongo/db/storage/wiredtiger/wiredtiger_recovery_unit.cpp
index 58cdefc..c0a406c 100644
--- a/src/mongo/db/storage/wiredtiger/wiredtiger_recovery_unit.cpp
+++ b/src/mongo/db/storage/wiredtiger/wiredtiger_recovery_unit.cpp
@@ -253,9 +253,15 @@ void WiredTigerRecoveryUnit::_txnClose(bool commit) {
     invariant(_active);
     WT_SESSION* s = _session->getSession();
     if (commit) {
+        if (_timer.millis() >= 100)
+            log() << "XXX commiting; _timer.millis() " << _timer.millis();
+        wassert(_timer.millis() < 100);
         invariantWTOK(s->commit_transaction(s, NULL));
         LOG(2) << "WT commit_transaction";
     } else {
+        if (_timer.millis() >= 100)
+            log() << "XXX rolling back; _timer.millis() " << _timer.millis();
+        wassert(_timer.millis() < 100);
         invariantWTOK(s->rollback_transaction(s, NULL));
         LOG(2) << "WT rollback_transaction";
     }

Here is the longest instance observed, a transaction that ran for about 23 seconds:

2015-11-09T13:58:05.984-0500 I STORAGE  [migrateThread] XXX rolling back; _timer.millis() 23047
2015-11-09T13:58:05.984-0500 I -        [migrateThread] warning assertion failure _timer.millis() < 100 src/mongo/db/storage/wiredtiger/wiredtiger_recovery_unit.cpp 264
2015-11-09T13:58:05.990-0500 I CONTROL  [migrateThread] 
 0xf62172 0xf0b329 0xeeef6b 0xd87f6b 0xdf3e64 0xe03619 0xe04fdc 0xdf0c79 0xdf1f0d 0xfb154f 0x7f212d8e7182 0x7f212c9e847d
----- BEGIN BACKTRACE -----
{"backtrace":[{"b":"400000","o":"B62172"},{"b":"400000","o":"B0B329"},{"b":"400000","o":"AEEF6B"},{"b":"400000","o":"987F6B"},{"b":"400000","o":"9F3E64"},{"b":"400000","o":"A03619"},{"b":"400000","o":"A04FDC"},{"b":"400000","o":"9F0C79"},{"b":"400000","o":"9F1F0D"},{"b":"400000","o":"BB154F"},{"b":"7F212D8DF000","o":"8182"},{"b":"7F212C8EE000","o":"FA47D"}],"processInfo":{ "mongodbVersion" : "3.0.8-pre-", "gitVersion" : "d4f8d4b2a576160348de61f972fa478a845234fd", "uname" : { "sysname" : "Linux", "release" : "3.13.0-32-generic", "version" : "#57-Ubuntu SMP Tue Jul 15 03:51:08 UTC 2014", "machine" : "x86_64" }, "somap" : [ { "elfType" : 2, "b" : "400000", "buildId" : "83C5D7336ECF9A76049E8DE24FC106DEF6241FF9" }, { "b" : "7FFFA00FE000", "elfType" : 3, "buildId" : "E464DBB7341B7B9E7874DC0619C5F429416E6AC6" }, { "b" : "7F212D8DF000", "path" : "/lib/x86_64-linux-gnu/libpthread.so.0", "elfType" : 3, "buildId" : "9318E8AF0BFBE444731BB0461202EF57F7C39542" }, { "b" : "7F212D6D7000", "path" : "/lib/x86_64-linux-gnu/librt.so.1", "elfType" : 3, "buildId" : "92FCF41EFE012D6186E31A59AD05BDBB487769AB" }, { "b" : "7F212D4D3000", "path" : "/lib/x86_64-linux-gnu/libdl.so.2", "elfType" : 3, "buildId" : "C1AE4CB7195D337A77A3C689051DABAA3980CA0C" }, { "b" : "7F212D1CF000", "path" : "/usr/lib/x86_64-linux-gnu/libstdc++.so.6", "elfType" : 3, "buildId" : "4BF6F7ADD8244AD86008E6BF40D90F8873892197" }, { "b" : "7F212CEC9000", "path" : "/lib/x86_64-linux-gnu/libm.so.6", "elfType" : 3, "buildId" : "1D76B71E905CB867B27CEF230FCB20F01A3178F5" }, { "b" : "7F212CCB3000", "path" : "/lib/x86_64-linux-gnu/libgcc_s.so.1", "elfType" : 3, "buildId" : "CC0D578C2E0D86237CA7B0CE8913261C506A629A" }, { "b" : "7F212C8EE000", "path" : "/lib/x86_64-linux-gnu/libc.so.6", "elfType" : 3, "buildId" : "30C94DC66A1FE95180C3D68D2B89E576D5AE213C" }, { "b" : "7F212DAFD000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "9F00581AB3C73E3AEA35995A0C50D24D59A01D47" } ] }}
 mongod(_ZN5mongo15printStackTraceERSo+0x32) [0xf62172]
 mongod(_ZN5mongo10logContextEPKc+0xE9) [0xf0b329]
 mongod(_ZN5mongo9wassertedEPKcS1_j+0x19B) [0xeeef6b]
 mongod(_ZN5mongo22WiredTigerRecoveryUnit9_txnCloseEb+0x3DB) [0xd87f6b]
 mongod(_ZN5mongo13MigrateStatus5applyEPNS_16OperationContextERKSsNS_7BSONObjES5_S5_RKS5_Py+0x9B4) [0xdf3e64]
 mongod(_ZN5mongo13MigrateStatus3_goEPNS_16OperationContextERKSsNS_7BSONObjES5_S5_S4_RKNS_3OIDERKNS_19WriteConcernOptionsE+0x2309) [0xe03619]
 mongod(_ZN5mongo13MigrateStatus2goEPNS_16OperationContextERKSsNS_7BSONObjES5_S5_S4_RKNS_3OIDERKNS_19WriteConcernOptionsE+0xCC) [0xe04fdc]
 mongod(_ZN5mongo13migrateThreadESsNS_7BSONObjES0_S0_SsNS_3OIDENS_19WriteConcernOptionsE+0x109) [0xdf0c79]
 mongod(_ZN5boost6detail11thread_dataINS_3_bi6bind_tIvPFvSsN5mongo7BSONObjES5_S5_SsNS4_3OIDENS4_19WriteConcernOptionsEENS2_5list7INS2_5valueISsEENSB_IS5_EESD_SD_SC_NSB_IS6_EENSB_IS7_EEEEEEE3runEv+0x11D) [0xdf1f0d]
 mongod(+0xBB154F) [0xfb154f]
 libpthread.so.0(+0x8182) [0x7f212d8e7182]
 libc.so.6(clone+0x6D) [0x7f212c9e847d]
-----  END BACKTRACE  -----

addr2line:

mongo::printStackTrace(std::ostream&) at src/mongo/util/stacktrace_posix.cpp:105
mongo::logContext(char const*) at src/mongo/util/log.cpp:134 (discriminator 1)
mongo::wasserted(char const*, char const*, unsigned int) at src/mongo/util/assert_util.cpp:119
mongo::WiredTigerRecoveryUnit::_txnClose(bool) at src/mongo/db/storage/wiredtiger/wiredtiger_recovery_unit.cpp:264 (discriminator 1)
mongo::MigrateStatus::apply(mongo::OperationContext*, std::string const&, mongo::BSONObj, mongo::BSONObj, mongo::BSONObj, mongo::BSONObj const&, unsigned long long*) at src/mongo/s/d_migrate.cpp:2411
~intrusive_ptr at src/third_party/boost/boost/smart_ptr/intrusive_ptr.hpp:96
 (inlined by) ~SharedBuffer at src/mongo/util/shared_buffer.h:38
 (inlined by) ~BSONObj at src/mongo/bson/bsonobj.h:90
 (inlined by) mongo::MigrateStatus::_go(mongo::OperationContext*, std::string const&, mongo::BSONObj, mongo::BSONObj, mongo::BSONObj, std::string const&, mongo::OID const&, mongo::WriteConcernOptions const&) at src/mongo/s/d_migrate.cpp:2209
~intrusive_ptr at src/third_party/boost/boost/smart_ptr/intrusive_ptr.hpp:96
 (inlined by) ~SharedBuffer at src/mongo/util/shared_buffer.h:38
 (inlined by) ~BSONObj at src/mongo/bson/bsonobj.h:90
 (inlined by) mongo::MigrateStatus::go(mongo::OperationContext*, std::string const&, mongo::BSONObj, mongo::BSONObj, mongo::BSONObj, std::string const&, mongo::OID const&, mongo::WriteConcernOptions const&) at src/mongo/s/d_migrate.cpp:1870
~intrusive_ptr at src/third_party/boost/boost/smart_ptr/intrusive_ptr.hpp:96
 (inlined by) ~SharedBuffer at src/mongo/util/shared_buffer.h:38
 (inlined by) ~BSONObj at src/mongo/bson/bsonobj.h:90
 (inlined by) mongo::migrateThread(std::string, mongo::BSONObj, mongo::BSONObj, mongo::BSONObj, std::string, mongo::OID, mongo::WriteConcernOptions) at src/mongo/s/d_migrate.cpp:2635
~basic_string at /usr/include/c++/4.8/bits/basic_string.h:539
 (inlined by) operator()<void (*)(std::basic_string<char>, mongo::BSONObj, mongo::BSONObj, mongo::BSONObj, std::basic_string<char>, mongo::OID, mongo::WriteConcernOptions), boost::_bi::list0> at src/third_party/boost/boost/bind/bind.hpp:670
 (inlined by) boost::_bi::bind_t<void, void (*)(std::string, mongo::BSONObj, mongo::BSONObj, mongo::BSONObj, std::string, mongo::OID, mongo::WriteConcernOptions), boost::_bi::list7<boost::_bi::value<std::string>, boost::_bi::value<mongo::BSONObj>, boost::_bi::value<mongo::BSONObj>, boost::_bi::value<mongo::BSONObj>, boost::_bi::value<std::string>, boost::_bi::value<mongo::OID>, boost::_bi::value<mongo::WriteConcernOptions> > >::operator()() at src/third_party/boost/boost/bind/bind_template.hpp:20
 (inlined by) boost::detail::thread_data<boost::_bi::bind_t<void, void (*)(std::string, mongo::BSONObj, mongo::BSONObj, mongo::BSONObj, std::string, mongo::OID, mongo::WriteConcernOptions), boost::_bi::list7<boost::_bi::value<std::string>, boost::_bi::value<mongo::BSONObj>, boost::_bi::value<mongo::BSONObj>, boost::_bi::value<mongo::BSONObj>, boost::_bi::value<std::string>, boost::_bi::value<mongo::OID>, boost::_bi::value<mongo::WriteConcernOptions> > > >::run() at src/third_party/boost/boost/thread/detail/thread.hpp:62
thread_proxy at src/third_party/boost/libs/thread/src/pthread/thread.cpp:133

This transaction was held open for the entire duration of applying deletes during the catch-up phase of a chunk migration.
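
For clarity, the shape of the problem the backtrace points at, as a hedged sketch (the function and helper names are illustrative, not the actual MigrateStatus::apply code):

// Illustrative sketch only -- all transferred deletes are applied under a
// single recovery-unit snapshot, so the WT transaction stays open (and the
// "range of pinned ids" statistic grows) until the last delete finishes.
void applyTransferredDeletes(OperationContext* txn,
                             const std::vector<BSONObj>& deletedIds) {
    for (const BSONObj& idObj : deletedIds) {
        deleteOneDocument(txn, idObj);  // hypothetical per-document helper
    }
    // The snapshot is released only here -- about 23 seconds later in the
    // trace above.
}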



 Comments   
Comment by Kaloian Manassiev [ 15/Dec/17 ]

Starting with version 3.2, this code no longer exists, and I believe all the places performing long-running WT transactions without yielding the snapshot have been cleaned up.

Since 3.0 will soon no longer be supported, I am closing this as Won't Fix.

Comment by Daniel Pasette (Inactive) [ 23/Dec/15 ]

Re-opening while we decide whether to go with this fix.

Comment by Githook User [ 18/Dec/15 ]

Author: Dan Pasette (monkey101) <dan@mongodb.com>

Message: Revert "SERVER-21366 Periodically yield when applying migration deletions"

This reverts commit 934c5a5241edd01df270065831646d78ed5a80c1.
Branch: v3.0
https://github.com/mongodb/mongo/commit/19864bb4962d584c84bae1f705b5abe48c9685e3

Comment by Githook User [ 18/Dec/15 ]

Author: Dan Pasette (monkey101) <dan@mongodb.com>

Message: Revert "SERVER-21366 Periodically yield when applying migration deletions"

This reverts commit 7f3d0f2cfac80f49d8c7d8ec6aeaad7cae6d6cb0.
Branch: master
https://github.com/mongodb/mongo/commit/a3d8fbcadfeae8418ba17e5e90e51b929ff3ff93

Comment by Githook User [ 18/Dec/15 ]

Author: Dan Pasette (monkey101) <dan@mongodb.com>

Message: Revert "SERVER-21366 Periodically yield when applying migration deletions"

This reverts commit b306a90872fcf190462daaad1c3154d48c324ca9.
Branch: v3.2
https://github.com/mongodb/mongo/commit/39879fb9b856c170fc6884bc365a3be2632ef86b

Comment by Githook User [ 15/Dec/15 ]

Author: Kaloian Manassiev (kaloianm) <kaloian.manassiev@mongodb.com>

Message: SERVER-21366 Periodically yield when applying migration deletions
Branch: v3.0
https://github.com/mongodb/mongo/commit/934c5a5241edd01df270065831646d78ed5a80c1

Comment by Githook User [ 08/Dec/15 ]

Author: Kaloian Manassiev (kaloianm) <kaloian.manassiev@mongodb.com>

Message: SERVER-21366 Periodically yield when applying migration deletions
Branch: v3.2
https://github.com/mongodb/mongo/commit/b306a90872fcf190462daaad1c3154d48c324ca9

Comment by Githook User [ 08/Dec/15 ]

Author: Kaloian Manassiev (kaloianm) <kaloian.manassiev@mongodb.com>

Message: SERVER-21366 Periodically yield when applying migration deletions
Branch: master
https://github.com/mongodb/mongo/commit/7f3d0f2cfac80f49d8c7d8ec6aeaad7cae6d6cb0

Comment by Bruce Lucas (Inactive) [ 10/Nov/15 ]

Testing verifies that this patch eliminates the long-running transactions. However, a problem remains: a chunk move can still get stuck, apparently indefinitely.

The following experiment sheds some light:

  • Eliminate the TTL deletes from the test: the stuck moveChunks do not reproduce when there are only inserts.
  • Now issue a large batch of removes, e.g. db.c.remove({}), on the donor shard.
  • The chunk move in progress gets stuck for the duration of the removes and for some time afterwards, even though only a tiny fraction of those removes affected the chunk in flight.

I think this comment explains why: we cannot determine whether deletes relate to the chunk in flight, so we transfer all deletes from the donor shard to the recipient shard. As long as there is a steady stream of deletes, whether or not they relate to the chunk in flight, the migration will not complete. The migration can complete, however, if the deletes stop (for example, during the 60-second interval between TTL passes) and the recipient shard is able to process the incoming delete ops quickly, determining that the documents are not present on the recipient (because they belong to chunks remaining on the donor shard, not to this shard's chunks or the chunk in flight).
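
To make that concrete, a rough sketch of the donor-side behavior just described (the function and queue names are hypothetical, not the actual migration code):

// Hypothetical sketch: once a document has been deleted, its shard key can
// no longer be checked against the migrating chunk's range, so the donor
// queues every delete's _id for the recipient to replay.
void onDeleteFromDonor(const BSONObj& idObj) {
    _deletedIdsForMigration.push_back(idObj.getOwned());  // illustrative queue
}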

Comment by Bruce Lucas (Inactive) [ 09/Nov/15 ]

Thanks for the quick response, Kal. Patch applied and running. I'll get back to you in the morning with the results.

One question though: does this have the same effect as your initial suggestion of committing the transaction after every document? Would it make sense to do it every n documents (e.g. 100) instead? If there is a high rate of deletes, I think this portion of the moveChunk may struggle to keep up; in the TTL case it may not actually be doing anything for each document, because the documents may already have been deleted on the to-shard by the TTL monitor. So anything we can do to keep that loop fast should help. The same question applies to reacquiring the lock for each document.
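
For reference, the "every n documents" variant being asked about would look roughly like this hedged sketch (only commitAndRestart() is the real RecoveryUnit call suggested in this thread; the loop and helper names are illustrative):

// Hypothetical sketch of yielding every n documents instead of every one.
const int kYieldEvery = 100;  // illustrative batch size from the question
int applied = 0;
for (const BSONObj& idObj : deletedIds) {
    deleteOneDocument(txn, idObj);  // hypothetical per-document helper
    if (++applied % kYieldEvery == 0) {
        txn->recoveryUnit()->commitAndRestart();  // bound snapshot lifetime
    }
}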

Comment by Kaloian Manassiev [ 09/Nov/15 ]

bruce.lucas@mongodb.com, would it be possible to apply the attached patch to 3.0 and run it with your repro?

Comment by Randolph Tan [ 09/Nov/15 ]

kaloian.manassiev, I talked with jason.rassi about this. Deleting a single document by _id should be fast (unless there's no _id index), and not yielding makes the logic easier to understand. In addition, the loop already re-acquires the collection lock on every iteration, so it's effectively yielding already.

Comment by Kaloian Manassiev [ 09/Nov/15 ]

renctan, do you know why we need to do manual yielding for deleteObjects when those documents should be logically independent?

bruce.lucas, in your repro, would it be possible to add a call to txn->recoveryUnit()->commitAndRestart() at the end of each iteration of the loop and see if this helps? This should keep the pinned transaction range smaller.
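
A minimal sketch of that suggestion (the loop and deleteOneDocument() are illustrative stand-ins for the delete-application loop; commitAndRestart() is the RecoveryUnit call named above):

// Minimal sketch: yield the WT snapshot after each applied delete.
for (const BSONObj& idObj : deletedIds) {
    deleteOneDocument(txn, idObj);            // apply one transferred delete
    txn->recoveryUnit()->commitAndRestart();  // release the pinned snapshot
}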
