[SERVER-26179] Do not join the TaskRunner within a runner task in CollectionBulkLoaderImpl::init Created: 20/Sep/16  Updated: 19/Nov/16  Resolved: 27/Sep/16

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: 3.3.15

Type: Bug
Priority: Major - P3
Reporter: Max Hirschhorn
Assignee: Scott Hernandez (Inactive)
Resolution: Done
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
depends on SERVER-25131 CollectionBulkLoaderImpl should relea... Closed
Duplicate
is duplicated by SERVER-26341 TaskRunner destroyed prior to task re... Closed
Related
related to SERVER-26335 Initial sync with buildIndexes=false ... Closed
is related to SERVER-25725 Running {setFeatureCompatibilityVersi... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Steps To Reproduce:

Apply the patch below, then run the test with the following command. The patch makes the cloner attempt to build an index that, because buildIndexes=false, the IndexCatalog reports as already existing.

python buildscripts/resmoke.py --executor=replica_sets jstests/replsets/buildindexes.js

diff --git a/jstests/replsets/buildindexes.js b/jstests/replsets/buildindexes.js
index f6a8a78..16abd10 100644
--- a/jstests/replsets/buildindexes.js
+++ b/jstests/replsets/buildindexes.js
@@ -5,15 +5,22 @@
     var name = "buildIndexes";
     var host = getHostName();
 
-    var replTest = new ReplSetTest({name: name, nodes: 3});
+    var replTest = new ReplSetTest({name: name, nodes: 2});
 
-    var nodes = replTest.startSet();
+    replTest.startSet();
+    replTest.initiate();
+
+    // Create an index before having the secondary start its initial sync to verify that the
+    // 'buildIndexes=false' mode causes index builds to be ignored during the cloning process.
+    assert.commandWorked(replTest.getPrimary().getDB("test").mycoll.createIndex({field: 1}));
+
+    replTest.add({});
 
     var config = replTest.getReplSetConfig();
     config.members[2].priority = 0;
     config.members[2].buildIndexes = false;
-
-    replTest.initiate(config);
+    config.version = 2;
+    assert.commandWorked(replTest.getPrimary().adminCommand({replSetReconfig: config}));
 
     var master = replTest.getPrimary().getDB(name);
     var slaveConns = replTest.liveNodes.slaves;

Sprint: Repl 2016-10-10
Participants:

 Description   

The CollectionBulkLoaderImpl was created inside a TaskRunner task, and init() was called inside that same task. When init() failed, the loader was not returned to the caller, so its destructor ran. The destructor waits for the runner to join, which can never happen because the destructor is itself executing within the runner's active task.
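
The self-join described above can be illustrated with a minimal sketch. Everything in it is hypothetical: MiniRunner, joinFor() and demonstrateSelfJoin() are invented stand-ins, not MongoDB's TaskRunner API, and a timed wait is used so that the example reports the stuck join instead of hanging.

```cpp
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical, heavily simplified stand-in for a task runner.
class MiniRunner {
public:
    // Runs `task` on a worker thread; the runner stays active until `task` returns.
    void schedule(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lk(_mutex);
            _active = true;
        }
        _worker = std::thread([this, task] {
            task();
            {
                std::lock_guard<std::mutex> lk(_mutex);
                _active = false;
            }
            _cv.notify_all();
        });
    }

    // Waits up to `timeout` for the runner to become idle; returns true if it did.
    bool joinFor(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lk(_mutex);
        return _cv.wait_for(lk, timeout, [this] { return !_active; });
    }

    ~MiniRunner() {
        if (_worker.joinable()) {
            _worker.join();
        }
    }

private:
    std::mutex _mutex;
    std::condition_variable _cv;
    bool _active = false;
    std::thread _worker;
};

// Returns {join succeeded inside the task, join succeeded outside the task}.
std::pair<bool, bool> demonstrateSelfJoin() {
    bool inside = true;
    bool outside = false;
    {
        MiniRunner runner;
        runner.schedule([&] {
            // The runner cannot become idle until this lambda returns, so a
            // join issued from here can only time out (or block forever).
            inside = runner.joinFor(std::chrono::milliseconds(50));
        });
        // From another thread the same join completes once the task ends.
        outside = runner.joinFor(std::chrono::seconds(5));
    }
    return {inside, outside};
}
```

With an untimed wait, as in the destructor path in this ticket, the join issued inside the task would block indefinitely instead of timing out.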

Original description

The CollectionBulkLoaderImpl is constructed with an active TaskRunner because it is given the same instance that StorageInterfaceImpl::createCollectionForBulkLoading() created. If CollectionBulkLoaderImpl::init() returns an error status, then CollectionBulkLoaderImpl::commit() is never called, which means the CollectionBulkLoaderImpl never calls TaskRunner::runSynchronousTask() to set TaskRunner::_active back to false.

Thread 3 (Thread 0x7f3dacc7f700 (LWP 28390)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00007f3dc7f8815c in __gthread_cond_wait (__mutex=<optimized out>, __cond=<optimized out>) at /data/mci/toolchain-builder/build-gcc-v2.sh-cAV/x86_64-mongodb-linux/libstdc++-v3/include/x86_64-mongodb-linux/bits/gthr-default.h:864
#2  std::condition_variable::wait (this=<optimized out>, __lock=...) at ../../../../../gcc-5.3.0/libstdc++-v3/src/c++11/condition_variable.cc:53
#3  0x00007f3dc6d17c6b in wait<(lambda at src/mongo/db/repl/task_runner.cpp:130:25)> (this=0x7f3dccca3938, __lock=..., __p=...) at /opt/mongodbtoolchain/v2/include/c++/5.3.0/condition_variable:98
#4  mongo::repl::TaskRunner::join (this=0x7f3dccca3900) at src/mongo/db/repl/task_runner.cpp:130
#5  0x00007f3dc6d59498 in mongo::repl::CollectionBulkLoaderImpl::~CollectionBulkLoaderImpl (this=0x7f3dccca3a80) at src/mongo/db/repl/collection_bulk_loader_impl.cpp:81
#6  0x00007f3dc6d59cbe in mongo::repl::CollectionBulkLoaderImpl::~CollectionBulkLoaderImpl (this=0x7f3dccca3a80) at src/mongo/db/repl/collection_bulk_loader_impl.cpp:80
#7  0x00007f3dc6d57af5 in operator() (__ptr=0x7f3dccca393c, this=<optimized out>) at /opt/mongodbtoolchain/v2/include/c++/5.3.0/bits/unique_ptr.h:76
#8  ~unique_ptr (this=<optimized out>) at /opt/mongodbtoolchain/v2/include/c++/5.3.0/bits/unique_ptr.h:236
#9  operator() (this=<optimized out>, txn=0x7f3dccb3f2c0) at src/mongo/db/repl/storage_interface_impl.cpp:297
#10 std::_Function_handler<mongo::Status (mongo::OperationContext*), mongo::repl::StorageInterfaceImpl::createCollectionForBulkLoading(mongo::NamespaceString const&, mongo::CollectionOptions const&, mongo::BSONObj, std::vector<mongo::BSONObj, std::allocator<mongo::BSONObj> > const&)::$_0>::_M_invoke(std::_Any_data const&, mongo::OperationContext*&&) (__functor=..., __args=<optimized out>) at /opt/mongodbtoolchain/v2/include/c++/5.3.0/functional:1856
#11 0x00007f3dc6d192aa in operator() (this=0x80, __args=<optimized out>) at /opt/mongodbtoolchain/v2/include/c++/5.3.0/functional:2271
#12 operator() (this=0x7f3dc95d6bf0, txn=<optimized out>, taskStatus=...) at src/mongo/db/repl/task_runner.cpp:230
#13 std::_Function_handler<mongo::repl::TaskRunner::NextAction (mongo::OperationContext*, mongo::Status const&), mongo::repl::TaskRunner::runSynchronousTask(std::function<mongo::Status (mongo::OperationContext*)>, mongo::repl::TaskRunner::NextAction)::$_4>::_M_invoke(std::_Any_data const&, mongo::OperationContext*&&, mongo::Status const&) (__functor=..., __args=..., __args=...) at /opt/mongodbtoolchain/v2/include/c++/5.3.0/functional:1856
#14 0x00007f3dc6d189ed in operator() (this=0x7f3dccca393c, __args=..., __args=...) at /opt/mongodbtoolchain/v2/include/c++/5.3.0/functional:2271
#15 mongo::repl::(anonymous namespace)::runSingleTask(std::function<mongo::repl::TaskRunner::NextAction (mongo::OperationContext*, mongo::Status const&)> const&, mongo::OperationContext*, mongo::Status const&) (task=..., txn=<optimized out>, status=...) at src/mongo/db/repl/task_runner.cpp:66
#16 0x00007f3dc6d182c0 in mongo::repl::TaskRunner::_runTasks (this=0x7f3dccca3900) at src/mongo/db/repl/task_runner.cpp:151
#17 0x00007f3dc611d8ad in operator() (this=<optimized out>) at /opt/mongodbtoolchain/v2/include/c++/5.3.0/functional:2271
#18 mongo::ThreadPool::_doOneTask (this=0x7f3dcd065dc0, lk=0x7f3dacc7e6d0) at src/mongo/util/concurrency/thread_pool.cpp:326
#19 0x00007f3dc611ebcd in mongo::ThreadPool::_consumeTasks (this=0x7f3dcd065dc0) at src/mongo/util/concurrency/thread_pool.cpp:278
#20 0x00007f3dc611e5d5 in mongo::ThreadPool::_workerThreadBody (pool=0x7f3dcd065dc0, threadName=...) at src/mongo/util/concurrency/thread_pool.cpp:228
#21 0x00007f3dc69162d0 in std::(anonymous namespace)::execute_native_thread_routine (__p=<optimized out>) at ../../../../../gcc-5.3.0/libstdc++-v3/src/c++11/thread.cc:84
#22 0x00007f3dc302e184 in start_thread (arg=0x7f3dacc7f700) at pthread_create.c:312
#23 0x00007f3dc2d5b37d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
...
Thread 9 (Thread 0x7f3daec83700 (LWP 28354)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00007f3dc7f8815c in __gthread_cond_wait (__mutex=<optimized out>, __cond=<optimized out>) at /data/mci/toolchain-builder/build-gcc-v2.sh-cAV/x86_64-mongodb-linux/libstdc++-v3/include/x86_64-mongodb-linux/bits/gthr-default.h:864
#2  std::condition_variable::wait (this=<optimized out>, __lock=...) at ../../../../../gcc-5.3.0/libstdc++-v3/src/c++11/condition_variable.cc:53
#3  0x00007f3dc6d18f7b in wait<(lambda at src/mongo/db/repl/task_runner.cpp:250:31)> (this=0x100000000, __lock=..., __p=...) at /opt/mongodbtoolchain/v2/include/c++/5.3.0/condition_variable:98
#4  mongo::repl::TaskRunner::runSynchronousTask(std::function<mongo::Status (mongo::OperationContext*)>, mongo::repl::TaskRunner::NextAction) (this=<optimized out>, func=..., nextAction=<optimized out>) at src/mongo/db/repl/task_runner.cpp:250
#5  0x00007f3dc6d54454 in mongo::repl::StorageInterfaceImpl::createCollectionForBulkLoading (this=<optimized out>, nss=..., options=..., idIndexSpec=..., secondaryIndexSpecs=...) at src/mongo/db/repl/storage_interface_impl.cpp:255
#6  0x00007f3dc7073bba in mongo::repl::CollectionCloner::_beginCollectionCallback (this=0x7f3dccb92690, cbd=...) at src/mongo/db/repl/collection_cloner.cpp:341
#7  0x00007f3dc7074bec in operator() (this=<optimized out>, __args=...) at /opt/mongodbtoolchain/v2/include/c++/5.3.0/functional:2271
#8  operator() (this=<optimized out>, txn=<optimized out>, status=...) at src/mongo/db/repl/collection_cloner.cpp:118
#9  std::_Function_handler<mongo::repl::TaskRunner::NextAction (mongo::OperationContext*, mongo::Status const&), mongo::repl::CollectionCloner::CollectionCloner(mongo::executor::TaskExecutor*, mongo::OldThreadPool*, mongo::HostAndPort const&, mongo::NamespaceString const&, mongo::CollectionOptions const&, std::function<void (mongo::Status const&)> const&, mongo::repl::StorageInterface*)::$_0::operator()(std::function<void (mongo::executor::TaskExecutor::CallbackArgs const&)> const&) const::{lambda(mongo::OperationContext*, mongo::Status const&)#1}>::_M_invoke(std::_Any_data const&, mongo::OperationContext*&&, mongo::Status const&) (__functor=..., __args=..., __args=...) at /opt/mongodbtoolchain/v2/include/c++/5.3.0/functional:1856
#10 0x00007f3dc6d189ed in operator() (this=0x7f3daec82094, __args=..., __args=...) at /opt/mongodbtoolchain/v2/include/c++/5.3.0/functional:2271
#11 mongo::repl::(anonymous namespace)::runSingleTask(std::function<mongo::repl::TaskRunner::NextAction (mongo::OperationContext*, mongo::Status const&)> const&, mongo::OperationContext*, mongo::Status const&) (task=..., txn=<optimized out>, status=...) at src/mongo/db/repl/task_runner.cpp:66
#12 0x00007f3dc6d182c0 in mongo::repl::TaskRunner::_runTasks (this=0x7f3dccb92c18) at src/mongo/db/repl/task_runner.cpp:151
#13 0x00007f3dc611d8ad in operator() (this=<optimized out>) at /opt/mongodbtoolchain/v2/include/c++/5.3.0/functional:2271
#14 mongo::ThreadPool::_doOneTask (this=0x7f3dcd064000, lk=0x7f3daec826d0) at src/mongo/util/concurrency/thread_pool.cpp:326
#15 0x00007f3dc611ebcd in mongo::ThreadPool::_consumeTasks (this=0x7f3dcd064000) at src/mongo/util/concurrency/thread_pool.cpp:278
#16 0x00007f3dc611e5d5 in mongo::ThreadPool::_workerThreadBody (pool=0x7f3dcd064000, threadName=...) at src/mongo/util/concurrency/thread_pool.cpp:228
#17 0x00007f3dc69162d0 in std::(anonymous namespace)::execute_native_thread_routine (__p=<optimized out>) at ../../../../../gcc-5.3.0/libstdc++-v3/src/c++11/thread.cc:84
#18 0x00007f3dc302e184 in start_thread (arg=0x7f3daec83700) at pthread_create.c:312
#19 0x00007f3dc2d5b37d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111


This issue does not affect the version of initial sync in MongoDB 3.2.

python buildscripts/resmoke.py --executor=replica_sets jstests/replsets/buildindexes.js --mongodSetParameters='{use3dot2InitialSync: true, initialSyncOplogBuffer: "inMemoryBlockingQueue"}'



 Comments   
Comment by Benety Goh [ 27/Sep/16 ]

These tests no longer hang but they will still fail. These tests will be re-enabled once the work in SERVER-26335 is completed.

Comment by Max Hirschhorn [ 27/Sep/16 ]

Re-opening so that the jstests/replsets/buildindexes.js, jstests/replsets/initial_sync3.js, and jstests/replsets/ismaster1.js tests can be un-blacklisted from the replica_sets, replica_sets_auth, replica_sets_compression, replica_sets_legacy, and replica_sets_ese suites as part of this ticket.

https://github.com/mongodb/mongo/commit/96a21e63bfc1a1cdde01c671d0867310c594ea5a#diff-a886677afa540745338de7c942be83dcR6

Comment by Githook User [ 26/Sep/16 ]

Author: Scott Hernandez (username: scotthernandez, email: scotthernandez@gmail.com)

Message: SERVER-26179: Have CollectionBulkLoader::init use runner to execute work, not within runner task.
Branch: master
https://github.com/mongodb/mongo/commit/56c9b8d8cc514de6c7a9342b8f47d5e06ead0d68
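
One reading of the commit message is that the loader is now built outside the runner task and init() dispatches its own work to the runner. The sketch below shows that shape only. It is not the code from the commit: MiniRunner, MiniLoader and createLoader() are hypothetical names, and the destructor merely records the destroying thread where the real one would join the runner.

```cpp
#include <functional>
#include <future>
#include <memory>
#include <thread>

// Hypothetical stand-in for a task runner; not MongoDB's TaskRunner API.
class MiniRunner {
public:
    // Runs `task` on a separate worker thread and blocks the caller until it finishes.
    bool runSynchronous(std::function<bool()> task) {
        std::packaged_task<bool()> job(std::move(task));
        std::future<bool> result = job.get_future();
        std::thread worker(std::move(job));
        worker.join();
        return result.get();
    }
};

// Hypothetical loader that records which thread destroyed it.
class MiniLoader {
public:
    MiniLoader(MiniRunner* runner, std::thread::id* destroyedOn)
        : _runner(runner), _destroyedOn(destroyedOn) {}

    // init() hands its work to the runner instead of being called from
    // inside a runner task.
    bool init(bool specIsOk) {
        return _runner->runSynchronous([specIsOk] { return specIsOk; });
    }

    // A real destructor would join the runner; this one only records the thread.
    ~MiniLoader() {
        *_destroyedOn = std::this_thread::get_id();
    }

private:
    MiniRunner* _runner;
    std::thread::id* _destroyedOn;
};

// Builds the loader on the calling thread; returns nullptr if init() fails.
std::unique_ptr<MiniLoader> createLoader(MiniRunner* runner,
                                         bool specIsOk,
                                         std::thread::id* destroyedOn) {
    auto loader = std::make_unique<MiniLoader>(runner, destroyedOn);
    if (!loader->init(specIsOk)) {
        // The loader is destroyed on the caller's thread, outside any runner
        // task, so a join in its destructor would not wait on itself.
        return nullptr;
    }
    return loader;
}
```

Because a failed init() now destroys the loader on the caller's thread, a join in the destructor no longer runs inside the task it is waiting for.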

Comment by Max Hirschhorn [ 20/Sep/16 ]

I'm marking this as a 3.4.0-rc0 blocker because we cannot release a version of 3.4 that hangs during initial sync. The "Steps to reproduce" section of this ticket describes how to reproduce this issue using buildIndexes=false, which is actually how this issue manifested when testing SERVER-25725. However, nothing about this issue is specific to buildIndexes=false - all that needs to happen is for IndexCatalog::_isSpecOk() to return an error status. If we decide to bump the index version again in MongoDB version 3.6, then we'll almost surely need initial sync in MongoDB 3.4 to crash.

Generated at Thu Feb 08 04:11:22 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.