[SERVER-21583] ApplyOps background index creation may deadlock Created: 20/Nov/15  Updated: 03/Jun/22  Resolved: 07/Jan/16

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: 3.0.9, 3.2.3, 3.3.0

Type: Bug Priority: Critical - P2
Reporter: Christian Mueller Assignee: Benety Goh
Resolution: Done Votes: 1
Labels: code-and-test
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Text File mongod.log     File oplog.bson    
Issue Links:
Related
related to SERVER-22109 Invariant failure when running applyO... Closed
related to SERVER-27834 Index builds during initial sync shou... Closed
Backwards Compatibility: Fully Compatible
Operating System: Linux
Backport Completed:
Steps To Reproduce:
  • new replicaset without any data
  • standalone mongod on port 30005 (just for demonstration)
  • on replicaset primary:

     
    db.foobar.createIndex({"a": 1 }, {background: true})
    db.foobar.insert({"a":NumberLong("1444173423471"),"_id":"1444173420000-3"})
    

  • mongodump of oplog

    mongodump -d local -c oplog.rs --port 30001 -o dumptest/
    

  • reformat dump:

    mv dumptest/local/oplog.rs.bson dumptest/oplog.bson; rm -rf dumptest/local
    

  • mongorestore to replay oplog:

    mongorestore -v  --port 30005 --oplogReplay dumptest/
    

Sprint: Repl E (01/08/16)
Participants:

 Description   

I tried to replay the oplog on one server to recover however the mongorestore did kept running indefinitely, on the server many op are running trying to wait for a lock, and when trying to stop the server, it's not responsive, so I have to run kill-9 against it.



 Comments   
Comment by Githook User [ 11/Jan/16 ]

Author:

{u'username': u'benety', u'name': u'Benety Goh', u'email': u'benety@mongodb.com'}

Message: SERVER-21583 added applyOps test case for creating foreground and background indexes

(cherry picked from commit 4ed0efa3368a89cb808c62b2d9e916acf8202ff7)
Branch: v3.2
https://github.com/mongodb/mongo/commit/89b08ef60181d725d285c5dfb32197cafd47e361

Comment by Githook User [ 11/Jan/16 ]

Author:

{u'username': u'benety', u'name': u'Benety Goh', u'email': u'benety@mongodb.com'}

Message: SERVER-21583 applyOperation_inlock builds background indexes in foreground if temp release fails

(cherry picked from commit edc5989d36f6dea86ffe034c414e97e98549004e)
Branch: v3.2
https://github.com/mongodb/mongo/commit/229c741beedd812afe17e1be3e18958a88b40203

Comment by Githook User [ 11/Jan/16 ]

Author:

{u'username': u'benety', u'name': u'Benety Goh', u'email': u'benety@mongodb.com'}

Message: SERVER-21583 fixed background index options in background index replsets test

(cherry picked from commit 54dfb6df908be81c50587a854964551ad72110cc)
Branch: v3.2
https://github.com/mongodb/mongo/commit/2f08ba3bcf0b4aabf6609ed5e6fff36cfb227605

Comment by Kamran K. [ 08/Jan/16 ]

Good catch. I filed that issue separately as SERVER-22109.

Comment by Benety Goh [ 08/Jan/16 ]

Seems to be an existing validation issue in 3.2 and 3.0. You can trigger the invariant in 3.2.0 and 3.0.8 by running the same command without background: true:

db.adminCommand({
    applyOps: [{
            'op': 'i',
            'ns': db.getName() + '.system.indexes',
            'o': {
                ns: 'u',
                key: { b: 1 },
                name: 'b_1'
            }
        }]
});

We should probably fix this.

Comment by Kamran K. [ 08/Jan/16 ]

benety.goh, this fix seems to have introduced an invariant failure with WiredTiger:

Invariant failure _name == nsToDatabaseSubstring(ns) src/mongo/db/catalog/database.cpp 429

Repro against a standalone:

db.adminCommand({
    applyOps: [{
            'op': 'i',
            'ns': db.getName() + '.system.indexes',
            'o': {
                ns: 'u',
                key: { b: 1 },
                name: 'b_1',
                background: true
            }
        }]
});

Backtrace:

* thread #14: tid = 0xd45ac, 0x00007fff9a363002 libsystem_kernel.dylib`__pthread_kill + 10, stop reason = signal SIGABRT
  * frame #0: 0x00007fff9a363002 libsystem_kernel.dylib`__pthread_kill + 10
    frame #1: 0x00007fff907f35c5 libsystem_pthread.dylib`pthread_kill + 90
    frame #2: 0x00007fff9370d787 libsystem_c.dylib`__abort + 145
    frame #3: 0x00007fff9370d6f6 libsystem_c.dylib`abort + 144
    frame #4: 0x00000001014b7073 mongod`mongo::invariantFailed(expr="_name == nsToDatabaseSubstring(ns)", file="src/mongo/db/catalog/database.cpp", line=429) + 387 at assert_util.cpp:153
    frame #5: 0x000000010031a29d mongod`mongo::Database::getCollection(this=0x000000010560cef0, ns=(_data = "u", _size = 1)) const + 141 at database.cpp:429
    frame #6: 0x0000000100751803 mongod`mongo::IndexBuilder::_build(this=0x000070000069e1a8, txn=0x0000000107017480, db=0x000000010560cef0, allowBackgroundBuilding=false, dbLock=0x0000000000000000) const + 323 at index_builder.cpp:128
    frame #7: 0x0000000100752812 mongod`mongo::IndexBuilder::buildInForeground(this=0x000070000069e1a8, txn=0x0000000107017480, db=0x000000010560cef0) const + 50 at index_builder.cpp:110
    frame #8: 0x0000000100bfb656 mongod`mongo::repl::applyOperation_inlock(txn=0x0000000107017480, db=0x000000010560cef0, op=0x00007000006a1418, convertUpdateToUpsert=true, incrementOpsAppliedStats=mongo::repl::IncrementOpsAppliedStatsFn @ 0x00007000006a1900)>) + 2822 at oplog.cpp:743
    frame #9: 0x00000001002d677a mongod`mongo::applyOps(txn=0x0000000107017480, dbName="admin", applyOpCmd=0x00007000006a1eb0, result=0x00007000006a2040) + 4394 at apply_ops.cpp:141
    frame #10: 0x00000001003712b2 mongod`mongo::ApplyOpsCmd::run(this=0x00000001029a5e00, txn=0x0000000107017480, dbname="admin", cmdObj=0x00007000006a1eb0, (null)=0, errmsg="", result=0x00007000006a2040) + 658 at apply_ops.cpp:112
    frame #11: 0x00000001004ce36a mongod`mongo::Command::run(this=0x00000001029a5e00, txn=0x0000000107017480, request=0x00007000006a2ed0, replyBuilder=0x00007000006a3080) + 3690 at dbcommands.cpp:1441
    frame #12: 0x00000001004ccfd0 mongod`mongo::Command::execCommand(txn=0x0000000107017480, command=0x00000001029a5e00, request=0x00007000006a2ed0, replyBuilder=0x00007000006a3080) + 5264 at dbcommands.cpp:1326
    frame #13: 0x000000010036ece6 mongod`mongo::runCommands(txn=0x0000000107017480, request=0x00007000006a2ed0, replyBuilder=0x00007000006a3080) + 2118 at commands.cpp:498
    frame #14: 0x00000001007653d4 mongod`mongo::(anonymous namespace)::receivedRpc(txn=0x0000000107017480, client=0x0000000105702ca0, dbResponse=0x00007000006a4430, message=0x00007000006a4b90) + 628 at instance.cpp:304
    frame #15: 0x000000010076237d mongod`mongo::assembleResponse(txn=0x0000000107017480, m=0x00007000006a4b90, dbresponse=0x00007000006a4430, remote=0x00007000006a41b0) + 2301 at instance.cpp:525
    frame #16: 0x000000010000b88d mongod`mongo::MyMessageHandler::process(this=0x00000001070123a0, m=0x00007000006a4b90, port=0x000000010560cb70) + 397 at db.cpp:171
    frame #17: 0x000000010150130d mongod`mongo::PortMessageServer::handleIncomingMsg(arg=0x000000010560cb70) + 1917 at message_server_port.cpp:229
    frame #18: 0x00000001014ffc7a mongod`void* std::__1::__thread_proxy<std::__1::tuple<std::__1::__bind<void* (*)(void*), mongo::(anonymous namespace)::MessagingPortWithHandler*> > >(void*) [inlined] decltype(__f=0x000000010560bfe0, __args=0x000000010560bfe8)(void*)>(fp)(std::__1::forward<mongo::(anonymous namespace)::MessagingPortWithHandler*&>(fp0))) std::__1::__invoke<void* (*&)(void*), mongo::(anonymous namespace)::MessagingPortWithHandler*&>(void* (*&&&)(void*), mongo::(anonymous namespace)::MessagingPortWithHandler*&&&) + 24 at __functional_base:415
    frame #19: 0x00000001014ffc62 mongod`void* std::__1::__thread_proxy<std::__1::tuple<std::__1::__bind<void* (*)(void*), mongo::(anonymous namespace)::MessagingPortWithHandler*> > >(void*) [inlined] std::__1::__bind_return<void* (*)(void*), std::__1::tuple<mongo::(anonymous namespace)::MessagingPortWithHandler*>, std::__1::tuple<>, _is_valid_bind_return<void* (*)(void*), std::__1::tuple<mongo::(anonymous namespace)::MessagingPortWithHandler*>, std::__1::tuple<> >::value>::type std::__1::__apply_functor<void* (__f=0x000000010560bfe0, __bound_args=0x000000010560bfe8, (null)=__tuple_indices<0> @ 0x00007000006a4ea0, __args=0x00007000006a4e60)(void*), std::__1::tuple<mongo::(anonymous namespace)::MessagingPortWithHandler*>, 0ul, std::__1::tuple<> >(void* (*&)(void*), std::__1::tuple<mongo::(anonymous namespace)::MessagingPortWithHandler*>&, std::__1::__tuple_indices<0ul>, std::__1::tuple<>&&) + 40 at functional:2060
    frame #20: 0x00000001014ffc3a mongod`void* std::__1::__thread_proxy<std::__1::tuple<std::__1::__bind<void* (*)(void*), mongo::(anonymous namespace)::MessagingPortWithHandler*> > >(void*) [inlined] std::__1::__bind_return<void* (*)(void*), std::__1::tuple<mongo::(anonymous namespace)::MessagingPortWithHandler*>, std::__1::tuple<>, _is_valid_bind_return<void* (*)(void*), std::__1::tuple<mongo::(anonymous namespace)::MessagingPortWithHandler*>, std::__1::tuple<> >::value>::type std::__1::__bind<void* (this=0x000000010560bfe0)(void*), mongo::(anonymous namespace)::MessagingPortWithHandler*>::operator()<>() + 38 at functional:2123
    frame #21: 0x00000001014ffc14 mongod`void* std::__1::__thread_proxy<std::__1::tuple<std::__1::__bind<void* (*)(void*), mongo::(anonymous namespace)::MessagingPortWithHandler*> > >(void*) [inlined] decltype(__f=0x000000010560bfe0)(void*), mongo::(anonymous namespace)::MessagingPortWithHandler*> >(fp)(std::__1::forward<>(fp0))) std::__1::__invoke<std::__1::__bind<void* (*)(void*), mongo::(anonymous namespace)::MessagingPortWithHandler*> >(std::__1::__bind<void* (*)(void*), mongo::(anonymous namespace)::MessagingPortWithHandler*>&&) + 11 at __functional_base:415
    frame #22: 0x00000001014ffc09 mongod`void* std::__1::__thread_proxy<std::__1::tuple<std::__1::__bind<void* (*)(void*), mongo::(anonymous namespace)::MessagingPortWithHandler*> > >(void*) [inlined] void std::__1::__thread_execute<std::__1::__bind<void* (*)(void*), mongo::(anonymous namespace)::MessagingPortWithHandler*> >(__t=0x000000010560bfe0, (null)=__tuple_indices<> @ 0x00007000006a4e38)(void*), mongo::(anonymous namespace)::MessagingPortWithHandler*> >&, std::__1::__tuple_indices<>) + 25 at thread:337
    frame #23: 0x00000001014ffbf0 mongod`void* std::__1::__thread_proxy<std::__1::tuple<std::__1::__bind<void* (*)(void*), mongo::(anonymous namespace)::MessagingPortWithHandler*> > >(__vp=0x000000010560bfe0) + 368 at thread:347
    frame #24: 0x00007fff907f1c13 libsystem_pthread.dylib`_pthread_body + 131
    frame #25: 0x00007fff907f1b90 libsystem_pthread.dylib`_pthread_start + 168
    frame #26: 0x00007fff907ef375 libsystem_pthread.dylib`thread_start + 13

Comment by Githook User [ 07/Jan/16 ]

Author:

{u'username': u'benety', u'name': u'Benety Goh', u'email': u'benety@mongodb.com'}

Message: SERVER-21583 added applyOps test case for creating foreground and background indexes

(cherry picked from commit 4ed0efa3368a89cb808c62b2d9e916acf8202ff7)
Branch: v3.0
https://github.com/mongodb/mongo/commit/e47291432ae065ff8103e8e598fa6402e4352245

Comment by Githook User [ 07/Jan/16 ]

Author:

{u'username': u'benety', u'name': u'Benety Goh', u'email': u'benety@mongodb.com'}

Message: SERVER-21583 applyOperation_inlock builds background indexes in foreground if temp release fails

(cherry picked from commit edc5989d36f6dea86ffe034c414e97e98549004e)
Branch: v3.0
https://github.com/mongodb/mongo/commit/95933f3bc721ccbcfda98e12ef85f21801add565

Comment by Githook User [ 07/Jan/16 ]

Author:

{u'username': u'benety', u'name': u'Benety Goh', u'email': u'benety@mongodb.com'}

Message: SERVER-21583 fixed background index options in background index replsets test

(cherry picked from commit 54dfb6df908be81c50587a854964551ad72110cc)
Branch: v3.0
https://github.com/mongodb/mongo/commit/0636cc5de3485479edc98ce383ff8b512bfe118d

Comment by Githook User [ 07/Jan/16 ]

Author:

{u'username': u'benety', u'name': u'Benety Goh', u'email': u'benety@mongodb.com'}

Message: SERVER-21583 added applyOps test case for creating foreground and background indexes
Branch: master
https://github.com/mongodb/mongo/commit/4ed0efa3368a89cb808c62b2d9e916acf8202ff7

Comment by Githook User [ 07/Jan/16 ]

Author:

{u'username': u'benety', u'name': u'Benety Goh', u'email': u'benety@mongodb.com'}

Message: SERVER-21583 applyOperation_inlock builds background indexes in foreground if temp release fails
Branch: master
https://github.com/mongodb/mongo/commit/edc5989d36f6dea86ffe034c414e97e98549004e

Comment by Githook User [ 07/Jan/16 ]

Author:

{u'username': u'benety', u'name': u'Benety Goh', u'email': u'benety@mongodb.com'}

Message: SERVER-21583 fixed background index options in background index replsets test
Branch: master
https://github.com/mongodb/mongo/commit/54dfb6df908be81c50587a854964551ad72110cc

Comment by Daniel Pasette (Inactive) [ 15/Dec/15 ]

flozano, We plan to backport the fix to 3.0.9 as well.

Comment by Francisco Alejandro Lozano López [ 14/Dec/15 ]

I see target fix is 3.2.1... will 3.0 get the fix also?

Comment by Christian Mueller [ 14/Dec/15 ]

I was just wondering about the status of this issue and if you were already able to resolve it?
Thank you

Comment by Scott Hernandez (Inactive) [ 24/Nov/15 ]

Due to the fact that temp-release is used in background index building and applyOps disables the ability to use it, we get into a situation where we stall.

Possible solutions:

  • If applyOps.size == 1 and no pre/post condition, don't disable temp-release use
  • Plumb though a flag to make the index build foreground, so temp-release isn't used.
Comment by Christian Mueller [ 24/Nov/15 ]

thank you for prompt checking.
I also tried running mongorestore with --noIndexRestore option, but unfortunately the behavior didn't change and mongod process still hanged.

Comment by Kaloian Manassiev [ 20/Nov/15 ]

This looks like a deadlock between the oplog applier thread #4, which obtains global X lock and disables temp releases and thread # 3, which waits to get lock on the collection in order to start. This seems like a pretty basic sequence. scotthernandez, has something changed recently, which would have broken this scenario?

Thread 4 (Thread 0x7f5e4dd35700 (LWP 15870)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x0000000000a9733b in wait (this=0x1b69e60 <mongo::(anonymous namespace)::_bgIndexStartingCondVar>, m=...)
    at src/third_party/boost/boost/thread/pthread/condition_variable.hpp:56
#2  mongo::IndexBuilder::waitForBgIndexStarting () at src/mongo/db/index_builder.cpp:114
#3  0x0000000000c33fa7 in mongo::repl::applyOperation_inlock (txn=0x7f5e4dd34790, db=0x28583c0, op=..., 
    fromRepl=<optimized out>, convertUpdateToUpsert=<optimized out>) at src/mongo/db/repl/oplog.cpp:575
#4  0x0000000000924ba8 in mongo::ApplyOpsCmd::run(mongo::OperationContext*, std::string const&, mongo::BSONObj&, int, std::string&, mongo::BSONObjBuilder&, bool) ()
#5  0x00000000009be494 in mongo::_execCommand (txn=txn@entry=0x7f5e4dd34790, c=c@entry=0x1b63aa0 <mongo::applyOpsCmd>, 
    dbname=..., cmdObj=..., queryOptions=queryOptions@entry=0, errmsg=..., result=..., fromRepl=fromRepl@entry=false)
    at src/mongo/db/dbcommands.cpp:1361
#6  0x00000000009bf41d in mongo::Command::execCommand (txn=txn@entry=0x7f5e4dd34790, 
    c=c@entry=0x1b63aa0 <mongo::applyOpsCmd>, queryOptions=queryOptions@entry=0, cmdns=cmdns@entry=0x2738014 "admin.$cmd", 
    cmdObj=..., result=..., fromRepl=fromRepl@entry=false) at src/mongo/db/dbcommands.cpp:1564
#7  0x00000000009c012b in mongo::_runCommands (txn=txn@entry=0x7f5e4dd34790, ns=0x2738014 "admin.$cmd", _cmdobj=..., b=..., 
    anObjBuilder=..., fromRepl=fromRepl@entry=false, queryOptions=0) at src/mongo/db/dbcommands.cpp:1636
#8  0x0000000000b9416a in runCommands (fromRepl=false, queryOptions=<optimized out>, anObjBuilder=..., b=..., curop=..., 
    jsobj=..., ns=<optimized out>, txn=0x7f5e4dd34790) at src/mongo/db/query/find.cpp:139
#9  mongo::runQuery (txn=txn@entry=0x7f5e4dd34790, m=..., q=..., nss=..., curop=..., result=...)
    at src/mongo/db/query/find.cpp:588
#10 0x0000000000aa4160 in receivedQuery (m=..., dbresponse=..., c=..., txn=0x7f5e4dd34790) at src/mongo/db/instance.cpp:210
#11 mongo::assembleResponse (txn=txn@entry=0x7f5e4dd34790, m=..., dbresponse=..., remote=...)
    at src/mongo/db/instance.cpp:385
#12 0x00000000007e9eed in mongo::MyMessageHandler::process (this=<optimized out>, m=..., port=0x25335c0, le=0x25323f0)
    at src/mongo/db/db.cpp:205
#13 0x0000000000f1be6b in mongo::PortMessageServer::handleIncomingMsg(void*) ()
#14 0x00007f5e4d929182 in start_thread (arg=0x7f5e4dd35700) at pthread_create.c:312
#15 0x00007f5e4ca2a47d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
 
Thread 3 (Thread 0x7f5e4465e700 (LWP 15872)):
#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
#1  0x00000000009a4e0a in timed_wait (wait_until=..., m=..., this=0x3ca07d0)
    at src/third_party/boost/boost/thread/pthread/condition_variable.hpp:74
#2  timed_wait<boost::date_time::subsecond_duration<boost::posix_time::time_duration, 1000l> > (wait_duration=..., m=..., 
    this=0x3ca07d0) at src/third_party/boost/boost/thread/pthread/condition_variable_fwd.hpp:72
#3  mongo::CondVarLockGrantNotification::wait (this=0x3ca07a0, timeoutMs=<optimized out>)
    at src/mongo/db/concurrency/lock_state.cpp:229
#4  0x00000000009ab799 in mongo::LockerImpl<true>::lockComplete(mongo::ResourceId, mongo::LockMode, unsigned int, bool) ()
#5  0x000000000099e34c in mongo::Lock::GlobalLock::_lock (this=0x7f5e4465d948, lockMode=mongo::MODE_IX, timeoutMs=4294967295)
    at src/mongo/db/concurrency/d_concurrency.cpp:77
#6  0x000000000099e397 in mongo::Lock::GlobalLock::GlobalLock (this=0x7f5e4465d948, locker=<optimized out>, 
    lockMode=<optimized out>, timeoutMs=<optimized out>) at src/mongo/db/concurrency/d_concurrency.cpp:66
#7  0x000000000099e40c in mongo::Lock::DBLock::DBLock (this=0x7f5e4465d930, locker=0x3ca0200, db=..., mode=<optimized out>)
    at src/mongo/db/concurrency/d_concurrency.cpp:97
#8  0x0000000000a982cd in mongo::IndexBuilder::run (this=0x286db90) at src/mongo/db/index_builder.cpp:93
#9  0x0000000000eee130 in mongo::BackgroundJob::jobBody (this=0x286db90) at src/mongo/util/background.cpp:153
#10 0x0000000000fac904 in boost::(anonymous namespace)::thread_proxy (param=<optimized out>)
    at src/third_party/boost/libs/thread/src/pthread/thread.cpp:121
#11 0x00007f5e4d929182 in start_thread (arg=0x7f5e4465e700) at pthread_create.c:312
#12 0x00007f5e4ca2a47d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Comment by Ramon Fernandez Marina [ 20/Nov/15 ]

Thanks for reporting this christian.mueller, we can reproduce this behavior and we're investigating.

Generated at Thu Feb 08 03:57:47 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.