[SERVER-19947] Deadlock between aggregations with sorts on sharded collections and operations requiring MODE_X database locks Created: 12/Aug/15  Updated: 19/Sep/15  Resolved: 17/Aug/15

Status: Closed
Project: Core Server
Component/s: Aggregation Framework, Querying
Affects Version/s: None
Fix Version/s: 3.1.7

Type: Bug Priority: Major - P3
Reporter: Atul Kachru Assignee: David Storch
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
is related to SERVER-18203 Aggregation distributed deadlock with... Closed
is related to SERVER-17839 Remove PlanStage::isEOF Backlog
is related to SERVER-18841 Make awaitData cursors block rather t... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Steps To Reproduce:

./buildscripts/resmoke.py --executor=aggregation --storageEngine=mmapv1 --log=file jstests/aggregation/testshard1.js --repeat=100

tail -f executor.log and watch for the test to hang. On our Solaris builders the test takes about 4 minutes to run; if it is still running after 6 minutes, it has hung.

Sprint: QuInt 8 08/28/15
Participants:

 Description   

This is a deadlock that occurs when a shard is receiving a chunk while also acting as the merge node for an aggregation sort. Three threads in a single process are involved.

Thread 1 is servicing a getmore request for an aggregation pipeline. A relevant portion of its stack frame is below:

#7  0x000000000159bf52 in mongo::MessagingPort::recv (this=0x6f5fe40, toSend=..., response=...) at src/mongo/util/net/message_port.cpp:227
#8  0x000000000159c2e5 in mongo::MessagingPort::call (this=<optimized out>, toSend=..., response=...) at src/mongo/util/net/message_port.cpp:222
#9  0x0000000000d7921f in mongo::DBClientConnection::call (this=0x3c77c20, toSend=..., response=..., assertOk=<optimized out>, 
    actualServer=<optimized out>) at src/mongo/client/dbclient.cpp:1478
#10 0x0000000000d8b7dc in mongo::DBClientCursor::requestMore (this=this@entry=0x3bcc648) at src/mongo/client/dbclientcursor.cpp:210
#11 0x0000000000d8bcc7 in mongo::DBClientCursor::more (this=0x3bcc648) at src/mongo/client/dbclientcursor.cpp:341
#12 0x00000000010adf73 in mongo::sorter::MergeIterator<mongo::Value, mongo::Document, mongo::DocumentSourceSort::Comparator>::next() ()
#13 0x00000000010aa611 in mongo::DocumentSourceSort::getNext (this=<optimized out>) at src/mongo/db/pipeline/document_source_sort.cpp:71
#14 0x0000000000f59740 in mongo::PipelineProxyStage::getNextBson (this=this@entry=0x3c75840) at src/mongo/db/exec/pipeline_proxy.cpp:131
#15 0x0000000000f5a34f in mongo::PipelineProxyStage::isEOF (this=0x3c75840) at src/mongo/db/exec/pipeline_proxy.cpp:87
#16 0x0000000001107e84 in mongo::shouldSaveCursorGetMore (finalState=<optimized out>, exec=exec@entry=0x3c75920, isTailable=<optimized out>)
    at src/mongo/db/query/find.cpp:122
#17 0x0000000001108e6d in mongo::getMore (txn=txn@entry=0x7fffe7a02bd0, ns=ns@entry=0x3c7b5e4 "aggShard.server9444", ntoreturn=ntoreturn@entry=0, 
    cursorid=cursorid@entry=35834991707, exhaust=exhaust@entry=0x7fffe7a01fbd, isCursorAuthorized=isCursorAuthorized@entry=0x7fffe7a01fbe)
    at src/mongo/db/query/find.cpp:414

We can see that thread 1 is blocked because shouldSaveCursorGetMore has asked its pipeline to answer the predicate isEOF, and the PipelineProxy stage has turned that into a getNext(), which in turn causes the sorter to fetch data from one of its remote cursors. At this point, thread 1 holds the database lock for the source of the aggregation in MODE_IS. Notably, the "remote cursor" is actually in the same process; the stack frames near the top are engaging the network layer to communicate with this very process on another thread.
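
To make the hazard concrete, here is a minimal self-contained model of the pattern in thread 1 (illustrative only, not MongoDB source; names such as RemoteCursor and databaseLock are stand-ins): an EOF check drives a blocking "remote" fetch while a shared (MODE_IS-like) database lock is still held.

#include <iostream>
#include <optional>
#include <shared_mutex>

std::shared_mutex databaseLock;  // stand-in for the per-database lock

struct RemoteCursor {
    // Stand-in for DBClientCursor::more()/requestMore(): in the real server this
    // blocks on a network round trip that, in this bug, must be answered by
    // another thread in the same process (thread 3).
    std::optional<int> more() { return std::nullopt; }
};

// Stand-in for PipelineProxyStage::isEOF() -> DocumentSourceSort::getNext():
// answering "are we at EOF?" forces a fetch from the remote cursor.
bool pipelineIsEOF(RemoteCursor& remote) { return !remote.more().has_value(); }

int main() {
    RemoteCursor remote;
    std::shared_lock<std::shared_mutex> dbLockIS(databaseLock);  // MODE_IS analogue
    // The hazard: this call can block on the network while dbLockIS is still held.
    std::cout << "isEOF = " << std::boolalpha << pipelineIsEOF(remote) << '\n';
}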

At this point, thread 2 starts the migrate driver that runs on the recipient of a new chunk. The other shard in this cluster (not pictured) has started donating a chunk to this node, coincidentally on the same database the aggregation is running on. Thread 2 blocks trying to acquire the database lock in MODE_X, because thread 1 holds it in MODE_IS. Its stack trace is below.

#13 0x0000000000f00cf7 in mongo::Lock::DBLock::DBLock (this=0x7fffd7602c10, locker=0x3c7b9e0, db=..., mode=<optimized out>)
    at src/mongo/db/concurrency/d_concurrency.cpp:110
#14 0x00000000012970f1 in mongo::MigrationDestinationManager::_migrateDriver (
    this=this@entry=0x1ffbbc0 <mongo::(anonymous namespace)::migrateDestManager>, txn=txn@entry=0x7fffd7602e10, ns=..., min=..., max=..., 
    shardKeyPattern=..., fromShard=..., epoch=..., writeConcern=...) at src/mongo/db/s/migration_destination_manager.cpp:434

Now thread 3 starts processing the getmore request issued to it by thread 1. It attempts to acquire the same database lock as threads 1 and 2 in MODE_IS. However, because thread 2 is waiting for MODE_X, new MODE_IS requests queue behind it, so thread 3 cannot proceed. We are now deadlocked (a sketch of this queueing behavior follows the stack trace). Here's thread 3's stack:

#13 0x0000000000f00cf7 in mongo::Lock::DBLock::DBLock (this=0x648b430, locker=0x3bf1e00, db=..., mode=<optimized out>)
    at src/mongo/db/concurrency/d_concurrency.cpp:110
#14 0x0000000001108e12 in make_unique<mongo::Lock::DBLock, mongo::Locker*, mongo::StringData, mongo::LockMode> ()
    at src/third_party/boost-1.56.0/boost/smart_ptr/make_unique_object.hpp:28
#15 mongo::getMore (txn=txn@entry=0x7ffff2ffbbd0, ns=ns@entry=0x5cf1084 "aggShard.server9444", ntoreturn=ntoreturn@entry=0, 
    cursorid=cursorid@entry=36214372988, exhaust=exhaust@entry=0x7ffff2ffafbd, isCursorAuthorized=isCursorAuthorized@entry=0x7ffff2ffafbe)
    at src/mongo/db/query/find.cpp:404
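
The queueing behavior that completes the cycle can be sketched with a small writer-priority reader/writer lock (again illustrative; this is not the server's LockManager, but it models the policy that a pending MODE_X request blocks new MODE_IS requests):

#include <condition_variable>
#include <mutex>

class FairRWLock {
public:
    void lockShared() {                      // MODE_IS analogue
        std::unique_lock<std::mutex> lk(m_);
        // New readers wait if a writer holds the lock OR is queued waiting for it.
        cv_.wait(lk, [&] { return !writerActive_ && writersWaiting_ == 0; });
        ++readers_;
    }
    void unlockShared() {
        std::unique_lock<std::mutex> lk(m_);
        if (--readers_ == 0) cv_.notify_all();
    }
    void lockExclusive() {                   // MODE_X analogue
        std::unique_lock<std::mutex> lk(m_);
        ++writersWaiting_;                   // merely queueing here blocks new readers
        cv_.wait(lk, [&] { return !writerActive_ && readers_ == 0; });
        --writersWaiting_;
        writerActive_ = true;
    }
    void unlockExclusive() {
        std::unique_lock<std::mutex> lk(m_);
        writerActive_ = false;
        cv_.notify_all();
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    int readers_ = 0;
    int writersWaiting_ = 0;
    bool writerActive_ = false;
};

// The deadlock from this ticket, in terms of the model:
//   thread 1: lockShared()    -> then blocks waiting for a reply from thread 3
//   thread 2: lockExclusive() -> waits for thread 1, and blocks new readers
//   thread 3: lockShared()    -> queues behind thread 2 -> cycle, no progress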

Relevant build failures:
https://evergreen.mongodb.com/task/mongodb_mongo_master_solaris_64_bit_aggregation_aa82208d53a4b642441adbadd4bd0595c4fa40ae_15_08_11_21_38_16

https://evergreen.mongodb.com/task/mongodb_mongo_master_solaris_64_bit_aggregation_8e2990da71c1498d2a979ef47c07c2bfc2fa4f97_15_08_11_15_05_28

https://evergreen.mongodb.com/task/mongodb_mongo_master_solaris_64_bit_aggregation_a067d03eca22d67a1ca036ea61544cd51df1490b_15_08_10_21_57_24

https://evergreen.mongodb.com/task/mongodb_mongo_master_solaris_64_bit_aggregation_55f6c3a09f96aabe7c933d9262cf011c7776fd60_15_08_10_12_56_20



 Comments   
Comment by Githook User [ 17/Aug/15 ]

Author:

David Storch (dstorch) <david.storch@10gen.com>

Message: SERVER-19947 fix distributed deadlock by preventing agg cursors from making a network call from inside the collection lock
Branch: master
https://github.com/mongodb/mongo/commit/b4f7738e808361c3edc0c040d8eb560b7f6f1526

Comment by David Storch [ 17/Aug/15 ]

We've had this bug before: it was fixed under SERVER-18203, and then regressed again in SERVER-18841. This makes me think that we should consider scheduling SERVER-17839: this work would offer a much more stable fix that is unlikely to be broken by unrelated refactoring.

Comment by Andy Schwerin [ 13/Aug/15 ]

This is exercised regularly by our Solaris builder in Evergreen, but nothing about this bug is specific to Solaris.

Comment by Andy Schwerin [ 13/Aug/15 ]

I believe the true problem here is that thread 1 in the description should not be doing network work inside of isEOF. It's extra bad that it's calling itself, but going to the network while holding a lock is pretty much forbidden. In addition to the risk of distributed deadlock, it can prevent other threads from getting locks in a timely manner.
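
As a hedged sketch of the rule Andy describes (not the actual SERVER-19947 patch; fetchFromRemote and databaseLock are hypothetical names), the safe shape is to drop any held locks before a blocking network call and reacquire them afterwards:

#include <optional>
#include <shared_mutex>

std::shared_mutex databaseLock;  // stand-in for the per-database lock

// Hypothetical stand-in for the blocking round trip (e.g. DBClientCursor::more()).
std::optional<int> fetchFromRemote() { return 42; }

std::optional<int> fetchNextOutsideLock() {
    std::shared_lock<std::shared_mutex> dbLockIS(databaseLock);  // MODE_IS analogue
    // ... gather whatever local state is needed while locked ...
    dbLockIS.unlock();               // drop the lock *before* going to the network

    std::optional<int> doc = fetchFromRemote();  // may block; no locks held here

    dbLockIS.lock();                 // reacquire, then re-validate cursor/collection state
    return doc;
}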
