[SERVER-19947] Deadlock between aggregations with sorts on sharded collections and operations requiring MODE_X database locks Created: 12/Aug/15 Updated: 19/Sep/15 Resolved: 17/Aug/15 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Aggregation Framework, Querying |
| Affects Version/s: | None |
| Fix Version/s: | 3.1.7 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Atul Kachru | Assignee: | David Storch |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Steps To Reproduce: | Run ./buildscripts/resmoke.py --executor=aggregation --storageEngine=mmapv1 --log=file jstests/aggregation/testshard1.js --repeat=100, then tail -f executor.log and watch for the test to hang. On our Solaris builders the test takes about 4 minutes to run; if a run is still going after 6 minutes, it has hung. |
| Sprint: | QuInt 8 08/28/15 |
| Participants: | |
| Description |
|
This is a deadlock that occurs when a shard is trying to receive a chunk while also acting as the merge node for an aggregation sort. Three threads in a single process are involved. Thread 1 is servicing a getmore request for an aggregation pipeline. A relevant portion of its stack trace is below:
We can see that thread 1 is blocked because shouldSaveGetMore has asked its pipeline to answer the predicate isEOF, and the PipelineProxy stage has turned that into a getNext(), which in turn causes the sorter to fetch data from one of its remote cursors. At this point, thread 1 holds the database lock for the source of the aggregation in MODE_IS. Notably, the "remote cursor" is actually in the same process; the stack frames near the top are engaging the network layer to communicate with this very process on another thread. Meanwhile, thread 2 starts the migrate thread driver that recipients of new chunks run. The other shard in this cluster (not pictured) has started donating a new chunk to this node, coincidentally on the same database as the one the aggregation is running on. Thread 2 blocks trying to acquire the database lock in MODE_X, because thread 1 holds it in MODE_IS. Its stack trace is below.
Now, thread 3 starts trying to process the getmore request issued to it by thread 1. It attempts to acquire the same database lock as threads 1 and 2 in MODE_IS. However, because thread 2 is already waiting for MODE_X, no new threads may be granted the lock in MODE_IS. We are now deadlocked. Here's thread 3's stack:
Relevant build failures: |
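To make the lock interaction concrete, here is a minimal single-threaded model of the behavior described above. It is not MongoDB's actual LockManager (the types, the conflict table, and the FIFO-fairness rule are simplifications assumed for illustration), but it shows why thread 3's MODE_IS request queues behind thread 2's pending MODE_X even though it is compatible with the MODE_IS lock that thread 1 already holds:

```cpp
#include <deque>
#include <iostream>
#include <string>
#include <vector>

enum class Mode { IS, X };

// IS is compatible with IS; X conflicts with everything.
bool compatible(Mode a, Mode b) {
    return a == Mode::IS && b == Mode::IS;
}

struct Request {
    std::string thread;
    Mode mode;
};

struct DatabaseLock {
    std::vector<Request> granted;
    std::deque<Request> pending;

    void request(const Request& r) {
        // A new request is granted only if it conflicts with no current holder
        // AND nothing is already queued (otherwise a waiting MODE_X could be
        // starved by a stream of MODE_IS requests).
        bool grantable = pending.empty();
        for (const auto& g : granted) {
            grantable = grantable && compatible(r.mode, g.mode);
        }
        if (grantable) {
            granted.push_back(r);
            std::cout << r.thread << ": granted\n";
        } else {
            pending.push_back(r);
            std::cout << r.thread << ": queued\n";
        }
    }
};

int main() {
    DatabaseLock dbLock;
    dbLock.request({"thread 1 (agg getmore)", Mode::IS});           // granted
    dbLock.request({"thread 2 (chunk migration)", Mode::X});        // conflicts with thread 1, queued
    dbLock.request({"thread 3 (remote cursor getmore)", Mode::IS}); // queued behind the pending MODE_X
    // Thread 1 is blocked waiting for thread 3's reply over the loopback
    // connection, thread 3 is queued behind thread 2, and thread 2 is waiting
    // for thread 1 to release MODE_IS: nobody can make progress.
    return 0;
}
```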
| Comments |
| Comment by Githook User [ 17/Aug/15 ] |
|
Author: David Storch (dstorch) <david.storch@10gen.com>
Message: |
| Comment by David Storch [ 17/Aug/15 ] |
|
We've had this bug before: it was fixed under |
| Comment by Andy Schwerin [ 13/Aug/15 ] |
|
This is exercised regularly by our Solaris builder in Evergreen, but nothing about this bug is specific to Solaris. |
| Comment by Andy Schwerin [ 13/Aug/15 ] |
|
I believe the true problem here is that thread 1 in the description should not be doing network work inside of isEOF. It's extra bad that it's calling itself, but going to the network while holding a lock is pretty much forbidden. In addition to the risk of distributed deadlock, it can prevent other threads from getting locks in a timely manner. |
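A hedged sketch of the alternative this comment points toward: save state and release the lock before any call that can block on the network, then reacquire it afterwards. The names below (saveState, restoreState, fetchNextBatchFromRemote) are hypothetical stand-ins, and std::mutex stands in for the multi-granularity database lock; this is not the actual server API.

```cpp
#include <mutex>

struct SavedCursorState { /* execution state detached from storage */ };

// Hypothetical stand-ins for detaching/reattaching cursor state and for the
// blocking remote getMore round trip.
SavedCursorState saveState() { return {}; }
void restoreState(const SavedCursorState&) {}
void fetchNextBatchFromRemote() { /* network round trip happens here */ }

void getMoreNeedingRemoteData(std::mutex& databaseLock) {
    std::unique_lock<std::mutex> lk(databaseLock);
    // ... local work that legitimately needs the lock ...

    SavedCursorState saved = saveState();
    lk.unlock();                  // never hold the lock across a network call
    fetchNextBatchFromRemote();   // a pending MODE_X request (e.g. the migration) can now proceed
    lk.lock();
    restoreState(saved);
    // ... continue local work after reacquiring ...
}

int main() {
    std::mutex databaseLock;
    getMoreNeedingRemoteData(databaseLock);
    return 0;
}
```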