[SERVER-20944] Contention on ThreadPoolTaskExecutor::_mutex Created: 15/Oct/15 Updated: 02/Dec/15 Resolved: 09/Nov/15 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Internal Code |
| Affects Version/s: | 3.1.9 |
| Fix Version/s: | 3.2.0-rc3 |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | David Storch | Assignee: | David Storch |
| Resolution: | Done | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Attachments: | |
| Issue Links: | |
| Backwards Compatibility: | Fully Compatible |
| Sprint: | Sharding B (10/30/15), QuInt C (11/23/15) |
| Participants: | |
| Description |
|
The new mongos query path schedules commands to run on the shards using the ThreadPoolTaskExecutor attached to Grid. There is a single instance of the ThreadPoolTaskExecutor per mongos process, which synchronizes access to its internal queues through a single mutex. Performance testing has shown that, when enough concurrent queries are executing on mongos, the threads spend much of their time blocked waiting to acquire this lock. Hacking mongos to create a ThreadPoolTaskExecutor per connection thread shows a 3X increase in throughput for a find-by-_id benchRun() workload. For reference, I'm using the following script to generate load on mongos:
|
| Comments |
| Comment by David Storch [ 09/Nov/15 ] |
|
schwerin, agreed. |
| Comment by Andy Schwerin [ 09/Nov/15 ] |
|
david.storch, I propose that we resolve this ticket now and do follow-on performance work on separate task tickets. |
| Comment by Githook User [ 09/Nov/15 ] |
|
Author: Andy Schwerin (andy10gen) <schwerin@mongodb.com> Message: |
| Comment by Githook User [ 09/Nov/15 ] |
|
Author: Andy Schwerin (andy10gen) <schwerin@mongodb.com> Message: This patch moves calls to the NetworkInterface and heap allocations out from |
| Comment by Githook User [ 06/Nov/15 ] |
|
Author: David Storch (dstorch) <david.storch@10gen.com> Message: |
| Comment by Andy Schwerin [ 27/Oct/15 ] |
|
Taking this back, to focus on shortening critical sections in ThreadPoolTaskExecutor. |
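For readers unfamiliar with the idea, here is a minimal sketch of what "shortening a critical section" can look like. It is written in Python purely for illustration and is not the actual C++ patch; the class name `ShortCriticalSectionExecutor`, its methods, and the dict-based task record are all hypothetical. The assumption is that the per-task record is allocated before the mutex is taken and the task itself runs after the mutex is released, so the lock covers only the queue mutation.

```python
import threading
from collections import deque


class ShortCriticalSectionExecutor:
    """Hypothetical executor: the mutex guards only the shared queue."""

    def __init__(self):
        self._mutex = threading.Lock()
        self._queue = deque()

    def schedule(self, work):
        # Allocate the per-task record BEFORE taking the lock, so the
        # allocation cost is not paid while other threads are blocked.
        record = {"work": work, "done": threading.Event(), "result": None}
        with self._mutex:
            # Only the shared-state mutation happens under the lock.
            self._queue.append(record)
        return record

    def run_one(self):
        # Take the next task under the lock, then release it immediately.
        with self._mutex:
            record = self._queue.popleft() if self._queue else None
        if record is None:
            return False
        # Run the task OUTSIDE the lock so slow work (for example a
        # network call) does not stall threads that want to schedule.
        record["result"] = record["work"]()
        record["done"].set()
        return True
```

A caller would invoke `schedule()` from many threads and have worker threads loop on `run_one()`. The longer variant this sketch contrasts with would build the record and run the work while still holding `_mutex`.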
| Comment by Andy Schwerin [ 26/Oct/15 ] |
|
acm, recent evidence suggests (per our discussion on Friday) that the actual contention is inside NetworkInterfaceASIO. If you want to use this ticket to track that side of the investigation, you're welcome to it. Otherwise, maybe close it as Incomplete and we can reopen if it resurfaces as part of the ASIO performance investigation or elsewhere. |
| Comment by David Storch [ 15/Oct/15 ] |
|
Attaching screenshot of vTune Locks and Waits analysis. |