[SERVER-20944] Contention on ThreadPoolTaskExecutor::_mutex Created: 15/Oct/15  Updated: 02/Dec/15  Resolved: 09/Nov/15

Status: Closed
Project: Core Server
Component/s: Internal Code
Affects Version/s: 3.1.9
Fix Version/s: 3.2.0-rc3

Type: Improvement Priority: Major - P3
Reporter: David Storch Assignee: David Storch
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File vtune_locks_and_waits.png    
Issue Links:
Related
related to SERVER-20596 Performance regression in new mongos ... Closed
related to SERVER-21597 Fix connPoolStats command to work wit... Closed
Backwards Compatibility: Fully Compatible
Sprint: Sharding B (10/30/15), QuInt C (11/23/15)
Participants:

 Description   

The new mongos query path schedules commands to run on the shards using the ThreadPoolTaskExecutor attached to Grid. There is a single instance of the ThreadPoolTaskExecutor per mongos process, and it synchronizes access to its internal queues through a single mutex. Performance testing has shown that, once enough queries are executing concurrently on mongos, threads spend much of their time blocked waiting to acquire this lock. Hacking mongos to create a ThreadPoolTaskExecutor per connection thread yields a 3x increase in throughput for a find-by-_id benchRun() workload.
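To make the shape of the problem concrete, here is a minimal sketch (simplified C++, not the actual server code) of why every client thread ends up serialized on one lock: scheduling work and de-queueing work both require the same executor-wide mutex.

#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>

// Simplified stand-in for the executor's internal work queue. Every mongos
// client thread calling schedule() and every pool thread calling next()
// must acquire the same _mutex, which is where the contention shows up.
class TaskQueue {
public:
    void schedule(std::function<void()> task) {
        std::lock_guard<std::mutex> lk(_mutex);   // analogue of ThreadPoolTaskExecutor::_mutex
        _work.push_back(std::move(task));
        _hasWork.notify_one();
    }

    std::function<void()> next() {
        std::unique_lock<std::mutex> lk(_mutex);  // same lock again
        _hasWork.wait(lk, [&] { return !_work.empty(); });
        auto task = std::move(_work.front());
        _work.pop_front();
        return task;
    }

private:
    std::mutex _mutex;
    std::condition_variable _hasWork;
    std::deque<std::function<void()>> _work;
};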

For reference, I'm using the following script to generate load on mongos:

#!/bin/bash
# Usage: <threads> <bench_seconds>

THREADS=$1
BENCH_SECONDS=$2
RPC=opQueryOnly
PORT=20004

# Seed a single document so the find-by-_id workload has something to match.
./mongo --port $PORT --quiet --eval 'db.foo.insert({_id:1})'

# Run benchRun against mongos. The single quotes close and reopen around the
# shell variables so that THREADS, BENCH_SECONDS, and PORT are substituted
# into the JavaScript before it is passed to the mongo shell.
./mongo --port $PORT --rpcProtocols $RPC --eval '
ops = [{op:"find", ns:"test.foo", query: {_id: 1}}]
results = benchRun({ops:ops, parallel:'$THREADS', seconds:'$BENCH_SECONDS', host:"localhost:'$PORT'"})
print(Math.round(results["totalOps/s"]))
'
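Assuming the script above is saved as bench.sh (the filename is mine, not part of the ticket), a run with 32 client threads for 30 seconds would look like:

./bench.sh 32 30

The printed number is the rounded totalOps/s reported by benchRun.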



 Comments   
Comment by David Storch [ 09/Nov/15 ]

schwerin, agreed.

Comment by Andy Schwerin [ 09/Nov/15 ]

david.storch, I propose that we resolve this ticket now and do follow-on performance work on separate task tickets.

Comment by Githook User [ 09/Nov/15 ]

Author: Andy Schwerin (andy10gen) <schwerin@mongodb.com>

Message: SERVER-20944 Replace Event on CallbackState with condition variable.
Branch: master
https://github.com/mongodb/mongo/commit/516a702d8fe87338d8b5e10ab6f2c4939ccbd5f8
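In rough terms, the commit replaces a per-callback Event with a wait on a condition variable guarded by the executor's mutex. A minimal sketch of that pattern follows; the member names are illustrative, not the server's actual code.

#include <condition_variable>
#include <mutex>

struct CallbackState {
    bool finished = false;   // flipped once the callback has run
};

class Executor {
public:
    // Called by the thread that ran the callback.
    void markFinished(CallbackState& cb) {
        {
            std::lock_guard<std::mutex> lk(_mutex);
            cb.finished = true;
        }
        _callbackFinished.notify_all();   // wake any thread blocked in wait()
    }

    // Called by a thread that needs the callback to have completed.
    void wait(CallbackState& cb) {
        std::unique_lock<std::mutex> lk(_mutex);
        _callbackFinished.wait(lk, [&] { return cb.finished; });
    }

private:
    std::mutex _mutex;                          // the executor's existing mutex
    std::condition_variable _callbackFinished;  // shared by all callbacks
};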

Comment by Githook User [ 09/Nov/15 ]

Author: Andy Schwerin (andy10gen) <schwerin@mongodb.com>

Message: SERVER-20944 Move as much work as possible outside of ThreadPoolTaskExecutor critical sections.

This patch moves calls to the NetworkInterface and heap allocations out from under the ThreadPoolTaskExecutor's mutex, to minimize critical section length and maximize available concurrency.
Branch: master
https://github.com/mongodb/mongo/commit/fa038b8375fd5aea4d359fe0968beb948de58782
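The general pattern described in this commit message, sketched in simplified form (not the actual patch): do allocation and request preparation before taking the lock, so the critical section shrinks to a queue insertion.

#include <list>
#include <memory>
#include <mutex>

struct WorkItem {
    // request, callback, deadline, etc.
};

class Executor {
public:
    void scheduleWork() {
        // Heap allocation and network request preparation happen outside the lock.
        auto item = std::make_unique<WorkItem>();
        // ...prepare the outbound request here, still without holding _mutex...

        // The critical section is now just the queue insertion.
        std::lock_guard<std::mutex> lk(_mutex);
        _readyQueue.push_back(std::move(item));
    }

private:
    std::mutex _mutex;
    std::list<std::unique_ptr<WorkItem>> _readyQueue;
};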

Comment by Githook User [ 06/Nov/15 ]

Author: David Storch (dstorch) <david.storch@10gen.com>

Message: SERVER-20944 distribute mongos work across multiple TaskExecutors
Branch: master
https://github.com/mongodb/mongo/commit/b9838ca50e1e5518d9005737ae5d25683acc20cc
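The idea in this commit, sketched at a high level (the class layout and method name here are illustrative, not the server's actual API): keep several independent executors, each with its own mutex, and hand them out round-robin so unrelated operations no longer contend on a single lock.

#include <atomic>
#include <cstddef>
#include <vector>

class TaskExecutor {
    // Each instance owns its own queues and its own mutex.
};

class TaskExecutorPool {
public:
    explicit TaskExecutorPool(std::size_t n) : _executors(n) {}

    TaskExecutor& nextExecutor() {
        // Round-robin selection; relaxed ordering is fine for load spreading.
        const auto i = _next.fetch_add(1, std::memory_order_relaxed);
        return _executors[i % _executors.size()];
    }

private:
    std::atomic<std::size_t> _next{0};
    std::vector<TaskExecutor> _executors;
};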

Comment by Andy Schwerin [ 27/Oct/15 ]

Taking this back to focus on shortening critical sections in ThreadPoolTaskExecutor.

Comment by Andy Schwerin [ 26/Oct/15 ]

acm, recent evidence suggests (per our discussion on Friday) that the actual contention is inside NetworkInterfaceASIO. If you want to use this ticket to track that side of the investigation, you're welcome to it. Otherwise, maybe close it as Incomplete and we can reopen if it resurfaces as part of the ASIO performance investigation or elsewhere.

Comment by David Storch [ 15/Oct/15 ]

Attaching a screenshot of the VTune Locks and Waits analysis.
