[SERVER-28822] Improve DBConnectionPool's growth semantics Created: 17/Apr/17 Updated: 30/Oct/18 Resolved: 01/Feb/18 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Networking |
| Affects Version/s: | None |
| Fix Version/s: | 3.6.3, 3.7.2 |
| Type: | Task | Priority: | Major - P3 |
| Reporter: | Samantha Ritter (Inactive) | Assignee: | Mira Carey |
| Resolution: | Done | Votes: | 1 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||||||||||||||||||
| Backwards Compatibility: | Fully Compatible | ||||||||||||||||||||||||
| Backport Requested: |
v3.6, v3.4, v3.2
|
||||||||||||||||||||||||
| Sprint: | Platforms 2017-09-11, Platforms 2017-10-02, Platforms 2017-12-04, Platforms 2017-12-18, Platforms 2018-01-01, Platforms 2018-01-15, Platforms 2018-02-12 | ||||||||||||||||||||||||
| Participants: | |||||||||||||||||||||||||
| Case: | (copied to CRM) | ||||||||||||||||||||||||
| Linked BF Score: | 0 | ||||||||||||||||||||||||
| Description |
|
The DBConnectionPool (legacy connection pool that backs dbclient, used for write operations through 3.5.x) has hard-to-understand and occasionally problematic growth semantics. Our documentation for the knob that controls this pool is correct:
However, it does not explain that connections retained in the pool will remain open in the pool forever, leading to a steady growth in long-lived outgoing server connections, then a plateau once the max pool size is reached, even when there is no load on the system. We could do the following things to improve DBConnectionPool's semantics and offer more control over its growth:
|
| Comments |
| Comment by Githook User [ 09/Feb/18 ] |
|
Author: {'email': 'jcarey@argv.me', 'name': 'Jason Carey', 'username': 'hanumantmk'}Message: (cherry picked from commit 2f1773ed1edb357ff77144481803a9e6df9481de) |
| Comment by Githook User [ 09/Feb/18 ] |
|
Author: {'email': 'samantha.ritter@10gen.com', 'name': 'samantharitter', 'username': 'samantharitter'}Message: (cherry picked from commit 48a34a495386b7cbe18419313768929d12028125) |
| Comment by David Storch [ 07/Feb/18 ] |
|
I'm inclined to agree with Kal here. The query team's roadmap involves agg improvements which will make mapReduce obsolete rather than improvements to mapReduce. |
| Comment by Kaloian Manassiev [ 07/Feb/18 ] |
|
I understand that, but on the other hand it causes us to do throwaway work to modernize M/R, which is pending deprecation in exchange for aggregation features. If we isolate the old connection pool to have only one user - M/R, when we get to removing it, we can just delete all that code at once. We can even consolidate it all in one M/R library so nobody can use it and just delete the entire library when the time comes. |
| Comment by Andrew Morrow (Inactive) [ 07/Feb/18 ] |
|
It will prevent us from removing a pile of old code we don't want to keep around. |
| Comment by Kaloian Manassiev [ 07/Feb/18 ] |
|
Do we need to touch M/R at all? If we fix everything else and leave M/R as the only user of the legacy DBConnectionPool, then I don't think we need to worry about it's growth, because M/R is rarely used and if used is typically from a small number of connections and runs long time. |
| Comment by Esha Maharishi (Inactive) [ 07/Feb/18 ] |
|
I wonder if the above could be a joint query-sharding intern project, e.g. two interns co-mentored by one person from query, one from sharding |
| Comment by Esha Maharishi (Inactive) [ 07/Feb/18 ] |
|
I think it'd be useful to chop away at these one "infrastructure pillar" at a time, to allow deleting big chunks of the infrastructure until it's all gone. For sharding, I think the first goal should be to get rid of uses of ShardConnection, since that allows deleting all the connection versioning code (version_manager.cpp, ShardedConnectionInfo, a bunch of setShardVersion, parts of the ChunkManager). The biggest obstacle for this is the use of ParallelSortClusteredCursor in MapReduceFinishCommand (on the shards), since there is no good replacement for it (e.g., no way for shards to merge streams from other shards over ASIO). The sharded $lookup project would help a lot, by making the ARM usable on shards. In fact, it's a smaller project to make mapReduce use the ARM than to make $lookup use it, since mapReduce does not need to service client getMore's externally. |
| Comment by Esha Maharishi (Inactive) [ 06/Feb/18 ] |
|
Update as of 3.7.1 (~February 2018) for remaining uses of legacy networking (query, replication, sharding). Look at just the bolded words for the major infrastructural pillars supported by legacy networking. uses ShardConnection directly:
uses ShardConnection through PublicGridCommand::passthrough() (these can be all be updated by just updating PublicGridCommand::passthrough()):
uses ShardConnection through ParallelSortClusteredCursor directly:
uses ParallelSortClusteredCursor through Strategy::commandOp() (these can all be updated by just updating Strategy::commandOp()):
uses ScopedDbConnection directly:
uses ScopedDbConnection through cursorCommandPassthrough() (on mongos):
uses ScopedDbConnection through DBClientCursor:
uses DBClientConnection directly:
uses DBClientConnection through ConnectionPool::acquireConnection():
Finally, note that DBClient is still used in mongobridge, benchrun, and the shell. |
| Comment by Githook User [ 01/Feb/18 ] |
|
Author: {'email': 'jcarey@argv.me', 'name': 'Jason Carey', 'username': 'hanumantmk'}Message: |
| Comment by Githook User [ 01/Feb/18 ] |
|
Author: {'email': 'samantha.ritter@10gen.com', 'name': 'samantharitter', 'username': 'samantharitter'}Message: |
| Comment by Githook User [ 02/Jan/18 ] |
|
Author: {'name': 'Jason Carey', 'username': 'hanumantmk', 'email': 'jcarey@argv.me'}Message: Revert " This reverts commit 9e7df79d3907a743e29b7333de6ea08a01f33a05. |
| Comment by Samantha Ritter (Inactive) [ 20/Dec/17 ] |
|
The above commit adds new options to the server:
These options set a maximum number of in-use connections per host:timeout pair, for their respective pools. Previously, there was only a limit on the number of connections that would be stored in the pool. This new parameter controls the number that are allowed to be in existence at any point in time, including ones in-use and checked-out by callers as well as stored in the pool. These both default to infinity (no limit) to preserve the old behavior. If a caller tries to make a new connection and there are already *MaxInUse connections checked out, the calling thread will block until a connection is released back into the pool. If the caller tried to make a connection with a socket timeout, then instead of blocking indefinitely, we will time out and fail to connect after that amount of time.
These flags control the aging-out of old connections in their respective pools. Previously, connections were kept around forever once created. This led to problems where a momentary spike in cluster traffic would lead to a large number of persistent connections that would stick around after the spike, tying up resources. These new parameters set an age limit, in minutes, on connections, and once connections sit idle in a pool for this amount of time, they will be closed. These both default to infinity (no timeout) to preserve the old behavior. |
| Comment by Githook User [ 20/Dec/17 ] |
|
Author: {'name': 'samantharitter', 'email': 'samantha.ritter@10gen.com', 'username': 'samantharitter'}Message: |
| Comment by Esha Maharishi (Inactive) [ 08/Sep/17 ] |
|
andrew.young based on Jason's comment, I'll let the platforms team post the final update |
| Comment by Mira Carey [ 08/Sep/17 ] |
|
Just a note: we'll try to come back through and list what actually got ported for 3.6 (to the task executor) and what didn't as part of wrapping up this ticket. For now, note that the above list is a snapshot from april |
| Comment by Andrew Young [ 08/Sep/17 ] |
|
esha.maharishi, would it be possible to get an updated version of your last comment that shows what changes made it into the release and what places are still using DBClient? |
| Comment by Esha Maharishi (Inactive) [ 19/Apr/17 ] |
|
Here you go! This summarizes uses of DBClient that actually go over the network (i.e., does not include DBDirectClient). Notable paths in mongos/sharding already using TaskExecutor in 3.4:
Notable paths in mongos/sharding that will switch from using DBClient to TaskExecutor in 3.6:
Paths that will continue to use DBClient in 3.6: The motivation was to switch the paths that require shard versioning, to simplify the logic required for Safe Secondary Reads (PM-256). All shard versioning over DBClient was done through ShardConnection. Therefore, uses of DBClient outside ShardConnection will remain. 1) uses of ScopedDbConnection, which can use DBClientConnection or DBClientReplicaSet, depending on the ConnectionString in mongos
in mongod
in both
2) direct uses of DBClientConnection (mostly in replication) in mongod
3) Finally, note that DBClient is also used in mongobridge, benchrun, and the shell. |
| Comment by Esha Maharishi (Inactive) [ 19/Apr/17 ] |
|
Sorry, I got a bit sick and went home early yesterday, but I'm working on this now. |
| Comment by Esha Maharishi (Inactive) [ 17/Apr/17 ] |
|
acm, I can provide an enumerated list. Can I get it to you by EOD tomorrow? |
| Comment by Andrew Morrow (Inactive) [ 17/Apr/17 ] |
|
renctan - Can we get an enumeration of what those exceptions will be? |
| Comment by Randolph Tan [ 17/Apr/17 ] |
|
FYI: in mongos, we always return connections back to the pool after processing a request. esha.maharishi is also currently working to convert most of the uses of ParallelSortClusteredCursor in mongos to AsyncResultsMerger. After the work is complete, almost all of the outgoing connections (with very few exceptions) from mongos will be coming from the asio connection pools. |