[SERVER-26859] AsyncResultsMerger replica set retargeting may block the ASIO callback threads
Created: 01/Nov/16  Updated: 11/Apr/17  Resolved: 08/Nov/16

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 3.2.10, 3.4.0-rc2
Fix Version/s: 3.2.11, 3.4.0-rc3

Type: Bug
Priority: Major - P3
Reporter: Kaloian Manassiev
Assignee: Randolph Tan
Resolution: Done
Votes: 0
Labels: code-and-test
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-26654 ExceededTimeLimit: Operation timed ou... Closed
related to SERVER-26722 router blocks and throws ExceededTime... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Completed:
Sprint: Sharding 2016-11-21
Participants:
Case:

 Description   

The AsyncResultsMerger performs retargeting when network errors or replication NotMaster errors occur during the initial cursor establishment.

This retargeting is blocking and may run on an ASIO callback thread, which prevents that thread from processing other events, such as finishing connection establishment. As a result, connections unrelated to the request that triggered the retargeting can be wrongly labeled as timed out. The end effect is that requests fail with the error "ExceededTimeLimit: Operation timed out".

The problem is exacerbated because ASIO throws out the entire connection pool for a host that has timed-out connections, which forces new connections to be opened.
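
The failure mode described above can be illustrated with a small, deterministic simulation. This is only a sketch under stated assumptions: ToyLoop, unrelatedConnectionTimesOut, the 1000 ms connect deadline and the 5000 ms retarget cost are all hypothetical names and values chosen for illustration, and none of it is the ASIO or MongoDB executor API. It models a single callback thread with a simulated clock and shows how blocking work in one callback delays the callback that would have completed an unrelated connection.

```cpp
#include <deque>
#include <functional>
#include <utility>

// Toy single-threaded event loop with a simulated clock. All names here are
// hypothetical; this is not the ASIO or MongoDB executor API.
struct ToyLoop {
    long nowMs = 0;                                   // simulated clock
    std::deque<std::function<void(ToyLoop&)>> ready;  // pending callbacks

    void post(std::function<void(ToyLoop&)> fn) {
        ready.push_back(std::move(fn));
    }

    // Runs callbacks one at a time, like a single callback thread.
    void run() {
        while (!ready.empty()) {
            auto fn = std::move(ready.front());
            ready.pop_front();
            fn(*this);
        }
    }
};

// Returns true if the connection to an unrelated host is judged timed out.
bool unrelatedConnectionTimesOut(bool retargetInline) {
    const long kConnectDeadlineMs = 1000;  // assumed connect timeout
    const long kRetargetCostMs = 5000;     // assumed cost of a blocking retarget
    bool timedOut = false;

    ToyLoop loop;

    // Callback 1: a response carrying a retriable error (e.g. NotMaster).
    loop.post([&](ToyLoop& l) {
        if (retargetInline) {
            l.nowMs += kRetargetCostMs;  // blocking work stalls the whole loop
        }
        // Otherwise the retarget is handed off and this callback returns at once.
    });

    // Callback 2: handshake completion for a connection to an unrelated host.
    // The handshake is ready at time 0, but it is only observed once the loop
    // gets around to running this callback.
    loop.post([&](ToyLoop& l) {
        timedOut = l.nowMs > kConnectDeadlineMs;
    });

    loop.run();
    return timedOut;
}
```

In this model, unrelatedConnectionTimesOut(true) reports a timeout even though the unrelated connection was never slow, while unrelatedConnectionTimesOut(false) does not.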



 Comments   
Comment by Githook User [ 08/Nov/16 ]

Author: Randolph Tan (renctan) <randolph@10gen.com>

Message: SERVER-26859 AsyncResultsMerger replica set retargeting may block the ASIO callback threads

When the handleResponse callback encounters a retriable error, signal the merger thread so that it retries, instead of trying to reschedule inline, since rescheduling involves re-evaluating the target host, which is a blocking operation.

(cherry picked from commit 5b2134f4ae4ea2d70b0ce89041fd11fd7810e40d)

Conflicts:
src/mongo/s/query/async_results_merger.cpp
src/mongo/s/query/async_results_merger_test.cpp
Branch: v3.2
https://github.com/mongodb/mongo/commit/102f68907ecad28cf8ed479bee61c3afd1a4f0f5

Comment by Githook User [ 08/Nov/16 ]

Author: Randolph Tan (renctan) <randolph@10gen.com>

Message: SERVER-26859 AsyncResultsMerger replica set retargeting may block the ASIO callback threads

When the handleResponse callback encounters a retriable error, signal the merger thread so that it retries, instead of trying to reschedule inline, since rescheduling involves re-evaluating the target host, which is a blocking operation.
Branch: master
https://github.com/mongodb/mongo/commit/5b2134f4ae4ea2d70b0ce89041fd11fd7810e40d

Comment by Jon Hyman [ 06/Nov/16 ]

Once this is backported, can you please release 3.2.11 asap? We're stuck dealing with segfaults (SERVER-25465) because we upgraded to 3.2.10 but ran into SERVER-26654 which was worse, but we're still not in a good place. Thanks.

Comment by Kaloian Manassiev [ 01/Nov/16 ]

Making the ReplicaSetMonitor asynchronous is a significant task, and performing this resolution on a separate thread could create an unbounded number of threads in the system. The least disruptive change would instead be to signal the AsyncResultsMerger's work-available event without returning any results, and have the user thread perform the blocking host search for the read preference.
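
A minimal sketch of the approach proposed here, assuming a much simplified merger: the callback thread only records that a retry is needed and signals an event, and the thread waiting on that event performs the blocking work. ToyMerger, waitAndRetarget, blockingRetarget and retargetRunsOnUserThread are hypothetical names for illustration, and a std::condition_variable stands in for the real work-available event. This is not the actual AsyncResultsMerger code.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical stand-in for the merger's shared state; not the real class.
class ToyMerger {
public:
    // Runs on the network callback thread. It must never block, so on a
    // retriable error it only records the fact and signals the waiter.
    void handleResponse(bool retriableError) {
        std::lock_guard<std::mutex> lk(_mutex);
        if (retriableError) {
            _needsRetry = true;
        }
        _workAvailable = true;
        _cv.notify_all();
    }

    // Runs on the user thread. Waits for the event and, if a retry was
    // requested, performs the blocking retarget itself. Returns the id of the
    // thread that did the retarget, or a default id if none was needed.
    std::thread::id waitAndRetarget() {
        std::unique_lock<std::mutex> lk(_mutex);
        _cv.wait(lk, [this] { return _workAvailable; });
        _workAvailable = false;
        if (!_needsRetry) {
            return std::thread::id();
        }
        _needsRetry = false;
        lk.unlock();  // do not hold the lock across blocking work
        return blockingRetarget();
    }

private:
    // Placeholder for the blocking host selection.
    static std::thread::id blockingRetarget() {
        return std::this_thread::get_id();
    }

    std::mutex _mutex;
    std::condition_variable _cv;
    bool _workAvailable = false;
    bool _needsRetry = false;
};

// Returns true if the retarget ran on the calling (user) thread rather than
// on the simulated callback thread.
bool retargetRunsOnUserThread() {
    ToyMerger merger;
    std::thread::id callbackThreadId;
    std::thread callbackThread([&] {
        callbackThreadId = std::this_thread::get_id();
        merger.handleResponse(true);  // simulate a retriable error response
    });
    std::thread::id retargetThreadId = merger.waitAndRetarget();
    callbackThread.join();
    return retargetThreadId == std::this_thread::get_id() &&
           retargetThreadId != callbackThreadId;
}
```

The design point is that no new threads are created for retargeting: the blocking call is moved onto a thread that was already waiting for results.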
