[SERVER-21108] connection pool should check if a connection has been closed remotely before using it Created: 22/Oct/15 Updated: 17/Nov/15 Resolved: 12/Nov/15 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Networking |
| Affects Version/s: | None |
| Fix Version/s: | 3.2.0-rc3 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Scott Hernandez (Inactive) | Assignee: | Adam Midvidy |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||||||
| Backwards Compatibility: | Fully Compatible | ||||||||||||
| Operating System: | ALL | ||||||||||||
| Sprint: | Platform C (11/20/15) | ||||||||||||
| Participants: | |||||||||||||
| Linked BF Score: | 0 | ||||||||||||
| Description |
|
| Comments |
| Comment by Githook User [ 12/Nov/15 ] | ||||||||||||||
|
Author: Adam Midvidy (amidvidy) <amidvidy@gmail.com> Message: | ||||||||||||||
| Comment by Githook User [ 12/Nov/15 ] | ||||||||||||||
|
Author: Adam Midvidy (amidvidy) <amidvidy@gmail.com> Message: | ||||||||||||||
| Comment by Githook User [ 10/Nov/15 ] | ||||||||||||||
|
Author: Adam Midvidy (amidvidy) <amidvidy@gmail.com> Message: | ||||||||||||||
| Comment by Githook User [ 10/Nov/15 ] | ||||||||||||||
|
Author: Adam Midvidy (amidvidy) <amidvidy@gmail.com> Message: | ||||||||||||||
| Comment by Eric Milkie [ 02/Nov/15 ] | ||||||||||||||
|
The implementation of this ticket should not retry anywhere. The old pooled connection code did not retry, and neither should the new code. | ||||||||||||||
| Comment by Adam Midvidy [ 02/Nov/15 ] | ||||||||||||||
|
We don't believe that this issue should be handled in the network layer, as not all commands we execute are idempotent (and thus safe to retry). | ||||||||||||||
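To illustrate the idempotency concern, here is a minimal sketch of why a blind retry in the network layer can be unsafe. `FakeServer` and `incrementWithBlindRetry` are hypothetical names invented for this example and are not part of the MongoDB code base. The sketch assumes the connection dies after the remote node has applied the command but before the reply arrives.

```cpp
#include <stdexcept>

// Hypothetical stand-in for a remote node that holds a single counter.
struct FakeServer {
    long counter = 0;
    bool dropNextReply = false;

    // Applies the increment first, then "loses" the reply if dropNextReply
    // is set. This is what a caller observes when the connection is closed
    // after the command ran but before the response was delivered.
    long increment(long by) {
        counter += by;
        if (dropNextReply) {
            dropNextReply = false;
            throw std::runtime_error("connection closed before reply");
        }
        return counter;
    }
};

// Hypothetical network layer that retries once on any network error,
// without knowing whether the command is idempotent.
long incrementWithBlindRetry(FakeServer& server, long by) {
    try {
        return server.increment(by);
    } catch (const std::runtime_error&) {
        // The first attempt may already have been applied remotely,
        // so this second attempt can apply the command twice.
        return server.increment(by);
    }
}
```

With `dropNextReply` set, a single logical increment by 1 leaves the counter at 2. The layer that issued the command is the one that knows whether repeating it is safe.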
| Comment by Benety Goh [ 26/Oct/15 ] | ||||||||||||||
|
The commit b82f1a2 fixes a symptom of the underlying issue (bad connections) in a sharding test. The underlying issue still needs to be looked at. | ||||||||||||||
| Comment by Githook User [ 26/Oct/15 ] | ||||||||||||||
|
Author: Benety Goh (benety) <benety@mongodb.com> Message: | ||||||||||||||
| Comment by Scott Hernandez (Inactive) [ 23/Oct/15 ] | ||||||||||||||
|
We also need to add tests to ensure that dead connections do not fail higher-level operations run through the executor/network interface, such as those used by sharding and replication. | ||||||||||||||
| Comment by Benety Goh [ 23/Oct/15 ] | ||||||||||||||
|
The legacy socket code in Socket::isStillConnected() used to perform a number of checks to determine if the connection in the pool is still usable: | ||||||||||||||
| Comment by Benety Goh [ 23/Oct/15 ] | ||||||||||||||
|
The primary seems to be using a bad connection in its pool to send a heartbeat to the node that recently stepped down (and closed incoming connections):
| ||||||||||||||
| Comment by Spencer Brody (Inactive) [ 23/Oct/15 ] | ||||||||||||||
|
This looks to be strictly replication-related; I don't think sharding is a factor. The failure is due to one node not responding during the reconfig quorum check. |