[SERVER-61119] LDAP pooled timeout test expects deterministic behavior from connection pool Created: 29/Oct/21 Updated: 29/Oct/23 Resolved: 17/Nov/21 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | 5.1.0-rc0, 5.1.0-rc1, 5.1.0-rc2 |
| Fix Version/s: | 5.2.0, 5.1.2 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Varun Ravichandran | Assignee: | Varun Ravichandran |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: | |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Backport Requested: | v5.1 |
| Sprint: | Security 2021-11-01, Security 2021-11-15, Security 2021-11-29 |
| Participants: | |
| Linked BF Score: | 148 |
| Description |
|
The long delays inserted by the failpoints in this test factor into the connection pool's assessment of how healthy its open connections are, more so on Windows than on Linux. Since all of the connections in this test go out to the same LDAP server, they belong to the same connection pool. When the connection pool detects that several connections to a specific host are frequently timing out, it assumes that the host is down and expires the entire pool so that it does not waste time spamming a downed host with connection requests. When this occurs, the pool closes both the connections that hit the failpoint directly and those that did not. This sequence of events explains why the test has occasionally seen more authentication failures than it expects. The test has no way of deterministically predicting when the connection pool will decide that the LDAP server must be down and close all of its connections. More importantly, this behavior simply indicates that the connection pool is working as designed, so the test should not report it as a failure. To fix this, the test should be redesigned so that it does not expect a deterministic number of successes and failures during auth attempts. Rather, it should simply ensure that the server returns some kind of response roughly within the connection pool's configured timeout and does not crash during long hangs. |
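The relaxed check described above could look roughly like the following sketch. It is not the code from the actual jstest: `runTimedAuthAttempt`, `checkAttempts`, `attemptAuth`, `timeoutMS`, and `slackMS` are all hypothetical names chosen for illustration, and the sketch assumes an auth attempt is a synchronous function that returns true on success and either returns false or throws on failure.

```javascript
// Hypothetical sketch; none of these names come from the real test.
// `attemptAuth` stands in for whatever the test uses to authenticate.
function runTimedAuthAttempt(attemptAuth, timeoutMS, slackMS) {
    const start = Date.now();
    let outcome;
    try {
        outcome = attemptAuth() ? "success" : "failure";
    } catch (e) {
        // A failure is acceptable here: the pool may have expired
        // every connection to the host, including healthy ones.
        outcome = "failure";
    }
    const elapsedMS = Date.now() - start;
    return {
        outcome: outcome,
        elapsedMS: elapsedMS,
        // The only hard requirement: some response arrived roughly
        // within the configured timeout (plus an assumed slack).
        withinBound: elapsedMS <= timeoutMS + slackMS,
    };
}

function checkAttempts(attempts, timeoutMS, slackMS) {
    // Pass as long as every attempt returned in time, regardless of
    // how many succeeded and how many failed.
    return attempts
        .map((fn) => runTimedAuthAttempt(fn, timeoutMS, slackMS))
        .every((result) => result.withinBound);
}
```

With this shape, a run in which the pool expires and every attempt fails still passes, while an attempt that hangs well past the timeout does not. The slack value is a judgment call; the ticket only says "roughly within" the timeout.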
| Comments |
| Comment by Githook User [ 03/Dec/21 ] |
|
Author: Varun Ravichandran <varun.ravichandran@mongodb.com> (varunravi98) Message: (cherry picked from commit 524f353544c1f1b867ec74ba0c1406284d186411) |
| Comment by Varun Ravichandran [ 17/Nov/21 ] |
|
`ldap_timeout_pooled.js` and `ldap_timeout_poolless.js` were included in 5.1 as a way to find bugs related to timeout enforcement for hanging LDAP connections. Although the tests caught a couple of legitimate bugs that were also fixed as part of 5.1, they expected deterministic behavior in cases where that was not possible, resulting in flakiness (see BF-22963 and BF-23204). `ldap_timeout_pooled.js` was also missing a tag needed to keep it from running on the Enterprise Amazon Linux 2 arm64 variant, which has the LDAP connection pool disabled. This ticket added that tag and relaxed the conditions needed for the test to pass in order to reduce flakiness. Since the tests are also in 5.1 and cause false-positive redness there, this should be backported. It is a jstest-only fix. |
| Comment by Githook User [ 17/Nov/21 ] |
|
Author: Varun Ravichandran <varun.ravichandran@mongodb.com> (varunravi98) Message: |