[SERVER-63234] Better logging to explain LDAP health check flakiness Created: 02/Feb/22 Updated: 29/Oct/23 Resolved: 03/Feb/22 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | 5.3.0, 4.4.13, 5.0.7 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Andrew Shuvalov (Inactive) | Assignee: | Andrew Shuvalov (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | sharding-nyc-subteam2 | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||
| Backwards Compatibility: | Fully Compatible | ||||||||
| Operating System: | ALL | ||||||||
| Backport Requested: |
v5.0, v4.4
|
||||||||
| Participants: | |||||||||
| Linked BF Score: | 55 | ||||||||
| Description |
|
I want to be sure this flakiness is purely a timing issue and not some unknown bug, so I am adding better logging. |
| Comments |
| Comment by Githook User [ 10/Feb/22 ] |
|
Author: {'name': 'Andrew Shuvalov', 'email': 'andrew.shuvalov@mongodb.com', 'username': 'shuvalov-mdb'} Message: |
| Comment by Githook User [ 10/Feb/22 ] |
|
Author: {'name': 'Andrew Shuvalov', 'email': 'andrew.shuvalov@mongodb.com', 'username': 'shuvalov-mdb'} Message: (cherry picked from commit 8e7ee133fc847077f512410f92322142ea548705) |
| Comment by Githook User [ 03/Feb/22 ] |
|
Author: {'name': 'Andrew Shuvalov', 'email': 'andrew.shuvalov@mongodb.com', 'username': 'shuvalov-mdb'} Message: (cherry picked from commit 8e7ee133fc847077f512410f92322142ea548705) |
| Comment by Githook User [ 03/Feb/22 ] |
|
Author: {'name': 'Andrew Shuvalov', 'email': 'andrew.shuvalov@mongodb.com', 'username': 'shuvalov-mdb'} Message: |
| Comment by Andrew Shuvalov (Inactive) [ 03/Feb/22 ] |
|
Hi ryan.egesdahl, in this particular case the network outage is artificially induced by flapping the firewall back and forth, but I'm aware of the failures you are describing. I would also like to know what is going on there. Just a heads-up that what I'm doing here will not address BF-24021 and BF-24049; it is simpler and narrower in scope. |
| Comment by Ryan Egesdahl (Inactive) [ 02/Feb/22 ] |
|
Because the timeouts seem to happen around DNS queries, I have a sneaking suspicion that at least some of the flakiness (specifically the part related to MSVC 2022) might be caused by IPv6 not being enabled outbound from our build hosts (see BF-24021 and BF-24049). That kind of thing is probably happening way down in SASL or even the runtime, but anything you can do to illuminate it would be welcome. I didn’t have enough data to do more than make an educated guess. |