[SERVER-68197] Expose socket option setting failure diagnostics in more cases Created: 21/Jul/22  Updated: 05/Dec/22

Status: Open
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Major - P3
Reporter: Blake Oler Assignee: Backlog - Service Architecture
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Assigned Teams:
Service Arch
Participants:
Linked BF Score: 3

 Description   

We instrumented better socket option setting diagnostics in May 2021. However, these diagnostics are only emitted at a debug verbosity of 3 or higher. As a result, they don't show up on any of our Evergreen testing variants, since none of them expose networking logs at that level.

There are a few different solutions we could try:

Always expose expanded socket option setting diagnostics in exceptions passed up from the networking layer.

  • Pro: Socket option failures seen in customer cases will have more verbose diagnostics, making it easier to connect customer failures to existing cases.
  • Con: Every socket failure that is logged will gain an additional 20-30 characters. We may be exposing extraneous and confusing information to customers.
  • Amount of change: Medium
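
As a rough illustration of this first option, the sketch below appends a short diagnostic suffix to a failure message before it would be wrapped in an exception. The function name withSocketOptionDiagnostics, the bracketed format, and the choice of fields (option name and errno value) are all assumptions made for illustration; they are not taken from the server's networking layer.

```cpp
#include <string>

// Hypothetical helper: appends socket option diagnostics to an existing
// failure message. The field choice and format are illustrative only.
std::string withSocketOptionDiagnostics(const std::string& baseMessage,
                                        const std::string& optionName,
                                        int errorCode) {
    return baseMessage + " [option=" + optionName +
           ", errno=" + std::to_string(errorCode) + "]";
}
```

With a short option name, a suffix of this shape lands roughly in the 20-30 character range mentioned above, which gives a sense of how much noise each logged failure would gain.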

Always log socket failures in kDebugMode (i.e. when building in debug mode), in addition to the existing logging logic.

  • Pro: Allows us to see socket option failure information in testing.
  • Con: Doesn't assist with customer cases.
  • Amount of change: Low
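
A minimal sketch of the decision this second option implies: log when the build is a debug build, or when the existing verbosity condition is already met. The names shouldLogSocketOptionFailure and kDiagnosticVerbosity are hypothetical, and the debug flag is passed as a parameter here (rather than read from kDebugMode) purely so the logic can be exercised in isolation.

```cpp
// Verbosity threshold described in this ticket (debug verbosity 3 or higher).
constexpr int kDiagnosticVerbosity = 3;

// Hypothetical predicate: returns true if a socket option failure should be
// logged. Debug builds always log; other builds keep the existing threshold.
constexpr bool shouldLogSocketOptionFailure(bool debugBuild, int networkVerbosity) {
    return debugBuild || networkVerbosity >= kDiagnosticVerbosity;
}
```

Under this sketch, behavior outside debug builds is unchanged: a non-debug build at verbosity 2 still logs nothing, while a debug build logs regardless of verbosity.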

Lower the logging threshold to verbosity level 2, allowing logs to be caught in some testing suites that set networking logs to that verbosity.

  • Pro: Allows us to see socket option failure information in testing.
  • Con: Doesn't assist with customer cases. Can be inconsistent, since it is reliant on the not-always-guaranteed constraint that a testing suite sets a heightened networking verbosity level.
  • Amount of change: Low

I'm leaning towards the second solution, as it changes behavior in debug mode only and gives us the most reliable path to diagnosing further issues.


Generated at Thu Feb 08 06:10:10 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.