Type: Task
Resolution: Unresolved
Priority: Major - P3
None
Affects Version/s: None
Component/s: None
None
Atlas Streams
Sprint 58
We frequently see failures like the ones below in our prod smoke tests:
- "$merge to immortal-aws-production-virginia-usa-SP30-kanopy-data.output-kanopy_immortal_smoke_test_221b96fc_ab848ab2 failed: Command update requires authentication: generic server error"
- "Change stream $source immortal-aws-production-london-gbr-SP30-kanopy-data.input-kanopy_immortal_smoke_test_221b96fc_59e45b29 failed: Command aggregate requires authentication: generic server error"
This tends to happen in our "immortal" smoke test processors, which run indefinitely. They sit idle for 5 to 10 minutes at a time and then wake up to read or write data.
- Are the certs being rotated correctly on the local disk?
- Is mongocxx using the updated cert after rotation?
- Do we need to configure a retry knob in mongocxx, or add retry logic in our own code?
- is related to: SERVER-94877 Categorize transient "requires authentication" errors as internal (Closed)