- Type: Improvement
- Resolution: Unresolved
- Priority: Major - P3
- None
- Affects Version/s: None
- Component/s: tls
While working on CDRIVER-3535, I disabled forking to get more verbose output in Evergreen patch builds. That uncovered frequent crashes on the Windows 2017 variants.
The message does not give much information:
[2020/04/24 20:22:53.162] .evergreen/run-tests.sh: line 90: 4106 Segmentation fault ./src/libmongoc/Debug/test-libmongoc.exe $TEST_ARGS --no-fork -d
[2020/04/24 20:22:53.170] Begin /retryable_writes/bulk_tracks_new_server, seed 1587759684
[2020/04/24 20:22:53.170] Command failed: command encountered problem: error waiting on process '8bfa5265-7946-4619-83a4-9a4905995626': exit status 139
[2020/04/24 20:22:53.170] Task completed - FAILURE.
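For reference, the exit status 139 matches the usual shell convention of reporting 128 plus the signal number for a process killed by a signal: 139 - 128 = 11, which is SIGSEGV. A minimal sketch of that arithmetic follows; the helper name signal_from_exit_status is hypothetical and is not part of the test suite.

```c
#include <signal.h>

/* Hypothetical helper: recover the signal number from a shell exit status,
 * assuming the common "128 + signal number" convention. Returns 0 when the
 * status does not look like a signal-terminated process. */
int
signal_from_exit_status (int status)
{
   return status > 128 ? status - 128 : 0;
}
```

With the status from the log, signal_from_exit_status (139) yields 11, which equals SIGSEGV. This only confirms the segmentation fault already named in the first log line.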
Observations:
- the crash is intermittent but frequent (I saw it at least once in almost every patch build)
- the crash has only occurred on winssl variants
- the crash does not seem to occur after any one specific test, but generally after 5-7 minutes of execution
I tried to obtain a minidump by following these instructions, without success: https://docs.microsoft.com/en-us/windows/win32/wer/collecting-user-mode-dumps
(The instructions work for me locally when running under cmd.exe, but not under the Cygwin terminal.)
I managed to print a stack trace by installing an unhandled exception filter with SetUnhandledExceptionFilter, which printed this:
[2020/04/24 20:02:53.742] begin stack trace
[2020/04/24 20:02:53.742] Ordinal1001
[2020/04/24 20:02:53.744] BaseThreadInitThunk
[2020/04/24 20:02:53.744] RtlUserThreadStart
[2020/04/24 20:02:53.766] end stack trace
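For anyone trying to reproduce this, the following is a rough sketch of how such a filter could be installed. It is not the exact code used for the trace above. The names print_stack_on_crash and install_crash_handler are hypothetical, and the sketch assumes linking against dbghelp.lib. Calling stdio and DbgHelp functions from a crashing process is not strictly safe, so treat this as a debugging aid only. On non-Windows builds the installer is a no-op.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#ifdef _WIN32
#include <windows.h>
#include <dbghelp.h> /* assumption: the binary links against dbghelp.lib */

/* Hypothetical filter: print the symbol names of the current call stack. */
static LONG WINAPI
print_stack_on_crash (EXCEPTION_POINTERS *info)
{
   void *frames[64];
   char buf[sizeof (SYMBOL_INFO) + 256];
   SYMBOL_INFO *sym = (SYMBOL_INFO *) buf;
   HANDLE process = GetCurrentProcess ();
   USHORT n, i;

   fprintf (stderr, "begin stack trace (exception code 0x%08lx)\n",
            (unsigned long) info->ExceptionRecord->ExceptionCode);
   /* Load symbols for all modules in this process. */
   SymInitialize (process, NULL, TRUE);
   n = CaptureStackBackTrace (0, 64, frames, NULL);
   for (i = 0; i < n; i++) {
      memset (buf, 0, sizeof buf);
      sym->SizeOfStruct = sizeof (SYMBOL_INFO);
      sym->MaxNameLen = 255;
      if (SymFromAddr (process, (DWORD64) (uintptr_t) frames[i], NULL, sym)) {
         fprintf (stderr, "%s\n", sym->Name);
      } else {
         /* No symbol found: fall back to the raw address. */
         fprintf (stderr, "%p\n", frames[i]);
      }
   }
   fprintf (stderr, "end stack trace\n");
   fflush (stderr);
   return EXCEPTION_EXECUTE_HANDLER; /* let the process terminate */
}
#endif

/* Hypothetical installer: returns 1 if a filter was installed, 0 otherwise. */
int
install_crash_handler (void)
{
#ifdef _WIN32
   SetUnhandledExceptionFilter (print_stack_on_crash);
   return 1;
#else
   return 0;
#endif
}
```

Calling install_crash_handler () once near the start of main would be enough. A trace this short (only a thread entry thunk and an unnamed ordinal) suggests the fault happened on a thread whose frames lack symbols, so the symbol search path may need adjusting to get more useful output.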
After some more digging, I thought issues with our thread primitives on Windows could be the culprit (CDRIVER-3634 or CDRIVER-3635), but attempted fixes to those issues did not resolve the crash.