[CDRIVER-3637] Running test-libmongoc with --no-fork can crash on Windows with WinSSL Created: 27/Apr/20  Updated: 10/Feb/23

Status: Backlog
Project: C Driver
Component/s: tls
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Kevin Albertson Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: platform-problems
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Epic Link: CDRIVER-4575

 Description   

While working on CDRIVER-3535, I disabled forking to get more verbose output in Evergreen patch builds. Doing so uncovered frequent crashes on the Windows 2017 variants.

The failure message gives little information:

[2020/04/24 20:22:53.162] .evergreen/run-tests.sh: line 90:  4106 Segmentation fault      ./src/libmongoc/Debug/test-libmongoc.exe $TEST_ARGS --no-fork -d
[2020/04/24 20:22:53.170] Begin /retryable_writes/bulk_tracks_new_server, seed 1587759684
[2020/04/24 20:22:53.170] Command failed: command encountered problem: error waiting on process '8bfa5265-7946-4619-83a4-9a4905995626': exit status 139
[2020/04/24 20:22:53.170] Task completed - FAILURE.

Observations:

  • the crash is intermittent but frequent (I saw it at least once in almost every patch build)
  • the crash has only occurred on WinSSL variants
  • the crash does not seem to follow any one specific test, but generally occurs after 5-7 minutes of execution

I attempted to obtain a minidump by following the instructions at https://docs.microsoft.com/en-us/windows/win32/wer/collecting-user-mode-dumps, without success.
(I can collect one locally when running under cmd.exe, but not from the Cygwin terminal.)

I managed to print a stack trace by hooking the unhandled exception with SetUnhandledExceptionFilter, which printed the following:

[2020/04/24 20:02:53.742] begin stack trace
[2020/04/24 20:02:53.742] Ordinal1001
[2020/04/24 20:02:53.744] BaseThreadInitThunk
[2020/04/24 20:02:53.744] RtlUserThreadStart
[2020/04/24 20:02:53.766] end stack trace
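
For reference, a handler of this kind can be sketched as follows. This is an illustration only, not the code used in the patch build: the names install_crash_handler and print_stack_on_crash are hypothetical, the stack walk assumes an x64 build linked against dbghelp.lib, and on other platforms the installer is a no-op. DbgHelp is not designed to be called from a crashing process, so the output should be treated as best effort.

```c
#include <stdio.h>
#include <string.h>

#if defined(_WIN32) && (defined(_M_X64) || defined(__x86_64__))
#include <windows.h>
#include <dbghelp.h> /* link with dbghelp.lib */

/* Hypothetical filter: walks the faulting thread's stack and prints one
 * symbol name (or raw address) per frame. */
static LONG WINAPI
print_stack_on_crash (EXCEPTION_POINTERS *info)
{
   HANDLE proc = GetCurrentProcess ();
   HANDLE thread = GetCurrentThread ();
   CONTEXT ctx = *info->ContextRecord; /* StackWalk64 modifies its context */
   STACKFRAME64 frame;
   /* ULONG64 storage keeps the SYMBOL_INFO buffer suitably aligned. */
   ULONG64 symbuf[(sizeof (SYMBOL_INFO) + 256 + sizeof (ULONG64) - 1) /
                  sizeof (ULONG64)];
   SYMBOL_INFO *sym = (SYMBOL_INFO *) symbuf;

   memset (&frame, 0, sizeof frame);
   frame.AddrPC.Offset = ctx.Rip;
   frame.AddrPC.Mode = AddrModeFlat;
   frame.AddrFrame.Offset = ctx.Rbp;
   frame.AddrFrame.Mode = AddrModeFlat;
   frame.AddrStack.Offset = ctx.Rsp;
   frame.AddrStack.Mode = AddrModeFlat;

   SymInitialize (proc, NULL, TRUE);
   fprintf (stderr, "begin stack trace\n");
   while (StackWalk64 (IMAGE_FILE_MACHINE_AMD64, proc, thread, &frame, &ctx,
                       NULL, SymFunctionTableAccess64, SymGetModuleBase64,
                       NULL)) {
      DWORD64 displacement = 0;

      memset (symbuf, 0, sizeof symbuf);
      sym->SizeOfStruct = sizeof (SYMBOL_INFO);
      sym->MaxNameLen = 255;
      if (SymFromAddr (proc, frame.AddrPC.Offset, &displacement, sym)) {
         fprintf (stderr, "%s\n", sym->Name);
      } else {
         fprintf (stderr, "0x%llx\n",
                  (unsigned long long) frame.AddrPC.Offset);
      }
   }
   fprintf (stderr, "end stack trace\n");
   fflush (stderr);
   return EXCEPTION_EXECUTE_HANDLER; /* let the process terminate */
}
#endif

/* Returns 1 if the filter was installed, 0 if unsupported on this platform. */
int
install_crash_handler (void)
{
#if defined(_WIN32) && (defined(_M_X64) || defined(__x86_64__))
   SetUnhandledExceptionFilter (print_stack_on_crash);
   return 1;
#else
   return 0;
#endif
}
```

Calling install_crash_handler () once at the start of main, before any tests run, is enough for the filter to cover crashes on any thread in the process.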

After more digging, I suspected that issues with our thread primitives on Windows (CDRIVER-3634 or CDRIVER-3635) were the culprit, but attempted fixes for those issues did not resolve the crash.

