[MONGOCRYPT-241] libmongocrypt uses asserts and terminates application processes Created: 31/Jan/20 Updated: 30/Mar/22 |
|
| Status: | Backlog |
| Project: | Libmongocrypt |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Oleg Pudeyev (Inactive) | Assignee: | Unassigned |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
| Description |
|
While running FLE tests in the Ruby driver I received this output:
It appears that libmongocrypt calls BSON_ASSERT, which in turn calls abort() when the asserted expression evaluates to zero. In this case my entire Ruby process was terminated without a stack trace. BSON_ASSERT reported the file and line number of the failing assertion, but no stack trace leading up to it.

Had the code segfaulted instead, the Ruby runtime would have printed a stack trace covering both the Ruby frames and the C functions up to the point of the segfault (see for example https://github.com/p-mongo/tests/blob/master/ruby-ext-stack-trace/output-ffi-2.7.log#L27). Because this failure produced no Ruby stack trace, it is hard to know where to start debugging: I first need to identify the Ruby code that is failing (or causing libmongocrypt to fail).

If such a failure occurred in a typical Ruby web application, it would amount to a denial of service, since the entire process is terminated rather than just the failing request.

I think libmongocrypt should:

1. Perform all required checks, including internal state assertions, as part of the normal method flow, i.e. not conditioned on DEBUG-like preprocessor defines being enabled;
2. Propagate errors detected by these checks to the calling driver/application, with enough information to diagnose the source of the error;
3. Not call abort or otherwise terminate the running process, but instead fail the individual operation being performed with an appropriate failure status code and/or diagnostics. |
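The three suggestions above could look roughly like the following in C. This is only a sketch of the general pattern, not libmongocrypt's actual code or API: `sketch_status_t`, `SKETCH_CHECK`, and `sketch_copy_key` are hypothetical names made up for illustration. The check is always compiled in, records the failing condition with its file and line in a status object, and makes the enclosing function return false instead of calling abort().

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical status object (an assumption for this sketch,
 * not a libmongocrypt type). */
typedef struct {
    bool ok;
    char message[256];
} sketch_status_t;

void sketch_status_init(sketch_status_t *status) {
    status->ok = true;
    status->message[0] = '\0';
}

/* Always-on check, independent of NDEBUG/DEBUG defines. On failure it
 * records the condition text and source location in the status object
 * and returns false from the enclosing function; it never aborts. */
#define SKETCH_CHECK(cond, status)                                    \
    do {                                                              \
        if (!(cond)) {                                                \
            (status)->ok = false;                                     \
            snprintf((status)->message, sizeof((status)->message),    \
                     "%s:%d: check failed: %s",                       \
                     __FILE__, __LINE__, #cond);                      \
            return false;                                             \
        }                                                             \
    } while (0)

/* Hypothetical operation: copies key material into a caller-supplied
 * buffer. A violated precondition fails only this call. */
bool sketch_copy_key(const unsigned char *key, size_t key_len,
                     unsigned char *out, size_t out_len,
                     sketch_status_t *status) {
    SKETCH_CHECK(key != NULL, status);
    SKETCH_CHECK(out != NULL, status);
    SKETCH_CHECK(key_len <= out_len, status);
    memcpy(out, key, key_len);
    return true;
}
```

With this shape, a binding such as the Ruby driver could test the boolean result and raise an ordinary exception carrying `status.message`, so the failure would surface with a Ruby stack trace while the process keeps running.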