[CDRIVER-1055] Unit Tests Under -fsanitize=address Timeout Created: 19/Dec/15  Updated: 18/Oct/16  Resolved: 18/Oct/16

Status: Closed
Project: C Driver
Component/s: tests
Affects Version/s: 1.3.0
Fix Version/s: 1.5.0

Type: Bug Priority: Minor - P4
Reporter: Alex Bishop Assignee: Hannes Magnusson
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Linux ubuntu 3.16.0-55-generic #74~14.04.1-Ubuntu SMP
cmake version 2.8.12.2
gcc (Ubuntu 4.8.4-2ubuntu1~14.04) 4.8.4


Issue Links:
Related
is related to CDRIVER-747 Test with address sanitizer Closed
is related to CDRIVER-748 Test with ThreadSanitizer Closed
is related to CDRIVER-1051 ASAN/UBSAN: misaligned address Closed

 Description   

Added to /libbson/CMakeLists.txt:

set(CMAKE_C_FLAGS "-fsanitize=address")

cmake .
make
./test-libbson

Assuming PR #146 and PR #147

More often than not, the unit tests stall and abort due to a timeout. With a lucky thread scheduling order, they all pass.

./test-libbson
...
Timed out, aborting!
Aborted (core dumped)

Running the unit tests without parallelization works.

./test-libbson -p



 Comments   
Comment by Hannes Magnusson [ 18/Oct/16 ]

We've removed the threading argument from the test suite as it just didn't work.

Comment by Alex Bishop [ 27/Dec/15 ]

No problem and understood. This was more for my own curiosity and completeness; I had free time over the holidays. We run your unit tests alongside all others during our build process, and using '-p' is acceptable.

I never ran the tests without ASAN, and I would be surprised if the problem were still as prevalent there. ASAN might be a helpful catalyst for debugging.

Comment by A. Jesse Jiryu Davis [ 27/Dec/15 ]

Thanks for the report and for helping us diagnose. As Hannes said, there are problems with the test suite's concurrency in the default mode, even without ASAN. We'll follow up on all these issues in the new year.

Comment by Alex Bishop [ 25/Dec/15 ]

I think the root cause is that non-trivial work is performed in a child forked from a thread. My investigation leads me to believe a memory lock is deadlocked. fork() copies only the calling thread into the child, so any lock held by another thread at the moment of the fork stays locked in the child, which leaves it in a bad way with respect to critical regions.

  • Memory/heap routines are not async-signal-safe
  • exit() is not thread safe
  • fork() uses signals

This deadlock occurs most often during /bson/writer/shared_buffer, since it is the slowest test and perhaps the most malloc-heavy. It is slow enough to allow two threads to interact.
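The hazard described above can be reproduced in isolation. The following is a minimal sketch, not code from libbson or its test suite: `heap_lock`, `helper`, and `child_inherits_held_lock` are hypothetical names, and a plain pthread mutex stands in for the allocator lock. It also assumes Linux/glibc behaviour, because calling `pthread_mutex_trylock` in the child of a multithreaded process is outside what POSIX guarantees.

```c
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for an allocator lock. */
static pthread_mutex_t heap_lock = PTHREAD_MUTEX_INITIALIZER;
static sem_t lock_taken;  /* helper -> caller: "I hold the lock" */
static sem_t may_release; /* caller -> helper: "you may unlock now" */

static void *helper(void *arg)
{
    (void) arg;
    pthread_mutex_lock(&heap_lock);
    sem_post(&lock_taken);
    sem_wait(&may_release);
    pthread_mutex_unlock(&heap_lock);
    return NULL;
}

/* Returns 1 if the forked child found the lock still held, 0 if it was
 * free, and -1 on a setup error. */
int child_inherits_held_lock(void)
{
    pthread_t tid;
    pid_t pid;
    int status = 0;
    int result = -1;

    if (sem_init(&lock_taken, 0, 0) != 0 || sem_init(&may_release, 0, 0) != 0)
        return -1;
    if (pthread_create(&tid, NULL, helper, NULL) != 0)
        return -1;
    sem_wait(&lock_taken); /* the helper thread now owns heap_lock */

    pid = fork();
    if (pid == 0) {
        /* Only the forking thread exists in the child. The helper that owns
         * the lock was not copied, so nothing here can ever unlock it. A
         * blocking lock call at this point would hang forever. */
        _exit(pthread_mutex_trylock(&heap_lock) == EBUSY ? 1 : 0);
    }
    if (pid > 0 && waitpid(pid, &status, 0) == pid && WIFEXITED(status))
        result = WEXITSTATUS(status);

    /* Let the helper finish and clean up in the parent. */
    sem_post(&may_release);
    pthread_join(tid, NULL);
    sem_destroy(&lock_taken);
    sem_destroy(&may_release);
    return result;
}
```

The sketch uses `trylock` so that it terminates; a child that calls a blocking lock (as the allocator does inside `free`) would wait indefinitely, which matches the timeout seen in the report.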

The usual state of the process when it hangs, as captured in the backtrace below:

  • Main @ sleep()
    • Parent @ waitpid()
    • Child @ BlockingMutex::Lock

#0  syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
#1  0x00007f7298eed053 in __sanitizer::BlockingMutex::Lock (this=this@entry=0x610000002010) at ../../../../src/libsanitizer/sanitizer_common/sanitizer_linux.cc:529
#2  0x00007f7298edbf4a in GenericScopedLock (mu=mu@entry=0x610000002010, this=<synthetic pointer>) at ../../../../src/libsanitizer/sanitizer_common/sanitizer_mutex.h:83
#3  __sanitizer::SizeClassAllocator64<105553116266496ul, 1099511627776ul, 0ul, __sanitizer::SizeClassMap<17ul, 256ul, 16ul, 28ul>, __asan::AsanMapUnmapCallback>::PopulateFreeList (this=0x7f7299154aa0 <__asan::allocator>, stat=0x7f7297519710, c=0x7f72974c0180, class_id=57, region=region@entry=0x610000002010) at ../../../../src/libsanitizer/sanitizer_common/sanitizer_allocator.h:483
#4  0x00007f7298edc468 in __sanitizer::SizeClassAllocator64<105553116266496ul, 1099511627776ul, 0ul, __sanitizer::SizeClassMap<17ul, 256ul, 16ul, 28ul>, __asan::AsanMapUnmapCallback>::AllocateBatch (this=this@entry=0x7f7299154aa0 <__asan::allocator>, stat=stat@entry=0x7f7297519710, c=c@entry=0x7f72974c0180, class_id=class_id@entry=57) at ../../../../src/libsanitizer/sanitizer_common/sanitizer_allocator.h:335
#5  0x00007f7298edc4d5 in __sanitizer::SizeClassAllocatorLocalCache<__sanitizer::SizeClassAllocator64<105553116266496ul, 1099511627776ul, 0ul, __sanitizer::SizeClassMap<17ul, 256ul, 16ul, 28ul>, __asan::AsanMapUnmapCallback> >::Refill (this=this@entry=0x7f72974c0180, allocator=allocator@entry=0x7f7299154aa0 <__asan::allocator>, class_id=class_id@entry=57) at ../../../../src/libsanitizer/sanitizer_common/sanitizer_allocator.h:829
#6  0x00007f7298edcb1b in Allocate (class_id=57, allocator=0x7f7299154aa0 <__asan::allocator>, this=this@entry=0x7f72974c0180) at ../../../../src/libsanitizer/sanitizer_common/sanitizer_allocator.h:782
#7  Allocate (cleared=false, alignment=1, size=8216, cache=cache@entry=0x7f72974c0180, this=0x7f7299154aa0 <__asan::allocator>) at ../../../../src/libsanitizer/sanitizer_common/sanitizer_allocator.h:1055
#8  Allocate (size=8216, this=<synthetic pointer>) at ../../../../src/libsanitizer/asan/asan_allocator2.cc:283
#9  __sanitizer::QuarantineCache<__asan::QuarantineCallback>::AllocBatch (this=this@entry=0x7f72974c0100, cb=..., cb@entry=...) at ../../../../src/libsanitizer/sanitizer_common/sanitizer_quarantine.h:161
#10 0x00007f7298ed9fab in Enqueue (size=256, ptr=0x60240001fcf0, cb=..., this=0x7f72974c0100) at ../../../../src/libsanitizer/sanitizer_common/sanitizer_quarantine.h:125
#11 Put (size=256, ptr=0x60240001fcf0, cb=..., c=0x7f72974c0100, this=0x7f72991549a0 <__asan::quarantine>) at ../../../../src/libsanitizer/sanitizer_common/sanitizer_quarantine.h:54
#12 __asan::Deallocate (ptr=ptr@entry=0x60240001fd00, stack=stack@entry=0x7f72974be120, alloc_type=alloc_type@entry=__asan::FROM_MALLOC) at ../../../../src/libsanitizer/asan/asan_allocator2.cc:462
#13 0x00007f7298edab05 in __asan::asan_free (ptr=ptr@entry=0x60240001fd00, stack=stack@entry=0x7f72974be120, alloc_type=alloc_type@entry=__asan::FROM_MALLOC) at ../../../../src/libsanitizer/asan/asan_allocator2.cc:594
#14 0x00007f7298ee7368 in __interceptor_free (ptr=0x60240001fd00) at ../../../../src/libsanitizer/asan/asan_malloc_linux.cc:62
#15 0x0000000000463347 in bson_free (mem=0x60240001fd00) at /libbson/src/bson/bson-memory.c:224

The result is the same with gcc and clang. I was using libbson tag 1.1.9 to avoid a cmake incompatibility; 1.3.0 should behave the same.

A solution might be a test runner that can be invoked with exec in the forked child. However, the context mechanics, seed, and other shared tooling will likely be an issue. It might be wise to use either fork or threads, not both, IMHO.
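The fork-then-exec idea can be sketched as follows. This is only an illustration under assumptions: `run_test_in_fresh_process` is a hypothetical helper, not part of the libbson test framework, and it does not address how the context or seed would be passed to the new process. The point is that the child calls nothing but `execvp` and `_exit`, so it never touches the heap or any lock inherited from the parent.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv[0] in a fresh process image and return its
 * exit status, or -1 if it could not be waited for or died from a signal. */
int run_test_in_fresh_process(char *const argv[])
{
    int status = 0;
    pid_t pid = fork();

    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: replace the process image immediately. No malloc, no
         * stdio, no locks are touched between fork and exec. */
        execvp(argv[0], argv);
        _exit(127); /* exec failed; _exit skips atexit handlers */
    }

    /* Parent: wait for the child, retrying if a signal interrupts us. */
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A runner built this way would pass the test name on the command line of the new process, for example something like `{"./test-libbson", "-l", name, NULL}` if the binary supports selecting a single test that way, which is an assumption here.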

Also, when running with TSan, there is a data race on a read of test->count while the JSON test results are printed.
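One common way to remove this kind of race is to make the shared counter atomic. The sketch below assumes C11 atomics are available; `run_state_t`, `worker`, and `run_and_count` are hypothetical names, and the real type and writers of test->count in the suite may differ.

```c
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical stand-in for the suite's shared state; only the counter
 * matters for this sketch. */
typedef struct {
    atomic_int count;
} run_state_t;

typedef struct {
    run_state_t *state;
    int iterations;
} worker_arg_t;

static void *worker(void *p)
{
    worker_arg_t *arg = p;
    for (int i = 0; i < arg->iterations; i++)
        atomic_fetch_add(&arg->state->count, 1); /* race-free increment */
    return NULL;
}

/* Starts up to 16 workers, samples the counter while they run, and returns
 * the final count after all of them have been joined. */
int run_and_count(int nthreads, int iterations)
{
    run_state_t state;
    pthread_t tids[16];
    worker_arg_t arg = { &state, iterations };
    int started = 0;

    if (nthreads > 16)
        nthreads = 16;
    atomic_init(&state.count, 0);

    for (int i = 0; i < nthreads; i++) {
        if (pthread_create(&tids[started], NULL, worker, &arg) == 0)
            started++;
    }

    /* A reporter can read the counter while workers are still running;
     * atomic_load makes that concurrent read well defined. */
    (void) atomic_load(&state.count);

    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);

    return atomic_load(&state.count);
}
```

Guarding the counter with the same mutex on both the write and the read side would work equally well; atomics are used here only because the value is a single integer.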

Comment by Hannes Magnusson [ 21/Dec/15 ]

We are aware that running the test suite without -p doesn't really work, but we believe that is a problem with the test framework itself, not the driver.

Generated at Wed Feb 07 21:11:25 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.