[SERVER-73168] diminish the ServiceExecutor benchmark under ASAN Created: 21/Jan/23  Updated: 27/Oct/23  Resolved: 07/Feb/23

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Billy Donahue Assignee: Billy Donahue
Resolution: Gone away Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Problem/Incident
is caused by SERVER-69570 ServiceExecutorSynchronous inefficien... Closed
Operating System: ALL
Backport Requested:
v6.2
Sprint: Service Arch 2023-01-23, Service Arch 2023-02-06, Service Arch 2023-02-20
Participants:
Linked BF Score: 21

 Description   

Since the service_executor_bm benchmark was introduced in SERVER-69570, it has had trouble running under ASAN, and the associated BF-26477 has become a nuisance on the v6.2 and master branches.

This comment on that BF explains the situation.

Under ASAN, the benchmark has trouble allocating threads quickly.
Unfortunately, it needs lots of threads to measure the cost of creating them.

[benchmark_test:service_executor_bm] ==127545==ERROR: AddressSanitizer failed to allocate 0xe000 (57344) bytes of Create (error code: 12)
[benchmark_test:service_executor_bm] ERROR: Failed to mmap

ASAN builds are extremely slow, and they don't generate useful benchmark results. The only reason to run a benchmark under ASAN is to find correctness issues in the benchmark itself; the numbers it generates aren't representative of production and aren't really useful.

I think it would be prudent to have the benchmark's code detect ASAN with a preprocessor #ifdef. In an ASAN build, it would run the benchmark loop once instead of the full number of iterations that the benchmark library specifies. This should fix the ASAN mmap failure, since we would no longer be creating an inordinate number of threads.
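A minimal C++ sketch of the proposed #ifdef approach follows. It assumes ASAN is detected through GCC's `__SANITIZE_ADDRESS__` macro or Clang's `__has_feature(address_sanitizer)`. The macro `SKETCH_ASAN_BUILD` and the functions `effectiveIterations`, `spawnAndJoin`, and `runSketchBenchmark` are hypothetical stand-ins, not code from service_executor_bm, and the loop is a plain `for` rather than Google Benchmark's `benchmark::State`.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Detect an AddressSanitizer build: GCC defines __SANITIZE_ADDRESS__ under
// -fsanitize=address; Clang reports it through __has_feature(address_sanitizer).
#if defined(__SANITIZE_ADDRESS__)
#define SKETCH_ASAN_BUILD 1
#elif defined(__has_feature)
#if __has_feature(address_sanitizer)
#define SKETCH_ASAN_BUILD 1
#endif
#endif
#ifndef SKETCH_ASAN_BUILD
#define SKETCH_ASAN_BUILD 0
#endif

// Under ASAN, run the loop once instead of the requested number of times.
constexpr std::size_t effectiveIterations(std::size_t requested) {
    return SKETCH_ASAN_BUILD ? std::size_t{1} : requested;
}

// Hypothetical stand-in for one benchmark iteration: create and join a batch
// of threads. Returns how many threads actually ran their task.
std::size_t spawnAndJoin(std::size_t threadCount) {
    std::atomic<std::size_t> ran{0};
    std::vector<std::thread> threads;
    threads.reserve(threadCount);
    for (std::size_t i = 0; i < threadCount; ++i)
        threads.emplace_back([&ran] { ran.fetch_add(1, std::memory_order_relaxed); });
    for (auto& t : threads)
        t.join();  // all threads finish before `ran` goes out of scope
    return ran.load();
}

// Runs the (possibly capped) benchmark loop and returns the total number of
// threads spawned across all iterations.
std::size_t runSketchBenchmark(std::size_t requestedIterations, std::size_t threadsPerIteration) {
    const std::size_t iterations = effectiveIterations(requestedIterations);
    std::size_t total = 0;
    for (std::size_t i = 0; i < iterations; ++i) {
        const std::size_t ran = spawnAndJoin(threadsPerIteration);
        assert(ran == threadsPerIteration);  // every spawned thread ran its task
        total += ran;
    }
    return total;
}
```

In a normal build, `runSketchBenchmark(10, 4)` spawns 40 threads; in an ASAN build it spawns 4. With the real Google Benchmark library, a similar cap could be expressed by registering the benchmark with `->Iterations(1)` inside the same `#if`; this is only an illustration, not necessarily the mechanism the eventual commit used.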

This small tweak to the benchmark should be backported to v6.2 as well.



 Comments   
Comment by Billy Donahue [ 07/Feb/23 ]

The BF that prompted this ticket hasn't recurred on the master branch since November, and we've changed the relevant code and target platform since then.

I think we can just put this one down with no further action.

The session_workflow_bm seems to be running successfully on relevant buildvariants.
It hasn't been bypassed or disabled. It's running and succeeding now.

Comment by Billy Donahue [ 28/Jan/23 ]

> In ASAN we can just have the whole benchmark ifdef away.

That also failed because resmoke fails if there are no benchmarks to run.
https://parsley.mongodb.com/evergreen/mongodb_mongo_master_rhel80_debug_asan_benchmarks_orphaned_patch_4e1beba356e790683f4640820684ff8f38b7c51d_63d2ca80c9ec44150d4678b3_23_01_26_18_54_50/0/task?bookmarks=0,393,429,690

Also tried defining a "Dummy" benchmark to work around this. That also failed for resmoke-related reasons.
https://parsley.mongodb.com/evergreen/mongodb_mongo_master_rhel80_debug_asan_benchmarks_orphaned_patch_4e1beba356e790683f4640820684ff8f38b7c51d_63d2e48b3627e02628627133_23_01_26_20_38_17/0/task?bookmarks=0,409,479,719

Comment by Billy Donahue [ 26/Jan/23 ]

I'm seeing hangs in the benchmark even with a pretty conservative patch to suppress the looping under ASAN. There may even be a bug in the benchmark library happening here.

https://spruce.mongodb.com/task/mongodb_mongo_master_rhel80_debug_asan_benchmarks_orphaned_patch_4e1beba356e790683f4640820684ff8f38b7c51d_63d238960305b92319ff5d22_23_01_26_08_24_17/tests?execution=0&sortBy=STATUS&sortDir=ASC

I tried 3 different ways to do this and ended up with something that hangs, produces endless logs, and can't be reproduced locally. I feel this ticket isn't worth more investigation. In ASAN we can just have the whole benchmark ifdef away.

Comment by Billy Donahue [ 25/Jan/23 ]

Needs another tweak. The last commit failed on the ASAN builders with DEBUG enabled.
Apparently the Google Benchmark library fails an assertion in DEBUG mode if you break out of the benchmark loop early.
So I'll have to change the test's short-circuiting to use an early continue instead of an early break.
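The difference between `break` and `continue` can be sketched with a hypothetical `MockState` class that imitates the contract described above: the benchmark body must let the state count through every iteration it hands out. This is not Google Benchmark's implementation; `MockState`, `ShortCircuit`, `runBody`, and `passesDebugCheck` are illustrative names, and `passesDebugCheck` is a simplified model of the DEBUG assertion.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for benchmark::State: hands out a fixed number of
// iterations through a range-for loop and records how many were consumed.
class MockState {
public:
    explicit MockState(std::size_t maxIterations) : max_(maxIterations) {
        assert(maxIterations > 0);
    }

    struct Iterator {
        MockState* state;
        // The loop keeps going while iterations remain.
        bool operator!=(const Iterator&) const { return state->done_ < state->max_; }
        // Advancing the iterator marks one iteration as consumed.
        Iterator& operator++() {
            ++state->done_;
            return *this;
        }
        int operator*() const { return 0; }
    };

    Iterator begin() { return Iterator{this}; }
    Iterator end() { return Iterator{this}; }

    // True once the body has let the loop run to completion.
    bool consumedAll() const { return done_ >= max_; }

private:
    std::size_t max_;
    std::size_t done_ = 0;
};

// Hypothetical ways the body can short-circuit after its first pass.
enum class ShortCircuit { kNone, kBreak, kContinue };

// Does the "real" work (a counter standing in for spawning threads) once,
// then short-circuits according to `mode`. Returns the amount of work done.
std::size_t runBody(MockState& state, ShortCircuit mode) {
    std::size_t workDone = 0;
    for (auto _ : state) {
        (void)_;
        if (workDone > 0 && mode == ShortCircuit::kBreak)
            break;  // leaves iterations unconsumed
        if (workDone > 0 && mode == ShortCircuit::kContinue)
            continue;  // skips the work but still advances the state
        ++workDone;
    }
    return workDone;
}

// Simplified model of the DEBUG-mode check: the body must not return
// before the state has handed out all of its iterations.
bool passesDebugCheck(const MockState& state) { return state.consumedAll(); }
```

With `ShortCircuit::kContinue`, the expensive work runs once but every remaining iteration is still consumed, so the check passes; with `ShortCircuit::kBreak`, the body returns with iterations left over, which is the situation the DEBUG assertion rejects.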

Comment by Githook User [ 25/Jan/23 ]

Author:

{'name': 'Billy Donahue', 'email': 'billy.donahue@mongodb.com', 'username': 'BillyDonahue'}

Message: SERVER-73168 ServiceExecutorBm spawn fewer threads on ASAN
Branch: master
https://github.com/mongodb/mongo/commit/e00c3493a00a17abcfdaf2e3afaae10089adf0ca

Generated at Thu Feb 08 06:23:52 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.