[SERVER-69994] broken spawnhost setup for debugging C++ benchmark (bad globs?) Created: 26/Sep/22  Updated: 29/Oct/23  Resolved: 10/Oct/22

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 6.2.0-rc0

Type: Bug Priority: Major - P3
Reporter: Billy Donahue Assignee: Matt Kneiser
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Problem/Incident
causes SERVER-75351 Missing test binaries after running s... Closed
causes SERVER-70707 Spawnhost Script unzip failure on Win... Closed
Related
Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: Execution Team 2022-10-17
Participants:
Linked BF Score: 21

 Description   

I have a BF to debug.
https://jira.mongodb.org/browse/BF-26477

I have a core dump to load into gdb.
https://spruce.mongodb.com/task/mongodb_mongo_master_rhel80_debug_asan_benchmarks_orphaned_7645dbbe88354f09ee8d84dc393a899cd067bc71_22_09_23_02_34_41/files?execution=0&sortBy=STATUS&sortDir=ASC

The executable that failed is a C++ benchmark: service_executor_bm

I asked Evergreen for a spawnhost to investigate this.
Logging into the spawnhost, I don't get a useful environment.
My ssh session is greeted with a lot of noise about failed glob patterns:

Last login: Mon Sep 26 18:47:43 2022 from 207.251.78.34
 
Debuggable binaries:
ls: cannot access 'mongo*': No such file or directory
mongo core dumps:
ls: cannot access 'dump_mongo.*': No such file or directory
mongod core dumps:
ls: cannot access 'dump_mongod.*': No such file or directory
mongos core dumps:
ls: cannot access 'dump_mongos.*': No such file or directory
Core dumps from unknown processes (crashed processes typically found here):
-rw------- 1 ec2-user ec2-user 21211217920 Sep 23 08:08 dump_service_executo.112617.core
 
To examine a core dump, type 'gdb ./<binary> ./<core file>'

I can see that a few symbolic links failed to be created.

This looks like the outcome of some simple bash scripts that were not written to deal with unmatched globs. In the listing below, the * symlink is dead: it points to a bin/* file that doesn't exist.

[ec2-user@ip-10-128-152-165 debug]$ ls -al
total 20714080
lrwxrwxrwx 1 ec2-user ec2-user           5 Sep 26 18:42 '*' -> 'bin/*'
drwxrwxr-x 3 ec2-user ec2-user         130 Sep 26 18:42  .
drwxr-xr-x 5 ec2-user ec2-user          41 Sep 26 18:40  ..
lrwxrwxrwx 1 ec2-user ec2-user          53 Sep 26 18:40  .gdbinit -> /data/mci/source-mongodb-mongo-master-7645db/.gdbinit
drwxrwxr-x 6 ec2-user ec2-user         152 Sep 26 18:40  Boost-Pretty-Printer
lrwxrwxrwx 1 ec2-user ec2-user          57 Sep 26 18:40  buildscripts -> /data/mci/source-mongodb-mongo-master-7645db/buildscripts
-rw------- 1 ec2-user ec2-user 21211217920 Sep 23 08:08  dump_service_executo.112617.core
lrwxrwxrwx 1 ec2-user ec2-user          48 Sep 26 18:40  src -> /data/mci/source-mongodb-mongo-master-7645db/src
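
A listing like the one above can be reproduced in a scratch directory. The sketch below is only a guess at the shape of the failing logic: the spawnhost script itself is not quoted in this ticket, so the loop over bin/* and the directory layout are assumptions made for illustration.

```bash
#!/usr/bin/env bash
# Reproduce a dead '*' symlink in a throwaway directory.
# Assumption: the real script loops over bin/* and links each match into
# the current directory; that script is not shown in this ticket.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir bin                     # bin/ exists but is empty, so bin/* matches nothing

# With default shell options an unmatched pattern is left as the literal
# string 'bin/*', so the loop body still runs once.
for f in bin/*; do
    ln -s "$f" "$(basename "$f")"
done

link_target=$(readlink '*')   # the loop created a link literally named '*'
echo "created: '*' -> '$link_target'"
[ -e '*' ] || echo "the link is dangling"
```

The quoting around '*' in the last two commands matters: unquoted, the shell would expand it against whatever files happen to be in the directory.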



 Comments   
Comment by Githook User [ 10/Oct/22 ]

Author: Matt Kneiser (themattman) <matt.kneiser@mongodb.com>

Message: SERVER-69994 Make spawnhost script more robust
Branch: master
https://github.com/mongodb/mongo/commit/d3de5a0a834af943ec43e713f0a09e7bc2309ddb

Comment by Matt Kneiser [ 04/Oct/22 ]

I just realized that the messages in this ticket's description refer to the .profile script that is added as a welcome message. I will patch that up.

Comment by Max Hirschhorn [ 27/Sep/22 ]

I was able to recover the benchmark executables from the compile task and scp them into the spawnhost and get my gdb rolling, but it sure took a lot of figuring out.

Thanks billy.donahue@mongodb.com, I hadn't realized we were always uploading the C++ benchmark binaries to S3. It sounds like the setup_spawnhost_coredump script is missing handling for extracting the tarball.

BENCHMARK_ARCHIVE=`ls /data/mci/artifacts-*compile_upload_benchmarks/mongodb_mongo_*.tgz`
tar --wildcards --strip-components=0 -xzf $BENCHMARK_ARCHIVE 'bin/*' &
tar --wildcards -xzf $BENCHMARK_ARCHIVE 'lib/*' &

Comment by Billy Donahue [ 27/Sep/22 ]

The globbing problems can be handled with shopt -s nullglob or shopt -s failglob in the script, depending on whether a failed expansion should just yield a null string or be treated as an error.
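
To make the difference concrete, here is a small sketch comparing bash's default behavior with nullglob and failglob on a pattern that matches nothing. The scratch directory and the dump_mongod.* pattern are just stand-ins for the spawnhost layout. Each case runs in its own bash process (using bash -O to set the option at startup) so the settings cannot leak into one another.

```bash
#!/usr/bin/env bash
# Compare how bash treats an unmatched glob under three settings.
scratch=$(mktemp -d)          # empty directory: no dump_mongod.* files here
cd "$scratch" || exit 1

# Default: the pattern is passed through to the command as a literal string.
default_out=$(bash -c 'printf "%s\n" dump_mongod.*')

# nullglob: the pattern expands to nothing, so the loop body never runs.
nullglob_count=$(bash -O nullglob -c \
    'n=0; for f in dump_mongod.*; do n=$((n + 1)); done; echo "$n"')

# failglob: bash reports "no match" on stderr and does not run the command.
failglob_out=$(bash -O failglob -c 'printf "%s\n" dump_mongod.*' 2>/dev/null)
failglob_status=$?

echo "default:  $default_out"
echo "nullglob: $nullglob_count iterations"
echo "failglob: status $failglob_status, output '$failglob_out'"
```

nullglob suits loops that should quietly do nothing when there are no matches. failglob suits scripts where a missing file means the setup is broken and should stop.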

The script's glob behavior is one thing.

The S3 uploads of benchmark executables (the thing that exposed the scripting bug) are another thing. SDP could probably route this or handle it best. Assigning to them.

Comment by Billy Donahue [ 27/Sep/22 ]

Well yes. I want a better debugging experience, but I'm also worried about the uncontrolled symlink creation when globs fail.
The chaotic symlinks make it difficult to see what the spawnhost was trying to set up for me or how to proceed with the debugging.

If a glob fails and the literal * is echoed by the glob, we're going to be echoing weird messages to the screen and running dangerous commands with wildcard arguments. That feels like an accident waiting to happen, with behavior dependent on what other files are lying around to fill the wildcard.
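
One way to avoid acting on a literal wildcard, without changing any shell options, is to check that each loop item actually exists before using it. The sketch below is illustrative only: link_matches and the demo directory names are hypothetical and are not taken from the spawnhost script.

```bash
#!/usr/bin/env bash
# link_matches SRC_DIR DEST_DIR
# Symlink every entry of SRC_DIR into DEST_DIR. Returns non-zero, and creates
# nothing, when SRC_DIR has no entries. SRC_DIR is assumed to be an absolute
# path so the links resolve from DEST_DIR. (Hypothetical helper.)
link_matches() {
    local src=$1 dest=$2 f linked=0
    for f in "$src"/*; do
        # An unmatched pattern arrives here as the literal "$src/*",
        # which names no real file, so it is skipped.
        [ -e "$f" ] || [ -L "$f" ] || continue
        ln -s "$f" "$dest/$(basename "$f")" && linked=$((linked + 1))
    done
    [ "$linked" -gt 0 ]
}

# Demonstration in a throwaway tree.
demo=$(mktemp -d)
mkdir "$demo/bin" "$demo/empty" "$demo/debug"
touch "$demo/bin/service_executor_bm"

link_matches "$demo/bin" "$demo/debug"
echo "populated source: status $?"
link_matches "$demo/empty" "$demo/debug"
echo "empty source: status $?"
```

Returning a status lets the caller decide whether an empty source directory is fatal or just worth a warning.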

I was able to recover the benchmark executables from the compile task and scp them into the spawnhost and get my gdb rolling, but it sure took a lot of figuring out.

Comment by Billy Donahue [ 26/Sep/22 ]

These tarballs in the data directory are empty.
It looks like we've captured a 20GB core dump but we haven't captured the executable that produced it.

[ec2-user@ip-10-128-152-165 ~]$ tar xvzf /data/mci/artifacts-7645db-rhel80-debug-asan_benchmarks_orphaned/mongo-data-mongodb_mongo_master_rhel80_debug_asan_benchmarks_orphaned_7645dbbe88354f09ee8d84dc393a899cd067bc71_22_09_23_02_34_41-build_install_bin_service_executor_bm-0-0.tgz
[ec2-user@ip-10-128-152-165 ~]$ tar xvzf /data/mci/artifacts-7645db-rhel80-debug-asan_benchmarks_orphaned/mongo-data-mongodb_mongo_master_rhel80_debug_asan_benchmarks_orphaned_7645dbbe88354f09ee8d84dc393a899cd067bc71_22_09_23_02_34_41-build_install_bin_service_executor_bm-0-1.tgz

(These commands show that these 61-byte tarballs are valid tarballs containing no files.)
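
A quicker check than extracting is to list the archive and count its entries. The sketch below builds its own throwaway archives, so the file names are illustrative and count_tar_members is a hypothetical helper. It assumes a tar that accepts -T /dev/null to create an empty archive, as GNU tar does.

```bash
#!/usr/bin/env bash
# count_tar_members ARCHIVE
# Print how many entries a gzip-compressed tarball contains. (Hypothetical helper.)
count_tar_members() {
    tar -tzf "$1" | wc -l | tr -d ' '
}

# Demonstration on throwaway archives.
demo_dir=$(mktemp -d)
echo "hello" > "$demo_dir/payload.txt"
tar -czf "$demo_dir/full.tgz" -C "$demo_dir" payload.txt
tar -czf "$demo_dir/empty.tgz" -T /dev/null    # valid archive with zero entries

for archive in "$demo_dir"/*.tgz; do
    echo "$(basename "$archive"): $(count_tar_members "$archive") entries"
done
```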

Generated at Thu Feb 08 06:14:59 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.