[SERVER-49276] repl_ssl_split_horizon.js test leaves /etc/os-release in a strange state
Created: 02/Jul/20  Updated: 06/Dec/22

Status: Backlog
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug
Priority: Major - P3
Reporter: Shane Harvey
Assignee: Backlog - Server Tooling and Methods (STM) (Inactive)
Resolution: Unresolved
Votes: 0
Labels: dev-prod-qp-idea
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Assigned Teams: Server Tooling & Methods
Operating System: ALL
Participants:

 Description   

Driver tests occasionally fail with this strange error:

[2020/07/01 22:03:56.354] + . /data/mci/4ba521569a7cb43abf2e2b408ef3c33f/drivers-tools/.evergreen/download-mongodb.sh
[2020/07/01 22:03:56.355] ++ set -o xtrace
[2020/07/01 22:03:56.355] ++ set -o errexit
[2020/07/01 22:03:56.355] + get_distro
[2020/07/01 22:03:56.355] + '[' -f /etc/os-release ']'
[2020/07/01 22:03:56.355] + . /etc/os-release
[2020/07/01 22:03:56.356] ++ splithorizon1 localhost
[2020/07/01 22:03:56.356] /etc/os-release: line 1: splithorizon1: command not found
[2020/07/01 22:03:56.356] Enabling coredumps
[2020/07/01 22:03:56.356] Command failed: command encountered problem: error waiting on process '14483eb1-688d-48ab-843d-dd26cf81bce0': exit status 127
[2020/07/01 22:03:56.357] Task completed - FAILURE.

Ref: https://evergreen.mongodb.com/task/mongo_python_driver_tests_python_version_rhel62_test_ssl__platform~rhel62_auth~noauth_ssl~nossl_python_version~3.6_coverage~coverage_test_latest_sharded_cluster_1f4123e4bf54f9ed689ce77ffb8dfbccc3e688f0_20_07_01_21_49_46

The following test seems to be the culprit as it edits /etc/os-release and leaves it in a strange state: https://github.com/mongodb/mongo/blob/f31bc89/jstests/ssl/repl_ssl_split_horizon.js#L6

/etc/os-release is supposed to be a "newline-separated list of environment-like shell-compatible variable assignments". https://www.freedesktop.org/software/systemd/man/os-release.html#Description
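To make the format expectation concrete, here is a minimal Python sketch that checks whether a file's text looks like os-release content without sourcing it in a shell. The function name `find_bad_lines` and the "good" sample are hypothetical; the "corrupted" sample is reconstructed from the xtrace output above, so its exact whitespace is an assumption. The pattern is deliberately lenient and is not a full implementation of the os-release specification.

```python
import re

# One KEY=value assignment per line; blank lines and "#" comments are allowed.
# This is a lenient approximation, not the full os-release grammar.
_ASSIGNMENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*=")


def find_bad_lines(text):
    """Return (line_number, line) pairs that are not os-release-style lines."""
    bad = []
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        if not _ASSIGNMENT.match(stripped):
            bad.append((number, line))
    return bad


# Hypothetical well-formed content.
good = 'NAME="Example Linux"\nID=example\n'
# Reconstructed from the failure log: line 1 ran "splithorizon1 localhost".
corrupted = "splithorizon1 localhost\n"

print(find_bad_lines(good))
print(find_bad_lines(corrupted))
```

A check like this could run before a script does `. /etc/os-release`, so that a corrupted file is reported clearly instead of failing with "command not found".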



 Comments   
Comment by Steven Vannelli [ 10/May/22 ]

Moving this ticket to the Backlog and removing the "Backlog" fixVersion as per our latest policy for using fixVersions.

Comment by Robert Guo (Inactive) [ 06/Aug/20 ]

Like Billy said, nothing in the Server repo modifies that file. But it's entirely possible somebody changed it in a rogue patch build. Given the 1:1000 failure rate, I think that's a plausible explanation. To understand the problem further, we'd need to audit all file access or have Evergreen restrict write permissions on files outside of /data/mci.

I'm putting it on the backlog for now since there's not much we can do in the short term. I'll propose the longer term solution at the next QP.
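A much narrower form of the audit described above can be sketched in Python: record a digest of the watched files before a task and compare afterwards. This is only an illustration, not existing Evergreen functionality. The helper names `file_digest`, `snapshot`, and `changed_files` are hypothetical, and the approach detects that a file changed, not which process changed it.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file, or None if it does not exist."""
    p = Path(path)
    if not p.is_file():
        return None
    return hashlib.sha256(p.read_bytes()).hexdigest()


def snapshot(paths):
    """Map each path (as a string) to its current digest."""
    return {str(p): file_digest(p) for p in paths}


def changed_files(before, paths):
    """Return the paths whose digest differs from the earlier snapshot."""
    return [p for p in paths if file_digest(p) != before.get(str(p))]


if __name__ == "__main__":
    watched = ["/etc/os-release"]
    before = snapshot(watched)
    # ... the task under suspicion would run here ...
    print(changed_files(before, watched))
```

Running the comparison at the end of every task would at least narrow a 1:1000 failure down to the task that left the file modified.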

Comment by Billy Donahue [ 22/Jul/20 ]

Brian, I believe I've already done that audit, and nothing under the mongodb/mongo repo modifies that file.
There could be programs outside the server repo I wouldn't know about though, so it is good to get confirmation.

 

Comment by Brian Samek [ 22/Jul/20 ]

Thanks, I should have been clearer. I meant to ask whether the server or any tests manipulate that file. Evergreen does not. It's conceivable that there's something up with the host. I think the right teams to look into this would be STM and then Build.

Comment by Billy Donahue [ 22/Jul/20 ]

The server does not manipulate /etc/os-release.

There's a line in src/mongo/util/processinfo_linux.cpp's getLinuxDistro() that lists it as one of the files to READ when trying to glean Linux distro metadata.

The server wouldn't normally have permission to change that file.

Comment by Brian Samek [ 22/Jul/20 ]

I'm going to send to STM, since I'd like to know first if the server manipulates this file. CC robert.guo

Comment by Billy Donahue [ 15/Jul/20 ]

I'm guessing there's some packing and unpacking script used for setup and teardown that has a bug. The contents of a file that was created in a unit test under a /data/ dir have somehow ended up as the contents of a buildhost's /etc/os-release file, in about 1 in 1000 runs.
 
Assigning to Evergreen team for further analysis.

Comment by Billy Donahue [ 15/Jul/20 ]

I believe the description is incorrect, and the test does not appear to edit /etc/os-release. It only reads from it by shelling out to `cat`.
https://github.com/mongodb/mongo/blob/f31bc89/jstests/ssl/repl_ssl_split_horizon.js#L51

The highlighted line
https://github.com/mongodb/mongo/blob/f31bc89/jstests/ssl/repl_ssl_split_horizon.js#L6

generates the name of a MongoDB split-horizon-hosts file, which contains newline-separated hostnames.

Trying to figure out how that failure message from the description occurred.

Generated at Thu Feb 08 05:19:25 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.