[SERVER-28535] hang_analyzer.py should attach to mongod processes if Jepsen test times out in Evergreen
Created: 29/Mar/17  Updated: 06/Dec/17  Resolved: 03/May/17

Status: Closed
Project: Core Server
Component/s: Testing Infrastructure
Affects Version/s: None
Fix Version/s: 3.4.5, 3.5.7

Type: Improvement
Priority: Major - P3
Reporter: Max Hirschhorn
Assignee: Max Hirschhorn
Resolution: Done
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Related
is related to SERVER-28461 Run Jepsen's "set" test in Evergreen Closed
Backwards Compatibility: Fully Compatible
Backport Requested:
v3.4
Sprint: TIG 2017-05-08
Participants:
Linked BF Score: 0

 Description   

The jepsen_xx tasks set the ${hang_analyzer_processes} expansion to "java", causing hang_analyzer.py not to attach to any mongod processes. This makes it difficult to debug failures where running MongoDB's Jepsen tests induces a hang/deadlock in mongod itself.

- <<: *run_jepsen_template
  name: jepsen_findAndModify_WT
  commands:
  - func: "do setup"
  - func: "do jepsen setup"
  - func: "run jepsen test"
    vars:
      <<: *jepsen_config_vars
      hang_analyzer_processes: "java"
      jepsen_read_with_find_and_modify: --read-with-find-and-modify
      jepsen_storage_engine: --storage-engine wiredTiger



 Comments   
Comment by Githook User [ 26/May/17 ]

Author: Max Hirschhorn (visemet) <max.hirschhorn@mongodb.com>

Message: SERVER-28535 Don't set hang_analyzer_processes for jepsen* tasks.

Changes the hang_analyzer.py script to run with root privileges on the
ubuntu1404-jepsen distro in order to be able to attach to the mongod
processes inside the LXC containers.

(cherry picked from commit 1530cf54fd9db4e9e46e5fdd0b42972cd84b4c25)
Branch: v3.4
https://github.com/mongodb/mongo/commit/87cadca9e935facfe4ca878da4de919ecc5f4090

Comment by Githook User [ 03/May/17 ]

Author: Max Hirschhorn (visemet) <max.hirschhorn@mongodb.com>

Message: SERVER-28535 Don't set hang_analyzer_processes for jepsen* tasks.

Changes the hang_analyzer.py script to run with root privileges on the
ubuntu1404-jepsen distro in order to be able to attach to the mongod
processes inside the LXC containers.
Branch: master
https://github.com/mongodb/mongo/commit/1530cf54fd9db4e9e46e5fdd0b42972cd84b4c25
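
The commit message above says the script has to run with root privileges to attach to the containerized mongod processes. As a rough illustration only (this is not the code from the commit), the sketch below builds a GDB command line that attaches to a PID and dumps all thread backtraces, prefixing it with sudo when the caller is not root. The function name gdb_attach_command and the euid parameter are hypothetical and exist only for this example; the GDB options used (--batch, -p, -ex) are standard ones.

```python
import os


def gdb_attach_command(pid, euid=None):
    """Build an argv list that attaches GDB to `pid` and prints all backtraces.

    Hypothetical helper for illustration; not taken from hang_analyzer.py.
    `euid` defaults to the caller's effective user ID and is a parameter
    only so the behavior can be checked without actually being root.
    """
    if euid is None:
        euid = os.geteuid()

    cmd = ["gdb", "--batch", "-p", str(pid), "-ex", "thread apply all bt"]

    # Assumption: a non-root caller needs sudo to attach to a process that
    # runs under a different user ID inside the container.
    if euid != 0:
        cmd = ["sudo"] + cmd
    return cmd
```

The returned list can be passed to subprocess.call() as is. Whether sudo is available without a password prompt depends on how the distro is configured, which this ticket does not describe.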

Comment by Max Hirschhorn [ 29/Mar/17 ]

Jonathan Abrahams wrote: "GDB cannot attach to a process running in an LXC container."

jonathan.abrahams, sure it can. A process (e.g. a mongod) in a PID namespace (e.g. in an LXC container) is still visible to the root namespace (i.e. the host machine).

From pid_namespaces(7): "A process is visible to other processes in its PID namespace, and to the processes in each direct ancestor PID namespace going back to the root PID namespace. In this context, "visible" means that one process can be the target of operations by another process using system calls that specify a process ID."

http://man7.org/linux/man-pages/man7/pid_namespaces.7.html

$ ps -ef --forest
...
root      1264     1  0 13:20 ?        00:00:00 lxc-start -n n1 -d
root      1309  1264  0 13:20 ?        00:00:00  \_ /sbin/init
root      1947  1309  0 13:21 ?        00:00:00      \_ upstart-socket-bridge --daemon
root      2448  1309  0 13:21 ?        00:00:00      \_ upstart-udev-bridge --daemon
root      2569  1309  0 13:21 ?        00:00:00      \_ /lib/systemd/systemd-udevd --daemon
root      2675  1309  0 13:21 ?        00:00:00      \_ upstart-file-bridge --daemon
message+  2930  1309  0 13:21 ?        00:00:00      \_ rsyslogd
root      3141  1309  0 13:21 ?        00:00:00      \_ dhclient -1 -v -pf /run/dhclient.eth0.pid -lf /var/lib/dhcp/dhclient.eth0.leases eth0
root      3501  1309  0 13:21 pts/8    00:00:00      \_ /sbin/getty -8 38400 tty4
root      3531  1309  0 13:21 pts/6    00:00:00      \_ /sbin/getty -8 38400 tty2
root      3532  1309  0 13:21 pts/7    00:00:00      \_ /sbin/getty -8 38400 tty3
root      3568  1309  0 13:21 ?        00:00:00      \_ /usr/sbin/sshd -D
root      5185  3568  0 14:08 ?        00:00:00      |   \_ sshd: root@notty
root      3595  1309  0 13:21 ?        00:00:00      \_ cron
root      3619  1309  0 13:21 pts/9    00:00:00      \_ /sbin/getty -8 38400 console
root      3621  1309  0 13:21 pts/5    00:00:00      \_ /sbin/getty -8 38400 tty1
1001     22359  1309  4 14:10 ?        00:00:00      \_ /opt/mongodb/bin/mongod --config /opt/mongodb/mongod.conf
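
Because the containerized mongod appears in the host's process table, a script on the host can find it by name from ordinary ps output. The sketch below is a minimal illustration of that idea, not the logic hang_analyzer.py actually uses: the helper name find_pids is hypothetical, and it assumes the eight-column `ps -ef` layout shown above (UID, PID, PPID, C, STIME, TTY, TIME, CMD), with optional --forest decoration in front of the command.

```python
import os
import re

# Tree-drawing characters that `ps --forest` puts in front of the command.
_FOREST_PREFIX = re.compile(r"^[\s|\\_]+")


def find_pids(ps_output, process_names):
    """Map each name in `process_names` to the PIDs whose executable matches.

    Hypothetical helper for illustration; assumes `ps -ef` style columns.
    """
    matches = {name: [] for name in process_names}
    for line in ps_output.splitlines():
        fields = line.split(None, 7)
        # Skip the header row, "..." elisions, and anything malformed.
        if len(fields) < 8 or not fields[1].isdigit():
            continue
        command = _FOREST_PREFIX.sub("", fields[7]).split()
        if not command:
            continue
        # Compare on the executable's basename, e.g. "mongod" for
        # "/opt/mongodb/bin/mongod --config ...".
        executable = os.path.basename(command[0])
        if executable in matches:
            matches[executable].append(int(fields[1]))
    return matches
```

Applied to the listing above, a filter of ["java"] matches nothing, while ["mongod"] picks up PID 22359. One known limitation of this sketch is that stripping the forest prefix would also trim a command whose name begins with an underscore.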

Comment by Jonathan Abrahams [ 29/Mar/17 ]

GDB cannot attach to a process running in an LXC container. Similarly, pkill cannot kill processes active in those containers.
