[SERVER-29945] Upload mongod logs to S3 when a Jepsen task times out
Created: 30/Jun/17  Updated: 30/Oct/23  Resolved: 13/Jul/17

Status: Closed
Project: Core Server
Component/s: Testing Infrastructure
Affects Version/s: None
Fix Version/s: 3.4.6, 3.5.11

Type: Improvement Priority: Major - P3
Reporter: Spencer Brody (Inactive) Assignee: Eddie Louie
Resolution: Fixed Votes: 0
Labels: tig-evgconfig, tig-jepsen
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Related
is related to SERVER-28221 Recoverable Rollback: Switch to runni... Closed
Backwards Compatibility: Fully Compatible
Backport Requested:
v3.4
Sprint: TIG 2017-07-31
Participants:
Linked BF Score: 0

 Description   

When a Jepsen task times out, we run the hang analyzer but lose the mongod logs. If the root cause of the timeout isn't clear from the thread stacks, this can make it difficult or impossible to discover.



 Comments   
Comment by Githook User [ 13/Jul/17 ]

Author: Eddie Louie (username: elouie99, email: eddie.louie@mongodb.com)

Message: SERVER-29945 Fix check for when Jepsen task is running so mongod logs are uploaded to S3

(cherry picked from commit 943361fe17e0443d6b899ba10160fb1f68742f42)
Branch: v3.4
https://github.com/mongodb/mongo/commit/240472c8c30ecd79dd2638a69b29da1ba6124532

Comment by Githook User [ 13/Jul/17 ]

Author: Eddie Louie (username: elouie99, email: eddie.louie@mongodb.com)

Message: SERVER-29945 Fix check for when Jepsen task is running so mongod logs are uploaded to S3
Branch: master
https://github.com/mongodb/mongo/commit/943361fe17e0443d6b899ba10160fb1f68742f42

Comment by Max Hirschhorn [ 12/Jul/17 ]

Re-opening this ticket again (sorry). I realized that the changes from SERVER-28221 mean this check for a jepsen/ directory will never be satisfied, because the 10gen/jepsen repository is now cloned into a jepsen-mongodb/ directory. I think we should instead detect whether a Jepsen task is running by checking whether the ${distro_id} expansion is "ubuntu1404-jepsen", as was done in the changes from SERVER-28535.

Comment by Max Hirschhorn [ 12/Jul/17 ]

Re-opening to close as "Works as designed" since no changes were made on this ticket.

Comment by Eddie Louie [ 12/Jul/17 ]

Fixed with EVG-1828

Comment by Eddie Louie [ 12/Jul/17 ]

I looked at the system logs and talked with jonathan.abrahams, and we think this was caused by changes made in Evergreen. The post task clearly shows that the mongod.log files for Jepsen are gathered before the src directory is cleaned up, so something else is removing these files after a timeout. EVG-1715 appears to be the cause, and it was later fixed by EVG-1828. I'll link these tickets and close this as fixed.

Generated at Thu Feb 08 04:22:13 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.