[SERVER-57564] Increase system timeout duration to >2 hours to allow fio to copy over all files Created: 25/Mar/21  Updated: 29/Oct/23  Resolved: 10/Jun/21

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 4.2.15, 4.4.7, 4.0.26, 5.0.0-rc2, 5.1.0-rc0

Type: Bug Priority: Major - P3
Reporter: Vlad Rachev (Inactive) Assignee: Mikhail Shchatko
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Backwards Compatibility: Fully Compatible
Backport Requested:
v5.0, v4.4, v4.2, v4.0
Sprint: STM 2021-06-14
Participants:
Linked BF Score: 0
Story Points: 1

 Description   

We've noticed fio takes nearly 2 hours to run, which is causing system timeouts. Let's see if we can dial that down.

david.daly I'm leaving this in your backlog because you guys are probably better equipped to configure fio. Feel free to reassign back if you think it's something we should do.

 

We might not be able to change fio's runtime, since it needs to copy over the snapshot data. Let's instead look to increase the timeout to give it more time to finish this.

 



 Comments   
Comment by Vivian Ge (Inactive) [ 06/Oct/21 ]

Updating the fixversion since branching activities occurred yesterday. This ticket will be in rc0 when it’s been triggered. For more active release information, please keep an eye on #server-release. Thank you!

Comment by Githook User [ 11/Jun/21 ]

Author:

{'name': 'Mikhail Shchatko', 'email': 'mikhail.shchatko@mongodb.com', 'username': 'MikhailShchatko'}

Message: SERVER-57564 Increase initialsync-logkeeper timeout

(cherry picked from commit c970224e10642c7155ffca8c698d40848481c54e)
Branch: v4.0
https://github.com/mongodb/mongo/commit/4f573c7ecbd0a5865b99c265928e97a75fefa4fd

Comment by Githook User [ 10/Jun/21 ]

Author:

{'name': 'Mikhail Shchatko', 'email': 'mikhail.shchatko@mongodb.com', 'username': 'MikhailShchatko'}

Message: SERVER-57564 Increase initialsync-logkeeper timeout

(cherry picked from commit c970224e10642c7155ffca8c698d40848481c54e)
Branch: v4.2
https://github.com/mongodb/mongo/commit/ed0ab75eef1faf0c29b0cbbf62b634cde8e3ec89

Comment by Githook User [ 10/Jun/21 ]

Author:

{'name': 'Mikhail Shchatko', 'email': 'mikhail.shchatko@mongodb.com', 'username': 'MikhailShchatko'}

Message: SERVER-57564 Increase initialsync-logkeeper timeout

(cherry picked from commit c970224e10642c7155ffca8c698d40848481c54e)
Branch: v4.4
https://github.com/mongodb/mongo/commit/60ab9d0e6e9b4d8dddeec16f14d8637e8802d1db

Comment by Githook User [ 09/Jun/21 ]

Author:

{'name': 'Mikhail Shchatko', 'email': 'mikhail.shchatko@mongodb.com', 'username': 'MikhailShchatko'}

Message: SERVER-57564 Increase initialsync-logkeeper timeout

(cherry picked from commit c970224e10642c7155ffca8c698d40848481c54e)
Branch: v5.0
https://github.com/mongodb/mongo/commit/7ee86a500942f5fce47471bbab6be646bd6ba8a1

Comment by Githook User [ 09/Jun/21 ]

Author:

{'name': 'Mikhail Shchatko', 'email': 'mikhail.shchatko@mongodb.com', 'username': 'MikhailShchatko'}

Message: SERVER-57564 Increase initialsync-logkeeper timeout
Branch: master
https://github.com/mongodb/mongo/commit/c970224e10642c7155ffca8c698d40848481c54e

Comment by Brooke Miller [ 06/Apr/21 ]

We discussed increasing exec_timeout_secs to match the task timeout. We'll likely need to increase the DSI timeout as well.

Comment by Vlad Rachev (Inactive) [ 25/Mar/21 ]

Yes, it's logkeeper-specific.

OK, I see. In that case we should look into increasing some timeout. A successful logkeeper run takes around 7-8 hours, so we should be able to make this not time out after 2 hours (though evergreen timeouts are hard to grok). I will change the ticket title to reflect this and reassign to STM to figure out the timeout.
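
For a rough sense of the numbers involved, the arithmetic can be sketched as below. This is only an illustration: the helper name timeout_secs and the 25% headroom are assumptions made for the example, not values taken from the actual fix.

```python
import math


def timeout_secs(expected_hours: float, headroom: float = 0.25) -> int:
    """Convert an expected runtime in hours into a timeout in whole seconds.

    `headroom` is a hypothetical safety margin expressed as a fraction of
    the expected runtime (0.25 adds 25% on top).
    """
    if expected_hours <= 0:
        raise ValueError("expected_hours must be positive")
    if headroom < 0:
        raise ValueError("headroom must not be negative")
    # Round up so the timeout is never shorter than the padded runtime.
    return math.ceil(expected_hours * 3600 * (1 + headroom))


# The 2-hour limit mentioned above, with no padding.
two_hour_limit = timeout_secs(2, headroom=0)

# The upper end of the 7-8 hour estimate, padded by the assumed 25%.
padded_full_run = timeout_secs(8)

print(two_hour_limit, padded_full_run)
```

Under these assumptions, a value covering a full run is several times larger than the 2-hour limit, which is why the run cannot complete without raising the timeout.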

Comment by David Daly [ 25/Mar/21 ]

vlad.rachev this is logkeeper specific, yes? We can take a look. We might not be able to speed it up. This is an important step when working with an AWS snapshot. It forces the system to copy over all the data to the local system, which is required for this test. 
