[SERVER-26317] benchRun can terminate without sending/completing any operations
Created: 26/Sep/16  Updated: 06/Dec/22  Resolved: 05/Nov/21

Status: Closed
Project: Core Server
Component/s: Performance
Affects Version/s: None
Fix Version/s: None

Type: Improvement
Priority: Major - P3
Reporter: Tess Avitabile (Inactive)
Assignee: Backlog - Server Tooling and Methods (STM) (Inactive)
Resolution: Won't Fix
Votes: 0
Labels: tig-benchrun
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Assigned Teams:
Server Tooling & Methods
Sprint: Query 2016-10-10
Participants:
Linked BF Score: 17

 Description   

See my comment on BF-2950: it's possible for benchRun to finish without sending any operations to the server.



 Comments   
Comment by Githook User [ 24/Jan/17 ]

Author: Charlie Swanson (cswanson310, charlie.swanson@mongodb.com)

Message: SERVER-26317 Increase benchRun seconds for flaky tests
Branch: v3.4
https://github.com/mongodb/mongo/commit/e2c655019a46979e2db6609369839cc968c31d17

Comment by Githook User [ 24/Jan/17 ]

Author: Charlie Swanson (cswanson310, charlie.swanson@mongodb.com)

Message: SERVER-26317 Increase benchRun seconds for flaky tests
Branch: master
https://github.com/mongodb/mongo/commit/f6785f6fe57da074f1ff9458d19710ec3ae9b596

Comment by Charlie Swanson [ 24/Jan/17 ]

After some discussion, we've determined that the original attempt to increase the timeout missed a line that reset the timeout back to 1 second. I'll put up a review to make sure the timeout remains at the intended value.

Comment by David Daly [ 23/Jan/17 ]

I think the key parts are:

We already have this synchronization point within the start method. Would it make sense to tie the timer to that same barrier? Would that fix the issue you are seeing, charlie.swanson?
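
To illustrate the barrier idea, here is a minimal sketch in Python rather than in benchRun's own code. The names `run_benchmark`, `op`, and `start_clock` are hypothetical and are not taken from the benchRun source. It assumes the clock can be started from the barrier's completion action, which runs once after every worker has arrived and before any worker is released.

```python
import threading
import time


def run_benchmark(num_workers: int, seconds: float, op) -> int:
    """Run `op` from `num_workers` threads for about `seconds`.

    Hypothetical sketch: the clock starts in the barrier action, so
    per-worker setup is not charged to the time budget.
    """
    state = {"deadline": None}
    counts = [0] * num_workers

    def start_clock() -> None:
        # Runs exactly once, after the last worker reaches the barrier.
        state["deadline"] = time.monotonic() + seconds

    barrier = threading.Barrier(num_workers, action=start_clock)

    def worker(i: int) -> None:
        # Setup such as opening a connection would go here, before the wait.
        barrier.wait()
        while time.monotonic() < state["deadline"]:
            op()
            counts[i] += 1

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(counts)
```

This narrows the window in which the time limit can expire before any operation is issued, but it does not strictly guarantee one operation per worker: a thread that is descheduled right after the barrier could still miss the deadline.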

Comment by Charlie Swanson [ 25/Oct/16 ]

Moving this back to the Perf team.

To finally flesh out this idea/request a little bit more (as david.daly requested via Slack but I rudely ignored, sorry!), it looks like benchRun isn't even sending a single request to the server within the allotted time.

My opinion is that benchRun should ideally ensure that the server receives at least one request. If that request doesn't complete in the time limit, then the throughput is rightly zero. The problem right now is that it seems benchRun starts the timer before even sending the first request, which can unfortunately mean the server doesn't even receive a request in the allotted time.

It might not be possible to ensure the server receives at least one request, but it seems reasonable to ensure benchRun sends at least one request.
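
A minimal sketch of the "send at least one request" behaviour, written in Python and not based on benchRun's actual implementation. `run_worker`, `send_op`, and `clock` are hypothetical names. The only point it makes is that checking the deadline after each send, rather than before it, guarantees one attempt even if the time limit has already passed when the worker starts.

```python
import time
from typing import Callable


def run_worker(send_op: Callable[[], None],
               deadline: float,
               clock: Callable[[], float] = time.monotonic) -> int:
    """Send operations until `deadline`, always sending at least one.

    Hypothetical sketch: the deadline is checked after each send, so a
    worker that starts late still issues one operation.
    """
    sent = 0
    while True:
        send_op()
        sent += 1
        if clock() >= deadline:
            return sent
```

With this shape, a run whose single request does not complete in time would still report zero completed operations, but the server would at least have been sent a request.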

Comment by Githook User [ 25/Oct/16 ]

Author: Charlie Swanson (cswanson310, charlie.swanson@mongodb.com)

Message: SERVER-26317 Increase benchRun seconds for flaky tests

These tests have been failing sporadically because benchRun completes
without running any operations. This is a temporary workaround to
reduce the noise in the build.
Branch: master
https://github.com/mongodb/mongo/commit/1cfafa59e7ab3f02ea386743d0a2019889b2f314

Comment by Charlie Swanson [ 07/Oct/16 ]

Oh also, before I forget: after some discussion within the Query team, we decided not to add any debugging logging or verbose mode to benchRun, since we thought SERVER-10552 should probably be done first to make it easy to increase/decrease the verbosity of different shell components.

Comment by Charlie Swanson [ 07/Oct/16 ]

I've converted this ticket into the real reason all the BFs are blocked on this, and I'm moving it over to the Perf team since I believe that is who owns benchRun. This is causing a lot of spurious failures recently (see all the linked BFs), so if you could prioritize this that'd be great.

I haven't had any great ideas on how to fix this, but it would seem reasonable to start the timer only after/right before sending the first operation to the server, or something like that.
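
One way to picture "start the timer right before the first operation" is a deadline that is armed lazily. The sketch below is in Python and is only an illustration: `LazyDeadline` and `run` are hypothetical names, and nothing here reflects how benchRun is actually structured.

```python
import time
from typing import Callable, Optional


class LazyDeadline:
    """Deadline whose clock starts on the first call to `start()`.

    Hypothetical helper: nothing that happens before the first
    operation counts against the time budget.
    """

    def __init__(self, seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._seconds = seconds
        self._clock = clock
        self._deadline: Optional[float] = None

    def start(self) -> None:
        # Idempotent: only the first call sets the deadline.
        if self._deadline is None:
            self._deadline = self._clock() + self._seconds

    def expired(self) -> bool:
        # A timer that was never started cannot expire.
        return self._deadline is not None and self._clock() >= self._deadline


def run(send_op: Callable[[], None], timer: LazyDeadline) -> int:
    """Send operations until the lazily started timer expires."""
    sent = 0
    while not timer.expired():
        timer.start()  # no effect after the first iteration
        send_op()
        sent += 1
    return sent
```

Because an unstarted timer never reports as expired, the loop body runs at least once, so slow setup alone cannot produce a run with zero operations sent.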
