[SERVER-22328] bench_test_crud_commands.js fails due to resource contention from other resmoke jobs and low timeout values
Created: 03/Jan/16 | Updated: 22/Feb/16 | Resolved: 18/Feb/16
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Performance |
| Affects Version/s: | None |
| Fix Version/s: | 3.0.10, 3.3.2 |
| Type: | Bug | Priority: | Minor - P4 |
| Reporter: | Ian Whalen (Inactive) | Assignee: | J Rassi |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||
| Backwards Compatibility: | Fully Compatible | ||||
| Operating System: | ALL | ||||
| Backport Completed: | |||||
| Sprint: | Query F (02/01/16), Query 10 (02/22/16) | ||||
| Participants: | |||||
| Linked BF Score: | 0 | ||||
| Description |
noPassthroughWithMongod_WT failed on suse12
bench_test_crud_commands.js
BF Ticket Generated by ian.whalen
| Comments |
| Comment by Githook User [ 18/Feb/16 ] | ||||||||||||||||||
|
Author: Jason Rassi (jrassi) <rassi@10gen.com>
Message: (cherry picked from commit cace2c61d3bddf3dd9f82ead6b0bb6167d635d11)
| Comment by Githook User [ 18/Feb/16 ] | ||||||||||||||||||
|
Author: Jason Rassi (jrassi) <rassi@10gen.com>
Message:
| Comment by Samantha Ritter (Inactive) [ 11/Feb/16 ] | ||||||||||||||||||
| Comment by J Rassi [ 27/Jan/16 ] | ||||||||||||||||||
|
In the above failure, an insert command of 100 documents with write concern {w: "majority"} took 4.4 seconds. Again, I suspect the cause of this slowness is disk contention from other resmoke jobs. For example, job #6 was taking ~300ms for each ~2MB single-document insert (external_sort_text_agg.js) at 2016-01-27T19:48:09, the same time that the 4-second insert batch occurred in this job.
| Comment by Jonathan Reams [ 27/Jan/16 ] | ||||||||||||||||||
|
Looks like this happened again: https://evergreen.mongodb.com/task/mongodb_mongo_v3.2_rhel70_noPassthroughWithMongod_WT_43482c1e7ad1fd8c307460682a76368c80bb355d_16_01_27_18_26_22
| Comment by J Rassi [ 08/Jan/16 ] | ||||||||||||||||||
|
In this build failure, an insert command of 100 documents with the write concern {w: 1, j: false} took 1.8 seconds, though the test expects the operation to complete in <1 second.
This particular test case was added in
I believe that this test is also affected by benchRun's short "warm-up" period (see BenchRunWorker::shouldCollectStats()), during which operations that are run are not counted in the final stats report. This is evidenced by the fact that the server reports the insert operations as completed (see the assert.gt(coll.count(), 0) assertion in the test), while benchRun reports these operations as not completed.
I audited the logs of the other three resmoke.py jobs running on the machine at the same time, looking for evidence of resource contention. I found that no_balance_collection.js (running in job #3) made two 40MB writes to disk at around the same time (one took 569ms, the other 717ms), which points to disk contention as a possible cause for the inserts taking too long:
I also looked at the last 1000 executions of this test, and found no other failures. I recommend taking no action until this test fails again, in case the failure is due to a transient hardware issue. After analysis of the next failure, I would recommend one of the following:
#4 seems like it would have the highest likelihood of fixing the problem with the least amount of work. | ||||||||||||||||||
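A minimal sketch of the effect described in this comment, where the server completes an operation but the benchmark report does not count it. This is a hypothetical model, not the actual BenchRunWorker logic: the function simulateRun, its parameters, and the rule that an operation is counted only if it starts after the warm-up and finishes within the run window are all assumptions made for illustration.

```javascript
// Hypothetical model of a benchmark worker; NOT the real BenchRunWorker code.
// opDurationsMs: how long each sequential operation takes on the server.
// runMs:         length of the benchmark window.
// warmupMs:      assumed period at the start during which stats are not collected.
function simulateRun(opDurationsMs, runMs, warmupMs) {
    let clock = 0;     // simulated time in ms
    let executed = 0;  // operations the server completed
    let counted = 0;   // operations that make it into the stats report
    for (const duration of opDurationsMs) {
        if (clock >= runMs) {
            break;  // the worker stops issuing operations once the window closes
        }
        const startedAt = clock;
        clock += duration;
        executed++;
        // Assumption: an op is counted only if it starts after warm-up and
        // finishes before the stats are read at the end of the window.
        if (startedAt >= warmupMs && clock <= runMs) {
            counted++;
        }
    }
    return {executed: executed, counted: counted};
}

// One slow 1.8 s insert batch in a 1 s window: completed by the server, not counted.
console.log(simulateRun([1800], 1000, 0));
// Three 300 ms batches with a 100 ms warm-up: the first batch is not counted.
console.log(simulateRun([300, 300, 300], 1000, 100));
```

Under these assumptions, a test that asserts on the reported count can fail even though the collection is non-empty, which matches the symptom described above.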
| Comment by Ian Whalen (Inactive) [ 03/Jan/16 ] | ||||||||||||||||||
|