[SERVER-26308] Decrease number of jobs for sharding-related suites on Windows DEBUG and PPC variants Created: 23/Sep/16 Updated: 05/Apr/17 Resolved: 28/Dec/16 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Testing Infrastructure |
| Affects Version/s: | None |
| Fix Version/s: | 3.4.2, 3.5.2 |
| Type: | Task | Priority: | Critical - P2 |
| Reporter: | Charlie Swanson | Assignee: | Daniel Pasette (Inactive) |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
| Backwards Compatibility: | Fully Compatible |
| Backport Requested: | v3.4, v3.2 |
| Participants: | |
| Linked BF Score: | 0 |
| Description |
|
If you look at the task history, you can see that some suites have historically had a very low success rate. For example:
|
| Comments |
| Comment by Githook User [ 13/Jan/17 ] |
|
Author: Dan Pasette (monkey101) <dan@mongodb.com>
Message: (cherry picked from commit 3f64fb082c4e2a3c5750a2f0bb8dfffbabe4d06e) |
| Comment by Githook User [ 28/Dec/16 ] |
|
Author: Dan Pasette (monkey101) <dan@mongodb.com>
Message: |
| Comment by Daniel Pasette (Inactive) [ 28/Dec/16 ] |
|
Following up on the original description: the ARM and Windows tasks still don't seem to pass reliably, but the PPC tasks now do. I'm going to halve the number of jobs used on both ARM and Windows.

Ubuntu 1604 ARM tasks:

Windows DEBUG tasks:

Ubuntu 1604 PPC tasks are now passing: |
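As a rough illustration of the per-variant arithmetic behind halving the jobs on ARM and Windows while leaving PPC alone, here is a minimal Python sketch. The variant names, the `JOBS_FACTOR` table, and the `resmoke_jobs` helper are hypothetical and are not taken from the actual Evergreen configuration. The base of 16 jobs is the Windows figure mentioned in this thread.

```python
import math

# Hypothetical per-variant scaling factors (illustrative assumptions only,
# not the real evergreen.yml values): halve ARM and Windows DEBUG, keep PPC.
JOBS_FACTOR = {
    "windows-debug": 0.5,
    "ubuntu1604-arm": 0.5,
    "ubuntu1604-ppc": 1.0,
}


def resmoke_jobs(base_jobs: int, variant: str) -> int:
    """Scale the base job count for a variant, never going below one job."""
    factor = JOBS_FACTOR.get(variant, 1.0)  # unknown variants are left unscaled
    return max(1, math.floor(base_jobs * factor))


if __name__ == "__main__":
    for name in JOBS_FACTOR:
        print(name, resmoke_jobs(16, name))
```

Clamping to at least one job keeps small hosts from ending up with zero parallelism after scaling.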
| Comment by Eric Milkie [ 29/Nov/16 ] |
|
For evidence, I present exhibit A: After 21:29:00, the test hangs (due to a logic error). Incredibly, even after the test hangs, the replica set still struggles to stay up. At one point, FTDC reports that the serverStatus command took 12 seconds to run, all on an idle three-node replica set. The culprit is clearly running too many jobs on one machine: running 16 jobs on one Windows machine completely overwhelms the I/O subsystem, and we'll continue to see many build failures because of it. |
| Comment by Ernie Hershey [ 27/Sep/16 ] |
|
acm - When I said RHEL was "working better," I just meant that tasks on RHEL are running much faster and failing much less. I haven't seen other evidence of different behavior between the two distros. |
| Comment by Ernie Hershey [ 27/Sep/16 ] |
|
dan@10gen.com - Brian made the change. It's in BUILD-2166. We also decommissioned old hosts using the windows-vs2015-large distro, so any new tasks will be on the bigger hosts, starting now. |
| Comment by Daniel Pasette (Inactive) [ 27/Sep/16 ] |
|
Great. I'll stand down. Please let me know when we can take a look. I'll do a patch build that decreases the number of jobs on the mmap sharding suite. |
| Comment by Ernie Hershey [ 27/Sep/16 ] |
|
We can increase the windows distro size today. It's easy. brian.mccarthy or I will do it. |
| Comment by Daniel Pasette (Inactive) [ 27/Sep/16 ] |
|
I'm going to do some exploring of the win debug issues. |
| Comment by Daniel Pasette (Inactive) [ 24/Sep/16 ] |
|
cc: ernie.hershey/ramon.fernandez A few questions:
|