[SERVER-14755] Tests that are failing due to replica set slowness Created: 30/Jul/14 Updated: 18/Aug/14 Resolved: 15/Aug/14 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Testing Infrastructure |
| Affects Version/s: | None |
| Fix Version/s: | 2.7.5 |
| Type: | Improvement | Priority: | Critical - P2 |
| Reporter: | Spencer Brody (Inactive) | Assignee: | Spencer Brody (Inactive) |
| Resolution: | Done | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Linked BF Score: | 0 |
| Description |
|
Recent changes to the replication code have temporarily slowed down parts of replication. As a result, flaky test failures have increased: tests time out waiting for various replication events (electing a primary is the most common). Rather than filing one ticket per failure, this ticket is an umbrella for all failures of this type. For each of these tests, we should either remove it and replace it with unit tests, or verify that it passes consistently once we've fully switched over to the new replication code, which should speed things back up. |
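For illustration, the kind of timeout-bound wait that these tests depend on can be sketched as a polling loop. This is a minimal sketch, not the actual test helper: `wait_until`, its parameters, and the `has_primary` predicate in the usage note are hypothetical names. When replication is slower than the timeout allows, the wait returns False and the test fails even though the event would eventually have happened.

```python
import time


def wait_until(predicate, timeout_secs, interval_secs=0.5):
    """Poll predicate() until it returns a truthy value.

    Returns True if the predicate became true before the deadline,
    or False if timeout_secs elapsed first (hypothetical helper).
    """
    deadline = time.monotonic() + timeout_secs
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            # The event (for example, a primary being elected) did not
            # happen in time; the caller decides how to report this.
            return False
        time.sleep(interval_secs)
```

A test might call `wait_until(has_primary, timeout_secs=60)` and fail when the result is False, where `has_primary` is an assumed function that checks the replica set state.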
| Comments |
| Comment by Githook User [ 01/Aug/14 ] |
|
Author: Eric Milkie (username: milkie, email: milkie@10gen.com)
Message: Revert " This reverts commit 9ec7d68a97dc54f534e95959e62cafcab38bd440. |
| Comment by Githook User [ 01/Aug/14 ] |
|
Author: Eric Milkie (username: milkie, email: milkie@10gen.com)
Message: |
| Comment by Eric Milkie [ 01/Aug/14 ] |
|
I have determined that the failures are caused by mongod shutdown hanging. Because our test harness eventually times out during shutdown and kills the process without signaling a test failure, this type of problem is not immediately obvious from the test logs. I'll be reverting the commit that broke this. |
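To illustrate the harness behavior described above, here is a minimal sketch of a shutdown routine that reports a forced kill instead of hiding it. This is not the actual harness code: `stop_process`, `UncleanShutdownError`, and the 30-second default are assumptions made for the example, and the sketch assumes a POSIX system where `terminate()` sends SIGTERM.

```python
import subprocess
import sys


class UncleanShutdownError(Exception):
    """Raised when a process ignored a graceful stop and had to be killed."""


def stop_process(proc, timeout_secs=30.0):
    """Ask proc to exit; if it hangs, kill it and raise instead of passing silently."""
    proc.terminate()  # graceful stop request (SIGTERM on POSIX)
    try:
        return proc.wait(timeout=timeout_secs)
    except subprocess.TimeoutExpired:
        proc.kill()  # forced stop, as the harness eventually does
        proc.wait()  # reap the killed process
        # Surfacing the hang as an error makes it visible as a test failure.
        raise UncleanShutdownError(
            "pid %d did not exit within %.1fs and was killed"
            % (proc.pid, timeout_secs)
        )
```

With this shape, a hung shutdown fails the run at the point where it happens, rather than showing up later as unrelated timeouts.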
| Comment by J Rassi [ 01/Aug/14 ] |
|
9ec7d68a didn't help. See the recent history (most recent first): Notice how ce38e09c (bottom of list) is clean on replsets/sharding, and 2ea07493 (top of list) is not clean. From that, I infer that one of the first four commits is bad. mattd@10gen.com, can you double-check my work and then assess which of the four is the most likely to be the bad commit, and then perform a revert? Note that 25-50 failures of the replsets suite came in overnight (generated from MCI's bisect attempt, which is still running), in addition to a number of failures across other test suites. |
| Comment by Githook User [ 31/Jul/14 ] |
|
Author: matt dannenberg (username: dannenberg, email: matt.dannenberg@10gen.com)
Message: |
| Comment by J Rassi [ 31/Jul/14 ] |
|
Bumping priority. The current volume of test noise on master from this isn't sustainable. |
| Comment by J Rassi [ 31/Jul/14 ] |
|
Here are tests that have failed due to replication-related timeouts in the past 24 hours:
mongos_slaveok.js:
repl1.js: https://mci.10gen.com/ui/task/mongodb_mongo_master_linux_64_debug_duroff_f7df4c7dcfabbadb4fde07018f85d00bc8ba47a0_14_07_30_21_17_09_replication_linux_64_debug_duroff
replsetadd_profile.js: https://mci.10gen.com/ui/task/mongodb_mongo_master_osx_108_cxx11_debug_9e72a5b850f7bc31453828db522f4e0ecbfdb691_14_07_30_17_52_06_replicasets_osx_108_cxx11_debug
replsets_priority1.js: https://mci.10gen.com/ui/task/mongodb_mongo_master_amazon_b494ade9a9461d16ab3812640ca72ee9c9e4345c_14_07_31_15_00_02_slow1_amazon
rollback_too_new.js: https://mci.10gen.com/ui/task/mongodb_mongo_master_osx_108_cxx11_debug_791e2e00806449a01e7dec4d9b969392e183f393_14_07_30_17_38_51_replicasets_osx_108_cxx11_debug
sync2.js: |
| Comment by Spencer Brody (Inactive) [ 30/Jul/14 ] |
|
replsets/replset1.js
|