[SERVER-31562] dump replica set oplogs at the end of every failed test Created: 13/Oct/17 Updated: 30/Oct/23 Resolved: 14/Feb/18 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication, Testing Infrastructure |
| Affects Version/s: | None |
| Fix Version/s: | 3.4.16, 3.6.6, 3.7.3 |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Judah Schvimer | Assignee: | Jonathan Abrahams |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: | |
| Backwards Compatibility: | Fully Compatible |
| Backport Requested: | v3.6, v3.4 |
| Sprint: | TIG 2018-02-26 |
| Participants: | |
| Description |
|
Before shutting down nodes we can connect to each and dump their oplogs. Alternatively, we could wrap our tests in try...catches that dump the oplogs in the catch block. The latter would require getting ahold of the ShardingTest and ReplSetTest instances in an override which may not be possible. |
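The try...catch alternative described above can be sketched roughly as follows. This is only an illustration in Python, not the mechanism that was implemented for this ticket: the helper name `run_with_oplog_dump`, the `fetch_oplog` callable, and the `sink` dictionary are all hypothetical, and in a real harness `fetch_oplog` would presumably connect to the node and read its `local.oplog.rs` collection.

```python
from typing import Callable, Dict, Iterable, List

OplogEntries = List[dict]


def run_with_oplog_dump(
    test: Callable[[], None],
    nodes: Iterable[str],
    fetch_oplog: Callable[[str], OplogEntries],
    sink: Dict[str, OplogEntries],
) -> None:
    """Run test(); on failure, store every node's oplog in sink and re-raise.

    All parameters are hypothetical stand-ins: `nodes` identifies replica set
    members and `fetch_oplog` is whatever reads one member's oplog.
    """
    try:
        test()
    except Exception:
        for node in nodes:
            try:
                sink[node] = fetch_oplog(node)
            except Exception as err:
                # A node may have crashed; record that instead of masking
                # the original test failure.
                sink[node] = [{"dump_error": str(err)}]
        # Re-raise so the test is still reported as failed.
        raise
```

Because the wrapper needs the list of nodes, it runs into the same difficulty the description mentions: something has to hand it the ShardingTest or ReplSetTest members for tests that start their own clusters.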
| Comments |
| Comment by Githook User [ 24/May/18 ] |
|
Author: {'username': 'hptabster', 'name': 'Jonathan Abrahams', 'email': 'jonathan@mongodb.com'} Message: (cherry picked from commit 9fd34c78b7471a3cec40e7cdc221d10b1a100ad3) |
| Comment by Githook User [ 24/May/18 ] |
|
Author: {'username': 'hptabster', 'name': 'Jonathan Abrahams', 'email': 'jonathan@mongodb.com'} Message: (cherry picked from commit 9fd34c78b7471a3cec40e7cdc221d10b1a100ad3) |
| Comment by Githook User [ 14/Feb/18 ] |
|
Author: {'email': 'jonathan@mongodb.com', 'name': 'Jonathan Abrahams', 'username': 'hptabster'} Message: |
| Comment by Judah Schvimer [ 13/Feb/18 ] |
|
If resmoke would notice a primary crash without it, then I don't think we need it. |
| Comment by Max Hirschhorn [ 13/Feb/18 ] |
judah.schvimer, is the CheckPrimary hook as useful to add to all test suites given the changes that were made in |
| Comment by Jonathan Abrahams [ 12/Feb/18 ] |
We are planning to handle FSM (concurrency) suite failures in For tests which start and stop their own mongod cluster (not using a resmoke fixture), like rollback_fuzzer, the current mechanism archives a failed test on any failure within the test. Tests which use resmoke to launch the fixture, like jstestfuzz_replication*, can specify which test or hook to archive, e.g. the CheckPrimary hook. |
| Comment by Judah Schvimer [ 12/Feb/18 ] |
|
I am interested in the RollbackFuzzer, any other fuzzer tests that use replication (like generational_fuzzer_replication), FSM suites, and adding CheckPrimary to all suites that use it (since a CheckPrimary failure means there was a crash that we might want to investigate). |
| Comment by Max Hirschhorn [ 12/Feb/18 ] |
Are some of the tests I know the Storage and Replication teams would benefit from archiving data files for. |
| Comment by Kevin Duong [ 12/Feb/18 ] |
|
jonathan.abrahams to follow up with Storage and Replication on this. |
| Comment by Max Hirschhorn [ 17/Oct/17 ] |
|
|
| Comment by Judah Schvimer [ 17/Oct/17 ] |
|
I think that would be sufficient; however, if this were easier, it could be useful while |
| Comment by Max Hirschhorn [ 17/Oct/17 ] |
|
judah.schvimer, would it be sufficient to upload the data files of the mongod processes to S3 on test failure? I'm wondering if collecting these diagnostics would be better handled by