[SERVER-25640] Have ReplSetTest run checkDBHashes() in stopSet() Created: 16/Aug/16  Updated: 10/Jun/19  Resolved: 21/Feb/18

Status: Closed
Project: Core Server
Component/s: Testing Infrastructure
Affects Version/s: None
Fix Version/s: 3.6.5, 3.7.3

Type: Improvement Priority: Major - P3
Reporter: DO NOT USE - Backlog - Test Infrastructure Group (TIG) Assignee: Kevin Albertson
Resolution: Done Votes: 2
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Duplicate
is duplicated by SERVER-31226 ensure or enforce that ShardingTest.s... Closed
Related
related to SERVER-33533 Add MongoRunner.stopMongod() call to ... Closed
related to SERVER-36816 Avoid reloading the view catalog on t... Closed
related to SERVER-33427 improve detectability of test failing... Closed
is related to SERVER-21630 Expand resmoke's CheckReplDBHash supp... Closed
is related to SERVER-41625 RollbackTest fixture runs data consis... Closed
Backwards Compatibility: Minor Change
Backport Requested:
v3.6
Sprint: TIG 2016-09-19, TIG 2016-10-10, TIG 2016-10-31, TIG 2016-11-21, TIG 2018-1-15, TIG 2018-1-1, TIG 2018-1-29, TIG 2018-02-12
Participants:
Linked BF Score: 50

 Description   

The checkDBHashes() function is run on replica sets created by resmoke.py fixtures but not on the replica sets started by the ReplSetTest fixture. The checkDBHashes() hook should be run any time a replica set is stopped.



 Comments   
Comment by Githook User [ 26/Apr/18 ]

Author:

{'email': 'kyle.suarez@mongodb.com', 'username': 'ksuarz', 'name': 'Kyle Suarez'}

Message: SERVER-25640 inmem_config_str.js must stop mongod before exiting
Branch: v3.6
https://github.com/mongodb/mongo/commit/c5a8da16f58ef71a86a68e44852846c635fc53e4

Comment by Githook User [ 26/Apr/18 ]

Author:

{'email': 'kyle.suarez@mongodb.com', 'username': 'ksuarz', 'name': 'Kyle Suarez'}

Message: SERVER-25640 read_majority.js must stop mongod before exiting early
Branch: v3.6
https://github.com/mongodb/mongo/commit/2677b4fffd7e6fcbbcb3a7b18bb507897b34c7ac

Comment by Githook User [ 25/Apr/18 ]

Author:

{'email': 'kevin.albertson@10gen.com', 'username': 'kevinAlbs', 'name': 'Kevin Albertson'}

Message: SERVER-25640 check if jstests leave unterminated processes

(cherry picked from commit 8821cdb5508f8a26c025b3e0124903e9ae64c479)
Branch: v3.6
https://github.com/mongodb/mongo/commit/b91c8f8c2891517272f8d3eee4e0fb4aa86330b1

Comment by Githook User [ 25/Apr/18 ]

Author:

{'email': 'max.hirschhorn@mongodb.com', 'username': 'visemet', 'name': 'Max Hirschhorn'}

Message: SERVER-25640 Terminate processes in more JavaScript tests.
Branch: v3.6
https://github.com/mongodb/mongo/commit/89526ccb8940e5032e6fea766524d15cfdf4e382

Comment by Githook User [ 25/Apr/18 ]

Author:

{'email': 'kevin.albertson@10gen.com', 'username': 'kevinAlbs', 'name': 'Kevin Albertson'}

Message: SERVER-25640 terminate processes in jstests

(cherry picked from commit 7f5f161abc0355b71823be7b768075547f30a0cb)
Branch: v3.6
https://github.com/mongodb/mongo/commit/eb6bc09624de25b367791c4b2fbe33fbae164789

Comment by Githook User [ 25/Apr/18 ]

Author:

{'email': 'kevin.albertson@10gen.com', 'username': 'kevinAlbs', 'name': 'Kevin Albertson'}

Message: SERVER-25640 terminate running processes in jstests

(cherry picked from commit c556e377094792e7253a95cb5fedcd703a99bf2c)
Branch: v3.6
https://github.com/mongodb/mongo/commit/03f89bb1588cfa2592586e4fbabc6eb14edd6591

Comment by Githook User [ 25/Apr/18 ]

Author:

{'email': 'kevin.albertson@10gen.com', 'username': 'kevinAlbs', 'name': 'Kevin Albertson'}

Message: SERVER-25640 check dbhashes in stopSet

(cherry picked from commit bfc6cfe5fa474a48e35c6430858ae7a26a802e7b)
Branch: v3.6
https://github.com/mongodb/mongo/commit/da6a8e6c7842e8f17acf89bfbf7a89384d5cc49d

Comment by Githook User [ 25/Apr/18 ]

Author:

{'email': 'kevin.albertson@10gen.com', 'username': 'kevinAlbs', 'name': 'Kevin Albertson'}

Message: SERVER-25640 terminate running processes in jstests

(cherry picked from commit 608689a72c3a60853590b32f14326acf2d5e28f7)
Branch: v3.6
https://github.com/10gen/mongo-enterprise-modules/commit/b58701040d5d1f1a1934e2614720db43afed5da0

Comment by Githook User [ 25/Apr/18 ]

Author:

{'email': 'kevin.albertson@10gen.com', 'username': 'kevinAlbs', 'name': 'Kevin Albertson'}

Message: SERVER-25640 auth ldap test replsets before stop

(cherry picked from commit 46ac85cef193ddcc5e741b27fb4481f7ab7f6b20)
(cherry picked from commit 608689a72c3a60853590b32f14326acf2d5e28f7)
Branch: v3.6
https://github.com/10gen/mongo-enterprise-modules/commit/d71a515dd6042e418eec2b10815710118687db37

Comment by Esha Maharishi (Inactive) [ 21/Feb/18 ]

Done - filed SERVER-33427. Thanks for responding about this.

Comment by Max Hirschhorn [ 21/Feb/18 ]

Max Hirschhorn Kevin Albertson, I noticed that failing to shut down a ShardingTest/ReplSetTest doesn't cause the test to log a "failed to load" line or a javascript stack trace (which makes sense, since which line would you error on?).

As an outcome of SERVER-25777, the mongo shell could already exit with a non-zero return code without printing a "failed to load" message.

The line that is logged ("a call to MongoRunner.stopMongod(), ReplSetTest#stopSet(), or ShardingTest#stop() may be missing from the test") also isn't/can't be logged at LogSeverity::Error, since it's not logged by a server process (and which makes the log line contain " E ", which is another thing I typically look for when a test fails without "failed to load").

It took some confusion and additional scrolling through the logs for me to realize why my new test was reporting failure when it seemed like the test ran to completion successfully. Just a thought, in case there's something that can be done to make this failure easier to detect.

esha.maharishi, I think your confusion is understandable. The goal of the message was to make it more obvious to the user what the remediation ought to be. Since that message isn't being surfaced clearly enough, we should change the logic in the mongo shell so that it is.

I don't see a reason that the mongo shell must use cout for logging the "exiting with a failure due to unterminated processes" message, so we could replace it with a call to severe() instead (and prefix the log message with 'F'). Do you think that would be sufficient for your purposes? Would you mind filing a new SERVER ticket for this improvement request?

For example, even just moving the "a call to MongoRunner.stopMongod(), ReplSetTest#stopSet(), or ShardingTest#stop() may be missing from the test" just before/after the "Summary: 1 test(s) ran in 35.86 seconds (0 succeeded, 0 were skipped, 1 failed, 0 errored)" could help.

Those messages are logged by two different processes (the mongo shell with the former and resmoke.py with the latter) so that isn't really something we'd consider. A related feature in resmoke.py would be to have special handling around certain exit codes from known processes. This case in the mongo shell would be one, but a memory leak detected by ASan/LSan would be another.

Comment by Esha Maharishi (Inactive) [ 21/Feb/18 ]

For example, even just moving the "a call to MongoRunner.stopMongod(), ReplSetTest#stopSet(), or ShardingTest#stop() may be missing from the test" just before/after the "Summary: 1 test(s) ran in 35.86 seconds (0 succeeded, 0 were skipped, 1 failed, 0 errored)" could help.

Comment by Esha Maharishi (Inactive) [ 21/Feb/18 ]

max.hirschhorn kevin.albertson, I noticed that failing to shut down a ShardingTest/ReplSetTest doesn't cause the test to log a "failed to load" line or a javascript stack trace (which makes sense, since which line would you error on?).

The line that is logged ("a call to MongoRunner.stopMongod(), ReplSetTest#stopSet(), or ShardingTest#stop() may be missing from the test") also isn't/can't be logged at LogSeverity::Error, since it's not logged by a server process (and which makes the log line contain " E ", which is another thing I typically look for when a test fails without "failed to load").

It took some confusion and additional scrolling through the logs for me to realize why my new test was reporting failure when it seemed like the test ran to completion successfully. Just a thought, in case there's something that can be done to make this failure easier to detect.

Snippet from my logs:

[js_test:database_and_shard_versioning_all_commands] 2018-02-21T15:13:43.244-0500 exiting with a failure due to unterminated processes
[js_test:database_and_shard_versioning_all_commands] 2018-02-21T15:13:43.244-0500 a call to MongoRunner.stopMongod(), ReplSetTest#stopSet(), or ShardingTest#stop() may be missing from the test
[MongoDFixture:job0] 2018-02-21T15:13:43.259-0500 I NETWORK  [conn4] end connection 127.0.0.1:47200 (0 connections now open)
[executor:js_test:job0] 2018-02-21T15:13:43.282-0500 database_and_shard_versioning_all_commands.js ran in 35.59 seconds.
[executor:js_test:job0] 2018-02-21T15:13:43.282-0500 JSTest jstests/sharding/database_and_shard_versioning_all_commands.js failed, so stopping...
[executor:js_test:job0] 2018-02-21T15:13:43.282-0500 Received a StopExecution exception: JSTest jstests/sharding/database_and_shard_versioning_all_commands.js failed.
[MongoDFixture:job0] Stopping mongod on port 15500 with pid 30885...
[MongoDFixture:job0] 2018-02-21T15:13:43.283-0500 I CONTROL  [signalProcessingThread] got signal 15 (Terminated), will terminate after current cmd ends
[MongoDFixture:job0] 2018-02-21T15:13:43.283-0500 I NETWORK  [signalProcessingThread] shutdown: going to close listening sockets...
[MongoDFixture:job0] 2018-02-21T15:13:43.283-0500 I NETWORK  [signalProcessingThread] removing socket file: /tmp/mongodb-15500.sock
[MongoDFixture:job0] 2018-02-21T15:13:43.284-0500 I FTDC     [signalProcessingThread] Shutting down full-time diagnostic data capture
[MongoDFixture:job0] 2018-02-21T15:13:43.290-0500 I STORAGE  [signalProcessingThread] WiredTigerKVEngine shutting down
[MongoDFixture:job0] 2018-02-21T15:13:43.554-0500 I STORAGE  [signalProcessingThread] shutdown: removing fs lock...
[MongoDFixture:job0] 2018-02-21T15:13:43.555-0500 I CONTROL  [signalProcessingThread] now exiting
[MongoDFixture:job0] 2018-02-21T15:13:43.555-0500 I CONTROL  [signalProcessingThread] shutting down with code:0
[MongoDFixture:job0] Successfully terminated the mongod on port 15500, exited with code 0.
[executor] 2018-02-21T15:13:43.560-0500 Summary: 1 test(s) ran in 35.86 seconds (0 succeeded, 0 were skipped, 1 failed, 0 errored)
    The following tests failed (with exit code):
        jstests/sharding/database_and_shard_versioning_all_commands.js (250)
[resmoke] 2018-02-21T15:13:43.560-0500 ================================================================================
[resmoke] 2018-02-21T15:13:43.560-0500 Summary of with_server suite: 1 test(s) ran in 35.86 seconds (0 succeeded, 0 were skipped, 1 failed, 0 errored)
The following tests failed (with exit code):
    jstests/sharding/database_and_shard_versioning_all_commands.js (250)
1 test(s) ran in 37.70 seconds (0 succeeded, 0 were skipped, 1 failed, 0 errored)
    js_tests: 1 test(s) ran in 35.86 seconds (0 succeeded, 0 were skipped, 1 failed, 0 errored)
        The following tests failed (with exit code):
            jstests/sharding/database_and_shard_versioning_all_commands.js (250)

Comment by Githook User [ 20/Feb/18 ]

Author:

{'email': 'kevin.albertson@10gen.com', 'name': 'Kevin Albertson', 'username': 'kevinAlbs'}

Message: SERVER-25640 check if jstests leave unterminated processes
Branch: master
https://github.com/mongodb/mongo/commit/8821cdb5508f8a26c025b3e0124903e9ae64c479

Comment by Githook User [ 20/Feb/18 ]

Author:

{'email': 'kevin.albertson@10gen.com', 'name': 'Kevin Albertson', 'username': 'kevinAlbs'}

Message: SERVER-25640 terminate processes in jstests
Branch: master
https://github.com/mongodb/mongo/commit/7f5f161abc0355b71823be7b768075547f30a0cb

Comment by Githook User [ 08/Feb/18 ]

Author:

{'email': 'kevin.albertson@10gen.com', 'name': 'Kevin Albertson', 'username': 'kevinAlbs'}

Message: SERVER-25640 terminate running processes in jstests
Branch: master
https://github.com/mongodb/mongo/commit/c556e377094792e7253a95cb5fedcd703a99bf2c

Comment by Githook User [ 06/Feb/18 ]

Author:

{'email': 'kevin.albertson@10gen.com', 'name': 'Kevin Albertson', 'username': 'kevinAlbs'}

Message: SERVER-25640 terminate running processes in jstests
Branch: master
https://github.com/10gen/mongo-enterprise-modules/commit/608689a72c3a60853590b32f14326acf2d5e28f7

Comment by Githook User [ 22/Jan/18 ]

Author:

{'name': 'Kevin Albertson', 'email': 'kevin.albertson@10gen.com', 'username': 'kevinAlbs'}

Message: SERVER-25640 check dbhashes in stopSet
Branch: master
https://github.com/mongodb/mongo/commit/bfc6cfe5fa474a48e35c6430858ae7a26a802e7b

Comment by Githook User [ 10/Jan/18 ]

Author:

{'email': 'kevin.albertson@10gen.com', 'name': 'Kevin Albertson', 'username': 'kevinAlbs'}

Message: SERVER-25640 auth ldap test replsets before stop
Branch: master
https://github.com/10gen/mongo-enterprise-modules/commit/46ac85cef193ddcc5e741b27fb4481f7ab7f6b20

Comment by Judah Schvimer [ 12/Dec/17 ]

ShardingTest calls stopSet so it should just happen automatically. In addition to adding a check to stopSet we should also add calls to stopSet in many of the tests that do not explicitly call it.

Comment by Kevin Albertson [ 12/Dec/17 ]

After SERVER-21630 we can add this to ShardingTest as well.

Comment by Scott Hernandez (Inactive) [ 16/Aug/16 ]

Also, this no longer a hook, but simply part of the replSetTest class.

Comment by Eric Milkie [ 16/Aug/16 ]

There's probably a bunch of places where we don't want to check the hashes at shutdown time, because inconsistencies are expected. This could happen, for example, with a test with a partitioned set using mongobridge that never gets healed before shutdown. Therefore, there should be a way to disable the hook if necessary.

Generated at Thu Feb 08 04:09:46 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.