[SERVER-1862] server does not start in time in pair7.js test Created: 27/Sep/10  Updated: 19/May/14  Resolved: 14/Jun/11

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Aaron Staple Assignee: Aaron Staple
Resolution: Won't Fix Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Operating System: ALL
Participants:

 Description   

<http://buildbot.mongodb.org/builders/Linux%2064-bit/builds/2017/steps/test_3/logs/stdio>

m31002| Mon Sep 27 01:35:21 MongoDB starting : pid=11078 port=31002 dbpath=/data/db/jstests_pair7test-right 64-bit
m31002|
m31002| ** NOTE: This is a development version (1.7.1-pre-) of MongoDB.
m31002| ** Not recommended for production.
m31002|
m31002| Mon Sep 27 01:35:21 db version v1.7.1-pre-, pdfile version 4.5
m31002| Mon Sep 27 01:35:21 git version: eb74c84814a1f0aceb7cd335e1296d419550ae23
m31002| Mon Sep 27 01:35:21 sys info: Linux domU-12-31-39-06-79-A1 2.6.21.7-2.ec2.v1.2.fc8xen #1 SMP Fri Nov 20 17:48:28 EST 2009 x86_64 BOOST_LIB_VERSION=1_41
assert.soon failed: function () {
try

{ m = new Mongo("127.0.0.1:" + port); return true; }

catch (e) {
}
return false;
}, msg:unable to connect to mongo program on port 31002
Error("Printing Stack Trace (lines are 0-based in spidermonkey)")@:0
()@shell/utils.js:20
("assert.soon failed: function () {\n try

{\n m = new Mongo(\"127.0.0.1:\" + port);\n return true;\n }

catch (e)

{\n }

\n return false;\n}, msg:unable to connect to mongo program on port 31002")@shell/utils.js:31
((function () {try

{m = new Mongo("127.0.0.1:" + port);return true;}

catch (e) {}return false;}),"unable to connect to mongo program on port 31002",60000)@shell/utils.js:117
("mongod","-port",31002,"dbpath","/data/db/jstests_pair7test-right","nohttpinterface","-bind_ip","127.0.0.1")@shell/servers.js:136
apply(null,[object Array])@:0
("-port",31002,"dbpath","/data/db/jstests_pair7test-right","nohttpinterface","-bind_ip","127.0.0.1")@shell/servers.js:102
apply(null,[object Object])@:0
("-port",31002,"dbpath","/data/db/jstests_pair7test-right","nohttpinterface","-bind_ip","127.0.0.1")@shell/servers.js:106
(15)@/mnt/home/buildbot/slave/Linux_64bit/mongo/jstests/repl/pair7.js:22
@/mnt/home/buildbot/slave/Linux_64bit/mongo/jstests/repl/pair7.js:83

Mon Sep 27 01:36:22 uncaught exception: assert.soon failed: function () {
try

{ m = new Mongo("127.0.0.1:" + port); return true; }

catch (e) {
}
return false;
}, msg:unable to connect to mongo program on port 31002
failed to load: /mnt/home/buildbot/slave/Linux_64bit/mongo/jstests/repl/pair7.js



 Comments   
Comment by Aaron Staple [ 14/Jun/11 ]

we phased out replica pairs

Comment by Eliot Horowitz (Inactive) [ 16/Jan/11 ]

pairs are deprecated, so probably not worth looking at

Comment by Aaron Staple [ 27/Sep/10 ]

more log from above:

m31002| Mon Sep 27 01:36:22 got kill or ctrl c or hup signal 15 (Terminated), will terminate after current cmd ends
m31002| Mon Sep 27 01:36:22 [interruptThread] now exiting
m31002| Mon Sep 27 01:36:22 dbexit:
m31002| Mon Sep 27 01:36:22 [interruptThread] shutdown: going to close listening sockets...
m31002| Mon Sep 27 01:36:22 [interruptThread] shutdown: going to flush oplog...
m31002| Mon Sep 27 01:36:22 [interruptThread] shutdown: going to close sockets...
m31002| Mon Sep 27 01:36:22 [interruptThread] shutdown: waiting for fs preallocator...
m31002| Mon Sep 27 01:36:22 [interruptThread] shutdown: closing all files...
m31002| Mon Sep 27 01:36:22 closeAllFiles() finished
m31002| Mon Sep 27 01:36:22 [interruptThread] shutdown: removing fs lock...
m31002| Mon Sep 27 01:36:25 dbexit: really exiting now

The shell was unable to connect to the spawned mongod instance. Since the log message "waiting for connections..." was not printed, the db apparently had not completed its initialization after 60 seconds.

From looking at passing runs of pair7 on the same machine, it appears that certain mongod instances take 10 seconds to initialize. I logged on to the machine to run some tests, and it appears that the delay is occurring when we fsync a new lock file, in particular when we have just cleared that lock file's dbpath.

There's a big difference between 10 sec and 60 sec, but potentially the fsync issue is related to the pair7 test failure.

Generated at Thu Feb 08 02:58:16 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.