[SERVER-10103] chaining.js timing out on OS X 10.5 64-bit, Windows 64-bit, Linux 64-bit Created: 05/Jul/13  Updated: 11/Jul/16  Resolved: 17/Jul/13

Status: Closed
Project: Core Server
Component/s: Testing Infrastructure
Affects Version/s: None
Fix Version/s: 2.4.6, 2.5.1

Type: Bug Priority: Major - P3
Reporter: Matt Kangas Assignee: Eric Milkie
Resolution: Done Votes: 0
Labels: buildbot
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

OS X 10.5 64-bit since #5547
Nightly OS X 10.5 64-bit since #1355
Windows 64-bit build 5594
Linux 64-bit Build 5580
Linux 64-bit Weekly Slow Tests
V2.4 Linux 32-bit Build #134


Operating System: ALL
Participants:

 Description   

Occurred on build #5547 and #5548

http://buildbot.mongodb.org/builders/OS%20X%2010.5%2064-bit/builds/5547/steps/test_9/logs/stdio
http://buildlogs.mongodb.org/OS%20X%2010.5%2064-bit/builds/5547/test/replica%20sets/chaining.js

ReplSetTest awaitReplication: checking secondaries against timestamp Timestamp(1372900459, 3)
ReplSetTest awaitReplication: checking secondary #1: bs-osx-106-x86-64-2.10gen.cc:31001
ReplSetTest awaitReplication: secondary #1, bs-osx-106-x86-64-2.10gen.cc:31001, is synced
ReplSetTest awaitReplication: checking secondary #2: bs-osx-106-x86-64-2.10gen.cc:31002
ReplSetTest awaitReplication: timestamp for secondary #2, bs-osx-106-x86-64-2.10gen.cc:31002, is Timestamp(1372900459, 2) but latest is Timestamp(1372900459, 3)
ReplSetTest awaitReplication: last oplog entry (of 4) for secondary #2, bs-osx-106-x86-64-2.10gen.cc:31002, is {  "ts" : Timestamp(1372900459, 2),  "h" : NumberLong("-4910588404677798978"),  "v" : 2,  "op" : "i",  "ns" : "chaining.foo",  "o" : {  "_id" : ObjectId("51d4cc6bd1e446f928d201a9"),  "a" : 1 } }
ReplSetTest awaitReplication: secondary #2, bs-osx-106-x86-64-2.10gen.cc:31002, is NOT synced
2013-07-03 21:14:51 EDT	
 m31000| Wed Jul  3 21:14:50.554 [rsHealthPoll] couldn't connect to bs-osx-106-x86-64-2.10gen.cc:31005: couldn't connect to server bs-osx-106-x86-64-2.10gen.cc:31005
 m31000| Wed Jul  3 21:14:50.554 [rsHealthPoll] couldn't connect to bs-osx-106-x86-64-2.10gen.cc:31005: couldn't connect to server bs-osx-106-x86-64-2.10gen.cc:31005
 m31000| Wed Jul  3 21:14:50.555 [rsHealthPoll] replset info bs-osx-106-x86-64-2.10gen.cc:31005 heartbeat failed, retrying
 m31000| Wed Jul  3 21:14:50.556 [rsHealthPoll] couldn't connect to bs-osx-106-x86-64-2.10gen.cc:31005: couldn't connect to server bs-osx-106-x86-64-2.10gen.cc:31005
 m31000| Wed Jul  3 21:14:50.556 [rsHealthPoll] couldn't connect to bs-osx-106-x86-64-2.10gen.cc:31005: couldn't connect to server bs-osx-106-x86-64-2.10gen.cc:31005
 m31000| Wed Jul  3 21:14:50.557 [rsHealthPoll] couldn't connect to bs-osx-106-x86-64-2.10gen.cc:31005: couldn't connect to server bs-osx-106-x86-64-2.10gen.cc:31005
 m31002| Wed Jul  3 21:14:50.964 [rsHealthPoll] couldn't connect to bs-osx-106-x86-64-2.10gen.cc:31010: couldn't connect to server bs-osx-106-x86-64-2.10gen.cc:31010
 m31002| Wed Jul  3 21:14:50.965 [rsHealthPoll] couldn't connect to bs-osx-106-x86-64-2.10gen.cc:31010: couldn't connect to server bs-osx-106-x86-64-2.10gen.cc:31010
 m31002| Wed Jul  3 21:14:50.966 [rsHealthPoll] replset info bs-osx-106-x86-64-2.10gen.cc:31010 heartbeat failed, retrying
 m31002| Wed Jul  3 21:14:50.966 [rsHealthPoll] couldn't connect to bs-osx-106-x86-64-2.10gen.cc:31010: couldn't connect to server bs-osx-106-x86-64-2.10gen.cc:31010
 m31002| Wed Jul  3 21:14:50.967 [rsHealthPoll] couldn't connect to bs-osx-106-x86-64-2.10gen.cc:31010: couldn't connect to server bs-osx-106-x86-64-2.10gen.cc:31010
 m31002| Wed Jul  3 21:14:50.968 [rsHealthPoll] couldn't connect to bs-osx-106-x86-64-2.10gen.cc:31010: couldn't connect to server bs-osx-106-x86-64-2.10gen.cc:31010
 m31002| Wed Jul  3 21:14:50.994 [rsHealthPoll] couldn't connect to bs-osx-106-x86-64-2.10gen.cc:31009: couldn't connect to server bs-osx-106-x86-64-2.10gen.cc:31009
 m31002| Wed Jul  3 21:14:50.995 [rsHealthPoll] couldn't connect to bs-osx-106-x86-64-2.10gen.cc:31009: couldn't connect to server bs-osx-106-x86-64-2.10gen.cc:31009
 m31002| Wed Jul  3 21:14:50.996 [rsHealthPoll] replset info bs-osx-106-x86-64-2.10gen.cc:31009 heartbeat failed, retrying
 m31002| Wed Jul  3 21:14:50.997 [rsHealthPoll] couldn't connect to bs-osx-106-x86-64-2.10gen.cc:31009: couldn't connect to server bs-osx-106-x86-64-2.10gen.cc:31009
 m31002| Wed Jul  3 21:14:50.997 [rsHealthPoll] couldn't connect to bs-osx-106-x86-64-2.10gen.cc:31009: couldn't connect to server bs-osx-106-x86-64-2.10gen.cc:31009
 m31002| Wed Jul  3 21:14:50.998 [rsHealthPoll] couldn't connect to bs-osx-106-x86-64-2.10gen.cc:31009: couldn't connect to server bs-osx-106-x86-64-2.10gen.cc:31009
Wed Jul  3 21:14:51.366 [awaiting replication] timed out after 30000ms ( 16 tries ) at src/mongo/shell/utils.js:492
failed to load: /data/buildslaves/OS_X_105_64bit/mongo/jstests/replsets/chaining.js

I cannot reproduce this on Linux 64-bit DEBUG. On the OS X builder it did not occur before build #5547 (#5546 was green).

Recommend trying to repro on another OS X box, and if successful, bisect between c11ea25c522c (good) and master (bad).



 Comments   
Comment by auto [ 02/Aug/13 ]

Author:

{u'username': u'milkie', u'name': u'Eric Milkie', u'email': u'milkie@10gen.com'}

Message: SERVER-10103 avoid hitting socket exception in the shell
Branch: v2.4
https://github.com/mongodb/mongo/commit/30300a9c025f718ac8ee4c5799e35e67410977a3

Comment by auto [ 02/Aug/13 ]

Author:

{u'username': u'milkie', u'name': u'Eric Milkie', u'email': u'milkie@10gen.com'}

Message: SERVER-10103 add debugging output
Branch: v2.4
https://github.com/mongodb/mongo/commit/9a09884807bd4a3ce1a12188a281a2885df235ef

Comment by Eric Milkie [ 17/Jul/13 ]

I am of the opinion that we should always backport changes that only modify .js test files – if possible.

Comment by Matt Kangas [ 17/Jul/13 ]

I think we need to backport to V2.4:

V2.4 Linux 32-bit Build #138 July 17

http://buildbot.mongodb.org/builders/V2.4%20Linux%2032-bit/builds/138/steps/test_6/logs/stdio
http://buildlogs.mongodb.org/V2.4%20Linux%2032-bit/builds/138/test/replica%20sets/chaining.js

Wed Jul 17 08:05:42.337 JavaScript execution failed: [awaiting replication] timed out after 30000ms ( 16 tries ) at src/mongo/shell/utils.js:L490
failed to load: /mnt/slaves/Linux_32bit_V2.4/mongo/jstests/replsets/chaining.js

Comment by Eric Milkie [ 15/Jul/13 ]

I think this has stopped failing now.

Comment by auto [ 12/Jul/13 ]

Author:

{u'username': u'milkie', u'name': u'Eric Milkie', u'email': u'milkie@10gen.com'}

Message: SERVER-10103 avoid hitting socket exception in the shell
Branch: master
https://github.com/mongodb/mongo/commit/0bd1b72a38bcbc598be5338b207c76a44c10810b

Comment by auto [ 12/Jul/13 ]

Author:

{u'username': u'milkie', u'name': u'Eric Milkie', u'email': u'milkie@10gen.com'}

Message: SERVER-10103 add debugging output
Branch: master
https://github.com/mongodb/mongo/commit/723a68c9afbe134c9dffe9312640ec22e941527e

Comment by Spencer Brody (Inactive) [ 11/Jul/13 ]

FWIW I don't think this failure is related to auth as it's failing in many places even without any auth turned on.

Comment by Matt Kangas [ 11/Jul/13 ]

Happening on V2.4 now? Grr.

V2.4 Linux 32-bit Build #134

http://buildbot.mongodb.org/builders/V2.4%20Linux%2032-bit/builds/134/steps/test_6/logs/stdio
http://buildlogs.mongodb.org/V2.4%20Linux%2032-bit/builds/134/test/replica%20sets/chaining.js

Thu Jul 11 12:01:07.631 JavaScript execution failed: [awaiting replication] timed out after 30000ms ( 16 tries ) at src/mongo/shell/utils.js:L490
failed to load: /mnt/slaves/Linux_32bit_V2.4/mongo/jstests/replsets/chaining.js

Comment by Eric Milkie [ 10/Jul/13 ]

The mongobridge is exiting for some reason. We're not printing any message describing the reason why (and most code paths do indeed log something). I may have to add more output in order to see which code path is being taken.

Comment by Matt Kangas [ 09/Jul/13 ]

spencer Do the +Error: 18

{ code: 18, ok: 0.0, errmsg: "auth fails" }

+ messages mentioned above look normal to you, or something new?

Comment by Matt Kangas [ 09/Jul/13 ]

Different failure now on Linux 32-bit #5804 (July 8)?

http://buildbot.mongodb.org/builders/Linux%2032-bit/builds/5804/steps/test_9/logs/stdio
http://buildlogs.mongodb.org/Linux%2032-bit/builds/5804/test/replica%20sets/chaining.js

ReplSetTest awaitReplication: checking secondary #2: bs-linux32.10gen.cc:31002
ReplSetTest awaitReplication: secondary #2, bs-linux32.10gen.cc:31002, is synced
ReplSetTest awaitReplication: finished: all 2 secondaries synced at timestamp Timestamp(1373331115, 1)
 m31006| Mon Jul  8 20:51:55.379 connection accepted from 10.255.119.66:51104 #6 (1 connection now open)
Mon Jul  8 20:51:55.380 DBClientCursor::init call() failed
 m31001| Mon Jul  8 20:51:55.379 [conn1] end connection 127.0.0.1:45339 (4 connections now open)
 m31000| Mon Jul  8 20:51:55.380 [initandlisten] connection accepted from 10.255.119.66:47187 #9 (5 connections now open)
Mon Jul  8 20:51:55.381 Error: error doing query: failed at src/mongo/shell/query.js:81
failed to load: /mnt/slaves/Linux_32bit/mongo/jstests/replsets/chaining.js

Comment by Matt Kangas [ 08/Jul/13 ]

Similar failure on Linux 64-bit Weekly Slow Tests

Sun Jul  7 11:36:26.973 [awaiting replication] timed out after 30000ms ( 16 tries ) at src/mongo/shell/utils.js:492

http://buildbot.mongodb.org/builders/Linux%2064-bit%20Weekly%20Slow%20Tests/builds/252/steps/test_9/logs/stdio
http://buildlogs.mongodb.org/Linux%2064-bit%20Weekly%20Slow%20Tests/builds/252/test/replica%20sets/chaining.js

Note, lots of these occurring earlier in the log:

Error: 18 { code: 18, ok: 0.0, errmsg: "auth fails" }

Comment by Matt Kangas [ 06/Jul/13 ]

Now also seen on Linux 64-bit builder!

http://buildbot.mongodb.org/builders/Linux%2064-bit/builds/5580

 *******************************************
         Test : chaining.js ...
      Command : /data/buildslaves/Linux_64bit/mongo/mongo --port 27999 --authenticationMechanism MONGODB-CR --nodb /data/buildslaves/Linux_64bit/mongo/jstests/replsets/chaining.js --eval TestData = new Object();TestData.testPath = "/data/buildslaves/Linux_64bit/mongo/jstests/replsets/chaining.js";TestData.testFile = "chaining.js";TestData.testName = "chaining";TestData.noJournal = false;TestData.noJournalPrealloc = false;TestData.auth = false;TestData.keyFile = null;TestData.keyFileData = null;TestData.authMechanism = "MONGODB-CR";
         Date : Sat Jul  6 01:51:06 2013
                (output suppressed; see http://buildlogs.mongodb.org/build/51d76c6bd2a60f19290001a2/test/51d7780ad2a60f20ce0009dc/)
                    1.5141 minutes
test /data/buildslaves/Linux_64bit/mongo/jstests/replsets/chaining.js exited with status 253

http://buildlogs.mongodb.org/Linux%2064-bit/builds/5580/test/replica%20sets/chaining.js

Sat Jul  6 01:52:31.733 [awaiting replication] timed out after 30000ms ( 16 tries ) at src/mongo/shell/utils.js:492
failed to load: /data/buildslaves/Linux_64bit/mongo/jstests/replsets/chaining.js

Comment by Matt Kangas [ 05/Jul/13 ]

Also seen on Windows 64-bit build 5594

http://buildbot.mongodb.org/builders/Windows%2064-bit/builds/5594/steps/test_9/logs/stdio
http://buildlogs.mongodb.org/Windows%2064-bit/builds/5594/test/replica%20sets/chaining.js?mode=raw

Thu Jul 04 01:53:02.975 [awaiting replication] timed out after 30000ms ( 16 tries ) at src/mongo/shell/utils.js:492
failed to load: D:\slave\Windows_64bit\mongo\jstests\replsets\chaining.js

Comment by Matt Kangas [ 05/Jul/13 ]

Unable to repro on my OS X 10.8 laptop when invoking jstests/replsets/chaining.js directly.

Generated at Thu Feb 08 03:22:16 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.