[SERVER-10119] buildbot-special: replsets/server8070.js failing on Nightly Linux 64-bit Subscription Created: 06/Jul/13  Updated: 11/Jul/16  Resolved: 16/Jul/13

Status: Closed
Project: Core Server
Component/s: Testing Infrastructure
Affects Version/s: None
Fix Version/s: 2.5.1

Type: Bug Priority: Major - P3
Reporter: Matt Kangas Assignee: J Rassi
Resolution: Done Votes: 0
Labels: buildbot
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

buildbot-special: Nightly Linux 64-bit Subscription Amazon AMI
buildbot-special: Nightly Linux 64-bit Subscription RHEL 6.2
buildbot-special: Nightly Linux 64-bit Subscription RHEL 5.7 #466
buildbot-special: Nightly Linux 64-bit Subscription Ubuntu 1204 Build #159


Operating System: ALL
Participants:

 Description   

Build #562

 *******************************************
         Test : server8070.js ...
      Command : /data/buildslaves/Linux_64bit_Subscription_Amazon_AMI_Nightly/mongo/mongo --port 27999 --authenticationMechanism CRAM-MD5 --nodb /data/buildslaves/Linux_64bit_Subscription_Amazon_AMI_Nightly/mongo/jstests/replsets/server8070.js --eval TestData = new Object();TestData.testPath = "/data/buildslaves/Linux_64bit_Subscription_Amazon_AMI_Nightly/mongo/jstests/replsets/server8070.js";TestData.testFile = "server8070.js";TestData.testName = "server8070";TestData.noJournal = false;TestData.noJournalPrealloc = false;TestData.auth = true;TestData.keyFile = "/data/buildslaves/Linux_64bit_Subscription_Amazon_AMI_Nightly/mongo/jstests/libs/authTestsKey";TestData.keyFileData = "Thiskeyisonlyforrunningthesuitewithauthenticationdontuseitinanytestsdirectly";TestData.authMechanism = "CRAM-MD5";
         Date : Sat Jul  6 01:07:55 2013
                (output suppressed; see http://buildlogs.mongodb.org/build/51d73e27d2a60f574d0000bb/test/51d76debd2a60f1562000853/)
                    3.2656 minutes
test /data/buildslaves/Linux_64bit_Subscription_Amazon_AMI_Nightly/mongo/jstests/replsets/server8070.js exited with status 253

http://buildlogs.mongodb.org/Nightly%20Linux%2064-bit%20Subscription%20Amazon%20AMI/builds/562/test/replica%20sets/server8070.js

 m31002| Sat Jul  6 01:08:18.866 [initandlisten] connection accepted from 127.0.0.1:36879 #1 (1 connection now open)
[
	connection to bs-e-amzn64:31000,
	connection to bs-e-amzn64:31001,
	connection to bs-e-amzn64:31002
]
Authenticating to admin database as admin with mechanism CRAM-MD5 on connection: connection to bs-e-amzn64:31000
 m31000| Sat Jul  6 01:08:18.869 [conn1] note: no users configured in admin.system.users, allowing localhost access
 m31000| Sat Jul  6 01:08:18.870 [conn1] CRAM-MD5 authentication failed for admin on admin ; ProtocolError SASL(-1): generic failure: empty secret
Error: 18 Authentication failed.
Caught exception while authenticating connection: "[Authenticating connection: connection to bs-e-amzn64:31000] timed out after 5000ms ( 6 tries )"
 
(many more authentication failures)
...
 
assert.soon failed: function () {
                    var syncingTo = member3.adminCommand({replSetGetStatus:1}).syncingTo;
                    return syncingTo == getHostName()+":"+replSet.ports[1];
                }, msg:undefined
Error: Printing Stack Trace
    at printStackTrace (src/mongo/shell/utils.js:37:15)
    at doassert (src/mongo/shell/assert.js:6:5)
    at Function.assert.soon (src/mongo/shell/assert.js:174:60)
    at /data/buildslaves/Linux_64bit_Subscription_Amazon_AMI_Nightly/mongo/jstests/replsets/server8070.js:106:20
    at Function.assert.throws (src/mongo/shell/assert.js:202:14)
    at fixed (/data/buildslaves/Linux_64bit_Subscription_Amazon_AMI_Nightly/mongo/jstests/replsets/server8070.js:104:12)
    at /data/buildslaves/Linux_64bit_Subscription_Amazon_AMI_Nightly/mongo/jstests/replsets/server8070.js:117:1
 --- pause 3's bgsync thread ---
Allow 3 to apply ops 25-75
 
...
 
 m31001| Sat Jul  6 01:10:59.591 [rsHealthPoll] replset info bs-e-amzn64:31000 heartbeat failed, retrying
 m31001| Sat Jul  6 01:10:59.595 [rsHealthPoll] couldn't connect to bs-e-amzn64:31000: couldn't connect to server bs-e-amzn64:31000
 m31001| Sat Jul  6 01:10:59.600 [rsMgr] replSet I don't see a primary and I can't elect myself
 
...
 
assert.soon failed: function () {
        var last3 = member3.getSisterDB("local").oplog.rs.find().sort({$natural:-1}).limit(1)
            .next();
        print("primary: " + tojson(last.ts) + " secondary: " + tojson(last3.ts));
        return ((last.ts.t === last3.ts.t) && (last.ts.i === last3.ts.i))
    }, msg:undefined
Error: Printing Stack Trace
    at printStackTrace (src/mongo/shell/utils.js:37:15)
    at doassert (src/mongo/shell/assert.js:6:5)
    at Function.assert.soon (src/mongo/shell/assert.js:174:60)
    at /data/buildslaves/Linux_64bit_Subscription_Amazon_AMI_Nightly/mongo/jstests/replsets/server8070.js:124:8
Sat Jul  6 01:11:09.272 assert.soon failed: function () {
        var last3 = member3.getSisterDB("local").oplog.rs.find().sort({$natural:-1}).limit(1)
            .next();
        print("primary: " + tojson(last.ts) + " secondary: " + tojson(last3.ts));
        return ((last.ts.t === last3.ts.t) && (last.ts.i === last3.ts.i))
    }, msg:undefined at src/mongo/shell/assert.js:7
failed to load: /data/buildslaves/Linux_64bit_Subscription_Amazon_AMI_Nightly/mongo/jstests/replsets/server8070.js

Is this an intermittent issue? Or did this break due to recent authentication changes?



 Comments   
Comment by J Rassi [ 16/Jul/13 ]

The part of the test that is failing is not the SERVER-8070 check. Midway through this test, "member 3" of the test set is given 10 seconds to replicate 50 simple write ops with the rsSyncApplyStop failpoint enabled, then the primary is shut down, then the member is then given 30 seconds to apply them and catch up after disabling the failpoint. The test is failing because this member never catches up (so, either it's not replicating the ops, or it's not applying them).

I notice that when the primary is shut down, the member spams to the log "[SyncSourceFeedbackThread] repl: couldn't connect to server bs-e-rhel57:31000" >1000x/second; possible connection to SERVER-6071 (considering all builders linked above are subscription builders, possibly those connections attempts are starving the repl worker thread of some authentication-related resource)? I verified that none of the four builders mentioned in the above comments have failed on this test since Friday, when SERVER-6071 was reverted (and, ran this test in a loop with master on bs-e-amzn64, where it's failed before, to no avail). Resolving, assuming it's SERVER-6071matt.kangas@10gen.com, re-open please if another build fails this test (or you find one that's failed since Friday).

Comment by Matt Kangas [ 09/Jul/13 ]

Nightly Linux 64-bit Subscription Ubuntu 1204 Build #159

http://buildbot-special.10gen.com/builders/Nightly%20Linux%2064-bit%20Subscription%20Ubuntu%201204/builds/159/steps/shell_2/logs/stdio
http://buildlogs.mongodb.org/Nightly%20Linux%2064-bit%20Subscription%20Ubuntu%201204/builds/159/test/replica%20sets/server8070.js

primary: Timestamp(1373365067, 50) secondary: Timestamp(1373365067, 31)
assert.soon failed: function () {
        var last3 = member3.getSisterDB("local").oplog.rs.find().sort({$natural:-1}).limit(1)
            .next();
        print("primary: " + tojson(last.ts) + " secondary: " + tojson(last3.ts));
        return ((last.ts.t === last3.ts.t) && (last.ts.i === last3.ts.i))
    }, msg:undefined
Error: Printing Stack Trace
    at printStackTrace (src/mongo/shell/utils.js:37:15)
    at doassert (src/mongo/shell/assert.js:6:5)
    at Function.assert.soon (src/mongo/shell/assert.js:174:60)
    at /data/buildslaves/Linux_64bit_Subscription_Ubuntu_1204_Nightly/mongo/jstests/replsets/server8070.js:124:8
Tue Jul  9 10:18:58.630 assert.soon failed: function () {
        var last3 = member3.getSisterDB("local").oplog.rs.find().sort({$natural:-1}).limit(1)
            .next();
        print("primary: " + tojson(last.ts) + " secondary: " + tojson(last3.ts));
        return ((last.ts.t === last3.ts.t) && (last.ts.i === last3.ts.i))
    }, msg:undefined at src/mongo/shell/assert.js:7
failed to load: /data/buildslaves/Linux_64bit_Subscription_Ubuntu_1204_Nightly/mongo/jstests/replsets/server8070.js

Comment by Matt Kangas [ 09/Jul/13 ]

More evidence that it's intermittent:

Nightly Linux 64-bit Subscription RHEL 6.2 Build #487

Jul 09 07:19	6386fe5ccc21...	failure	#487	Failed shell_2
Jul 08 07:20	6386fe5ccc21...	success	#486	Build successful
Jul 07 07:20	6386fe5ccc21...	success	#485	Build successful
Jul 06 07:22	6386fe5ccc21...	failure	#484	Failed shell_2
Jul 05 07:18	6386fe5ccc21...	success	#483	Build successful

http://buildbot-special.10gen.com/builders/Nightly%20Linux%2064-bit%20Subscription%20RHEL%206.2/builds/487/steps/shell_2/logs/stdio
http://buildlogs.mongodb.org/Nightly%20Linux%2064-bit%20Subscription%20RHEL%206.2/builds/487/test/replica%20sets/server8070.js

primary: Timestamp(1373378876, 74) secondary: Timestamp(1373378876, 73)
assert.soon failed: function () {
        var last3 = member3.getSisterDB("local").oplog.rs.find().sort({$natural:-1}).limit(1)
            .next();
        print("primary: " + tojson(last.ts) + " secondary: " + tojson(last3.ts));
        return ((last.ts.t === last3.ts.t) && (last.ts.i === last3.ts.i))
    }, msg:undefined
Error: Printing Stack Trace
    at printStackTrace (src/mongo/shell/utils.js:37:15)
    at doassert (src/mongo/shell/assert.js:6:5)
    at Function.assert.soon (src/mongo/shell/assert.js:174:60)
    at /data/buildslaves/Linux_64bit_Subscription_RHEL_62_Nightly/mongo/jstests/replsets/server8070.js:124:8
Tue Jul  9 10:09:07.670 assert.soon failed: function () {
        var last3 = member3.getSisterDB("local").oplog.rs.find().sort({$natural:-1}).limit(1)
            .next();
        print("primary: " + tojson(last.ts) + " secondary: " + tojson(last3.ts));
        return ((last.ts.t === last3.ts.t) && (last.ts.i === last3.ts.i))
    }, msg:undefined at src/mongo/shell/assert.js:7
failed to load: /data/buildslaves/Linux_64bit_Subscription_RHEL_62_Nightly/mongo/jstests/replsets/server8070.js

Comment by Matt Kangas [ 09/Jul/13 ]

Spencer, is this an authentication issue?

Comment by Matt Kangas [ 08/Jul/13 ]

Also Nightly Linux 64-bit Subscription RHEL 5.7. Same error as above, next nightly build (#467) succeeded.

http://buildbot-special.10gen.com/builders/Nightly%20Linux%2064-bit%20Subscription%20RHEL%205.7/builds/466/steps/shell_2/logs/stdio
http://buildlogs.mongodb.org/Nightly%20Linux%2064-bit%20Subscription%20RHEL%205.7/builds/466/test/replica%20sets/server8070.js

Comment by Matt Kangas [ 08/Jul/13 ]

Also Nightly Linux 64-bit Subscription RHEL 6.2 Build #484

Same error as above. The next nightly build (#485) succeeded, so this failure appears to be intermittent.

http://buildbot-special.10gen.com/builders/Nightly%20Linux%2064-bit%20Subscription%20RHEL%206.2/builds/484/steps/shell_2/logs/stdio

http://buildlogs.mongodb.org/Nightly%20Linux%2064-bit%20Subscription%20RHEL%206.2/builds/484/test/replica%20sets/server8070.js

Generated at Thu Feb 08 03:22:19 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.