[SERVER-10206] sharding/read_pref.js timed out on Nightly Linux 64-bit Subscription RHEL 6.2 Created: 15/Jul/13  Updated: 11/Jul/16  Resolved: 15/Nov/13

Status: Closed
Project: Core Server
Component/s: Testing Infrastructure
Affects Version/s: None
Fix Version/s: 2.5.4

Type: Bug Priority: Major - P3
Reporter: Matt Kangas Assignee: Randolph Tan
Resolution: Done Votes: 0
Labels: buildbot
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

buildbot-special Nightly Linux 64-bit Subscription RHEL 6.2 Build #504
buildbot-special Nightly Linux 64-bit Subscription Amazon AMI Build #579
buildbot-special Nightly Linux 64-bit Subscription Ubuntu 1204 Build #172


Operating System: ALL
Participants:

 Description   

Nightly Linux 64-bit Subscription RHEL 6.2 Build #504 July 14 (rev dbc297ef013)

http://buildbot-special.10gen.com/builders/Nightly%20Linux%2064-bit%20Subscription%20RHEL%206.2/builds/504/steps/shell_3/logs/stdio
http://buildlogs.mongodb.org/Nightly%20Linux%2064-bit%20Subscription%20RHEL%206.2/builds/504/test/sharding/read_pref.js

stdio ends with

 *******************************************
         Test : read_pref.js ...
      Command : /data/buildslaves/Linux_64bit_Subscription_RHEL_62_Nightly/mongo/mongo --port 27999 --authenticationMechanism CRAM-MD5 --nodb /data/buildslaves/Linux_64bit_Subscription_RHEL_62_Nightly/mongo/jstests/sharding/read_pref.js --eval TestData = new Object();TestData.testPath = "/data/buildslaves/Linux_64bit_Subscription_RHEL_62_Nightly/mongo/jstests/sharding/read_pref.js";TestData.testFile = "read_pref.js";TestData.testName = "read_pref";TestData.noJournal = false;TestData.noJournalPrealloc = false;TestData.auth = true;TestData.keyFile = "/data/buildslaves/Linux_64bit_Subscription_RHEL_62_Nightly/mongo/jstests/libs/authTestsKey";TestData.keyFileData = "Thiskeyisonlyforrunningthesuitewithauthenticationdontuseitinanytestsdirectly";TestData.authMechanism = "CRAM-MD5";
         Date : Sun Jul 14 11:47:32 2013
                (output suppressed; see http://buildlogs.mongodb.org/build/51e28b25d2a60f4291000883/test/51e2c815d2a60f12130004ac/)
 
command timed out: 10800 seconds without output, attempting to kill
process killed by signal 9
program finished with exit code -1
elapsedTime=15499.007242



 Comments   
Comment by Randolph Tan [ 15/Nov/13 ]

milkie thinks that the last failure was caused by our build logger being too slow to respond. This caused the replica sets to hang up while waiting for the logs to be flushed, which resulted in missing the heartbeats.

Comment by Randolph Tan [ 15/Nov/13 ]

Test bailed out even before the setup finishes. It fails when it was about to add users inside ReplSetTest#initiate:

    // Setup authentication if running test with authentication
    if ((jsTestOptions().keyFile || jsTestOptions().useX509) && 
          cmdKey == 'replSetInitiate') {
        master = this.getMaster(); // get master conn here
        jsTest.addAuth(master);  // and use it here
        jsTest.authenticateNodes(this.nodes);
    }

So, it gets the master connection, but for some reason, the nodes suddenly thinks the others are down, causing the current primary to step down:

m31100| 2013-11-14T23:11:13.776-0500 [rsHealthPoll] replSet info ip-10-141-164-99:31103 is down (or slow to respond): 
m31100| 2013-11-14T23:11:13.776-0500 [rsHealthPoll] replSet member ip-10-141-164-99:31103 is now in state DOWN
m31100| 2013-11-14T23:11:13.776-0500 [rsHealthPoll] replSet info ip-10-141-164-99:31102 is down (or slow to respond): 
m31100| 2013-11-14T23:11:13.776-0500 [rsHealthPoll] replSet member ip-10-141-164-99:31102 is now in state DOWN
m31100| 2013-11-14T23:11:13.776-0500 [rsHealthPoll] replSet info ip-10-141-164-99:31104 is down (or slow to respond): 
m31100| 2013-11-14T23:11:13.776-0500 [rsHealthPoll] replSet info ip-10-141-164-99:31101 is down (or slow to respond): 
m31100| 2013-11-14T23:11:13.776-0500 [rsHealthPoll] replSet member ip-10-141-164-99:31104 is now in state DOWN
m31100| 2013-11-14T23:11:13.776-0500 [rsHealthPoll] replSet member ip-10-141-164-99:31101 is now in state DOWN
m31100| 2013-11-14T23:11:13.776-0500 [rsMgr] replSet attempting to relinquish
m31100| 2013-11-14T23:11:13.776-0500 [rsMgr] replSet relinquishing primary state
m31100| 2013-11-14T23:11:13.777-0500 [rsMgr] replSet SECONDARY
m31100| 2013-11-14T23:11:13.777-0500 [rsMgr] replSet closing client sockets after relinquishing primary
m31100| 2013-11-14T23:11:13.777-0500 [rsMgr] replSet can't see a majority, will not try to elect self
m31100| 2013-11-14T23:11:13.777-0500 [rsMgr] replSet can't see a majority, will not try to elect self

As a consequence, causing the shell to lose connection and throw an exception when it tries to use the master connection to add user:

2013-11-14T23:13:48.349-0500 Error: error doing query: failed at src/mongo/shell/query.js:81

Comment by Daniel Pasette (Inactive) [ 15/Nov/13 ]

https://mci.10gen.com/ui/task/mongodb_mongo_master_enterprise_rhel_62_64_bit_787023e2c42bf7e65b1286887243e9177384d0c4_13_11_15_01_06_06_sharding_auth_rhel62

Comment by Randolph Tan [ 04/Sep/13 ]

Test timed out because one the servers died unexpectedly:

	
 m31100| 2013-09-04T05:56:31.691-0400 [conn15] command: { replSetHeartbeat: "test-rs0", v: 2, pv: 1, checkEmpty: false, from: "AMAZONA-JD1A6QA:31103" }
 m31100| 2013-09-04T05:56:31.691-0400 [conn15] command admin.$cmd command: { replSetHeartbeat: "test-rs0", v: 2, pv: 1, checkEmpty: false, from: "AMAZONA-JD1A6QA:31103" } ntoreturn:1 keyUpdates:0  reslen:124 0ms
 m31100| 2013-09-04T05:56:31.722-0400 [initandlisten] mongod.exe    ...\src\mongo\util\stacktrace.cpp(169)                   mongo::printStackTrace+0x43
 m31100| 2013-09-04T05:56:31.722-0400 [initandlisten] mongod.exe    ...\src\mongo\util\log.cpp(131)                          mongo::logContext+0x9c
 m31100| 2013-09-04T05:56:31.722-0400 [initandlisten] mongod.exe    ...\src\mongo\util\assert_util.cpp(167)                  mongo::msgasserted+0xfb
 m31100| 2013-09-04T05:56:31.722-0400 [initandlisten] mongod.exe    ...\src\mongo\util\assert_util.cpp(158)                  mongo::msgasserted+0x13
 m31100| 2013-09-04T05:56:31.722-0400 [initandlisten] mongod.exe    ...\src\mongo\util\net\sock.cpp(404)                     mongo::getLocalAddrForBoundSocketFd+0x8b
 m31100| 2013-09-04T05:56:31.722-0400 [initandlisten] mongod.exe    ...\src\mongo\util\net\sock.cpp(412)                     mongo::Socket::Socket+0xa7
 m31100| 2013-09-04T05:56:31.722-0400 [initandlisten] mongod.exe    ...\src\mongo\util\net\listen.cpp(484)                   mongo::Listener::initAndListen+0xa52
 m31100| 2013-09-04T05:56:31.722-0400 [initandlisten] mongod.exe    ...\src\mongo\db\db.cpp(294)                             mongo::listen+0x11b
 m31100| 2013-09-04T05:56:31.722-0400 [initandlisten] mongod.exe    ...\src\mongo\db\db.cpp(748)                             mongo::_initAndListen+0xd09
 m31100| 2013-09-04T05:56:31.722-0400 [initandlisten] mongod.exe    ...\src\mongo\db\db.cpp(753)                             mongo::initAndListen+0x27
 m31100| 2013-09-04T05:56:31.722-0400 [initandlisten] mongod.exe    ...\src\mongo\db\db.cpp(1423)                            mongoDbMain+0x1ef
 m31100| 2013-09-04T05:56:31.722-0400 [initandlisten] mongod.exe    ...\src\mongo\db\db.cpp(804)                             wmain+0x35
 m31100| 2013-09-04T05:56:31.722-0400 [initandlisten] mongod.exe    f:\dd\vctools\crt_bld\self_64_amd64\crt\src\crt0.c(278)  __tmainCRTStartup+0x13b
 m31100| 2013-09-04T05:56:31.722-0400 [initandlisten] kernel32.dll                                                           BaseThreadInitThunk+0xd
 m31100| 2013-09-04T05:56:31.722-0400 [initandlisten] 
 m31100| 2013-09-04T05:56:31.722-0400 [initandlisten] exception in initAndListen: 16975 errno:10038 An operation was attempted on something that is not a socket., terminating
 m31100| dbexit: 2013-09-04T05:56:31.722-0400 [initandlisten] shutdown: going to close listening sockets...
 m31100| 2013-09-04T05:56:31.722-0400 [initandlisten] closing listening socket: 624
 m31100| 2013-09-04T05:56:31.722-0400 [initandlisten] closing listening socket: 628

Comment by Matt Kangas [ 04/Sep/13 ]

Today this test failed on Windows 64-bit. It timed out internally, rather than hanging until buildbot killed it.

Builder Windows 64-bit Build #5680 Wed Sep 4 (rev 805c07c3bb)
http://buildlogs.mongodb.org/Windows%2064-bit/builds/5680/test/sharding/read_pref.js

 *******************************************
         Test : read_pref.js ...
      Command : D:\slave\Windows_64bit\mongo\mongo.exe --port 27999 --authenticationMechanism MONGODB-CR --nodb D:\slave\Windows_64bit\mongo\jstests\sharding\read_pref.js --eval TestData = new Object();TestData.testPath = "D:\\slave\\Windows_64bit\\mongo\\jstests\\sharding\\read_pref.js";TestData.testFile = "read_pref.js";TestData.testName = "read_pref";TestData.setParameters = "";TestData.noJournal = false;TestData.noJournalPrealloc = false;TestData.auth = false;TestData.keyFile = null;TestData.keyFileData = null;TestData.authMechanism = "MONGODB-CR";
         Date : Wed Sep  4 05:56:00 2013
                (output suppressed; see http://buildlogs.mongodb.org/build/5226c354d2a60f76cd000f81/test/522703b1d2a60f4d190002b0/)
                    1.6071 minutes
test D:\slave\Windows_64bit\mongo\jstests\sharding\read_pref.js exited with status -3

assert.soon failed: function () {
        var rsClientHosts = conn.getDB( "admin" ).runCommand( "connPoolStats" )[ "replicaSets" ]
        if( tests++ % 10 == 0 )
            printjson( rsClientHosts )
        for ( rsName in rsClientHosts ){
            if( rs && rs != rsName ) continue
            for ( var i = 0; i < rsClientHosts[rsName].hosts.length; i++ ){
                var clientHost = rsClientHosts[rsName].hosts[ i ];
                if( clientHost.addr != host ) continue
                // Check that *all* host properties are set correctly
                var propOk = true
                for( var prop in hostOk ){
                    if ( isObject( hostOk[prop] )) {
                        if ( !friendlyEqual( hostOk[prop], clientHost[prop] )){
                            propOk = false;
                            break;
                        }
                    }
                    else if ( clientHost[prop] != hostOk[prop] ){
                        propOk = false;
                        break;
                    }
                }
                if( propOk ) return true;
            }
        }
        return false;
    }, msg:timed out waiting for replica set client to recognize hosts

Comment by Matt Kangas [ 22/Aug/13 ]

Status: waiting for it to happen again after increasing the verbosity of the test

Comment by auto [ 24/Jul/13 ]

Author:

{u'username': u'renctan', u'name': u'Randolph Tan', u'email': u'randolph@10gen.com'}

Message: Increase verbosity for SERVER-10206
Branch: master
https://github.com/mongodb/mongo/commit/7996179c7162acf54ff654d13178ef29e1bb1748

Comment by Matt Kangas [ 17/Jul/13 ]

Now on Ubuntu:

Nightly Linux 64-bit Subscription Ubuntu 1204 Build #172 July 17

http://buildbot-special.10gen.com/builders/Nightly%20Linux%2064-bit%20Subscription%20Ubuntu%201204/builds/172/steps/shell_3/logs/stdio
http://buildlogs.mongodb.org/Nightly%20Linux%2064-bit%20Subscription%20Ubuntu%201204/builds/172/test/sharding/read_pref.js

 *******************************************
         Test : read_pref.js ...
      Command : /data/buildslaves/Linux_64bit_Subscription_Ubuntu_1204_Nightly/mongo/mongo --port 27999 --authenticationMechanism CRAM-MD5 --nodb /data/buildslaves/Linux_64bit_Subscription_Ubuntu_1204_Nightly/mongo/jstests/sharding/read_pref.js --eval TestData = new Object();TestData.testPath = "/data/buildslaves/Linux_64bit_Subscription_Ubuntu_1204_Nightly/mongo/jstests/sharding/read_pref.js";TestData.testFile = "read_pref.js";TestData.testName = "read_pref";TestData.setParameters = "";TestData.noJournal = false;TestData.noJournalPrealloc = false;TestData.auth = true;TestData.keyFile = "/data/buildslaves/Linux_64bit_Subscription_Ubuntu_1204_Nightly/mongo/jstests/libs/authTestsKey";TestData.keyFileData = "Thiskeyisonlyforrunningthesuitewithauthenticationdontuseitinanytestsdirectly";TestData.authMechanism = "CRAM-MD5";
         Date : Wed Jul 17 01:19:21 2013
                (output suppressed; see http://buildlogs.mongodb.org/build/51e57946d2a60f7cde000548/test/51e5f11ad2a60f17c0000388/)
 
command timed out: 10800 seconds without output, attempting to kill

Comment by Matt Kangas [ 17/Jul/13 ]

renctan - ask Spencer, but I have seen this in the output from many different tests, and I believe it means the user credentials haven't been replicated to a secondary yet.

Comment by Randolph Tan [ 17/Jul/13 ]

Not sure if this is relevant but this lines in the logs looks suspicious (appears in both logs):

Authenticating to admin database as admin with mechanism CRAM-MD5 on connection: connection to bs-e-rhel62:31100
 m31100| Sun Jul 14 11:47:34.366 [conn1] note: no users configured in admin.system.users, allowing localhost access
 m31100| Sun Jul 14 11:47:34.367 [conn1] CRAM-MD5 authentication failed for admin on admin ; ProtocolError SASL(-1): generic failure: empty secret
Error: 18 Authentication failed.

Tried running the same test on the local machine with CRAM-MD5 and never saw this error. However, successful build in buildbot has this error...

Comment by Matt Kangas [ 15/Jul/13 ]

Same thing, same night on a different builder:

Nightly Linux 64-bit Subscription Amazon AMI Build #579 July 14 (rev dbc297ef0)

http://buildbot-special.10gen.com/builders/Nightly%20Linux%2064-bit%20Subscription%20Amazon%20AMI/builds/579/steps/shell_3/logs/stdio
http://buildlogs.mongodb.org/Nightly%20Linux%2064-bit%20Subscription%20Amazon%20AMI/builds/579/test/sharding/read_pref.js

stdio ends with

 *******************************************
         Test : read_pref.js ...
      Command : /data/buildslaves/Linux_64bit_Subscription_Amazon_AMI_Nightly/mongo/mongo --port 27999 --authenticationMechanism CRAM-MD5 --nodb /data/buildslaves/Linux_64bit_Subscription_Amazon_AMI_Nightly/mongo/jstests/sharding/read_pref.js --eval TestData = new Object();TestData.testPath = "/data/buildslaves/Linux_64bit_Subscription_Amazon_AMI_Nightly/mongo/jstests/sharding/read_pref.js";TestData.testFile = "read_pref.js";TestData.testName = "read_pref";TestData.noJournal = false;TestData.noJournalPrealloc = false;TestData.auth = true;TestData.keyFile = "/data/buildslaves/Linux_64bit_Subscription_Amazon_AMI_Nightly/mongo/jstests/libs/authTestsKey";TestData.keyFileData = "Thiskeyisonlyforrunningthesuitewithauthenticationdontuseitinanytestsdirectly";TestData.authMechanism = "CRAM-MD5";
         Date : Sun Jul 14 17:30:20 2013
                (output suppressed; see http://buildlogs.mongodb.org/build/51e29941d2a60f4e71000b6e/test/51e2e02dd2a60f329d000314/)
 
command timed out: 10800 seconds without output, attempting to kill
process killed by signal 9
program finished with exit code -1
elapsedTime=16823.979939

Generated at Thu Feb 08 03:22:34 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.