[SERVER-507] pair test doesn't get expected master-slave steady state Created: 22/Dec/09  Updated: 12/Jul/16  Resolved: 18/Feb/10

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: 1.3.3

Type: Bug Priority: Minor - P4
Reporter: Aaron Staple Assignee: Aaron Staple
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Participants:

 Description   

<http://buildbot.mongodb.org/builders/OS%20X%2010.5%2032-bit/builds/428/steps/test_2/logs/stdio>

assert: assert.soon failed: function () {
return rp.checkSteadyState(state, expectedMasterHost, twoMasterOk, leftValues, rightValues, debug);
}, msg:rp (left: 53096,/data/db/jstests_pair1test-left,127.0.0.1:53097,127.0.0.1:53095 (connected)(slave) right: 53097,/data/db/jstests_pair1test-right,127.0.0.1:53096,127.0.0.1:53095 ) failed to reach expected steady state (1,)
Tue Dec 22 15:26:03 JS Error: uncaught exception: assert.soon failed: function () {
return rp.checkSteadyState(state, expectedMasterHost, twoMasterOk, leftValues, rightValues, debug);
}, msg:rp (left: 53096,/data/db/jstests_pair1test-left,127.0.0.1:53097,127.0.0.1:53095 (connected)(slave) right: 53097,/data/db/jstests_pair1test-right,127.0.0.1:53096,127.0.0.1:53095 ) failed to reach expected steady state (1,)
failed to load: /Users/mike/buildslaves/mongo/OS_X_105_32bit/mongo/jstests/repl/pair1.js



 Comments   
Comment by Aaron Staple [ 18/Feb/10 ]

I think I fixed at least one of the problems. If simialar issues arise later, we can open a new bug.

Comment by auto [ 18/Feb/10 ]

Author:

{'login': 'astaple', 'name': 'Aaron', 'email': 'aaron@10gen.com'}

Message: SERVER-507 add sleep while clone finishes
http://github.com/mongodb/mongo/commit/78930bdb28fb6fcf2a7e55d85b4432ad79e01d2d

Comment by Aaron Staple [ 18/Feb/10 ]

Just to clarify, there aren't any indexes that need to be cloned, but the master is killed before the slave does the query to figure out that there aren't any indexes.

Comment by Aaron Staple [ 18/Feb/10 ]

Same failure in the occurrence 08/Feb/10 09:53 PM and quite possibly the original failure in the description.

Comment by Aaron Staple [ 18/Feb/10 ]

In the failure immediately above, a count on the slave indicated that all data had been copied over. However, the slave wasn't done cloning the indexes when the master went down. This caused the slave to time out for 60 secs instead of the usual shorter amount, which is longer than the test waits for the slave to become master.

Comment by Aaron Staple [ 17/Feb/10 ]

<http://buildbot.mongodb.org/builders/OS%20X%2010.5%2032-bit/builds/223/steps/test_2/logs/stdio>

Comment by Aaron Staple [ 08/Feb/10 ]

Sorry, in the previous comment I was referring to this run: <http://buildbot.mongodb.org/builders/Linux%2064-bit%20debug/builds/97/steps/test_2/logs/stdio>

Comment by Aaron Staple [ 08/Feb/10 ]

In the run from comment 2 above, the slave isn't able to connect to the mongobridge to its peer for some reason.

Comment by Aaron Staple [ 08/Feb/10 ]

loading file: /home/aaron/src/mongo/jstests/repl/pair1.js
shell: started mongo program /home/aaron/src/mongo/mongod --port 35000 --dbpath /data/db/jstests_pair1test-arbiter --nohttpinterface --noprealloc --smallfiles --bind_ip 127.0.0.1
m35000| Mon Feb 8 18:39:20 Mongo DB : starting : pid = 19354 port = 35000 dbpath = /data/db/jstests_pair1test-arbiter master = 0 slave = 0 64-bit
m35000| Mon Feb 8 18:39:20 db version v1.3.2-, pdfile version 4.5
m35000| Mon Feb 8 18:39:20 git version: b4fe153800f7cc65825607af5cd4e4216aff8808
m35000| Mon Feb 8 18:39:20 sys info: Linux aaron-ubuntu 2.6.31-14-generic #48-Ubuntu SMP Fri Oct 16 14:05:01 UTC 2009 x86_64 BOOST_LIB_VERSION=1_41
m35000| Mon Feb 8 18:39:20 waiting for connections on port 35000
m35000| Mon Feb 8 18:39:20 connection accepted from 127.0.0.1:59805 #1
shell: started mongo program /home/aaron/src/mongo/mongod --port 35001 --dbpath /data/db/jstests_pair1test-left --pairwith 127.0.0.1:35002 --arbiter 127.0.0.1:35000 --oplogSize 1 --nohttpinterface --noprealloc --smallfiles --bind_ip 127.0.0.1
m35001| Mon Feb 8 18:39:21 Mongo DB : starting : pid = 19363 port = 35001 dbpath = /data/db/jstests_pair1test-left master = 0 slave = 0 64-bit
m35001| Mon Feb 8 18:39:21 db version v1.3.2-, pdfile version 4.5
m35001| Mon Feb 8 18:39:21 git version: b4fe153800f7cc65825607af5cd4e4216aff8808
m35001| Mon Feb 8 18:39:21 sys info: Linux aaron-ubuntu 2.6.31-14-generic #48-Ubuntu SMP Fri Oct 16 14:05:01 UTC 2009 x86_64 BOOST_LIB_VERSION=1_41
m35001| Mon Feb 8 18:39:21 waiting for connections on port 35001
m35001| Mon Feb 8 18:39:21 ******
m35001| Mon Feb 8 18:39:21 creating replication oplog of size: 1MB (use --oplogSize to change)
m35001| Mon Feb 8 18:39:21 ******
m35001| Mon Feb 8 18:39:21 allocating new datafile /data/db/jstests_pair1test-left/local.ns, filling with zeroes...
m35001| Mon Feb 8 18:39:21 done allocating datafile /data/db/jstests_pair1test-left/local.ns, size: 16MB, took 0.016 secs
m35001| Mon Feb 8 18:39:21 allocating new datafile /data/db/jstests_pair1test-left/local.0, filling with zeroes...
m35001| Mon Feb 8 18:39:21 done allocating datafile /data/db/jstests_pair1test-left/local.0, size: 16MB, took 0.035 secs
m35001| Mon Feb 8 18:39:21 connection accepted from 127.0.0.1:33067 #1
shell: started mongo program /home/aaron/src/mongo/mongod --port 35002 --dbpath /data/db/jstests_pair1test-right --pairwith 127.0.0.1:35001 --arbiter 127.0.0.1:35000 --oplogSize 1 --nohttpinterface --noprealloc --smallfiles --bind_ip 127.0.0.1
m35002| Mon Feb 8 18:39:21 Mongo DB : starting : pid = 19373 port = 35002 dbpath = /data/db/jstests_pair1test-right master = 0 slave = 0 64-bit
m35002| Mon Feb 8 18:39:21 db version v1.3.2-, pdfile version 4.5
m35002| Mon Feb 8 18:39:21 git version: b4fe153800f7cc65825607af5cd4e4216aff8808
m35002| Mon Feb 8 18:39:21 sys info: Linux aaron-ubuntu 2.6.31-14-generic #48-Ubuntu SMP Fri Oct 16 14:05:01 UTC 2009 x86_64 BOOST_LIB_VERSION=1_41
m35002| Mon Feb 8 18:39:21 waiting for connections on port 35002
m35002| Mon Feb 8 18:39:21 ******
m35002| Mon Feb 8 18:39:21 creating replication oplog of size: 1MB (use --oplogSize to change)
m35002| Mon Feb 8 18:39:21 ******
m35002| Mon Feb 8 18:39:21 allocating new datafile /data/db/jstests_pair1test-right/local.ns, filling with zeroes...
m35002| Mon Feb 8 18:39:21 done allocating datafile /data/db/jstests_pair1test-right/local.ns, size: 16MB, took 0.136 secs
m35002| Mon Feb 8 18:39:21 allocating new datafile /data/db/jstests_pair1test-right/local.0, filling with zeroes...
m35002| Mon Feb 8 18:39:21 done allocating datafile /data/db/jstests_pair1test-right/local.0, size: 16MB, took 0.064 secs
m35002| Mon Feb 8 18:39:21 connection accepted from 127.0.0.1:57853 #1
m35001| Mon Feb 8 18:39:22 building new index on

{ _id: 1 }

for local.sources...
m35001| Mon Feb 8 18:39:22 Buildindex local.sources idxNo:0 { name: "id", ns: "local.sources", key:

{ _id: 1 }

}
m35001| Mon Feb 8 18:39:22 done for 0 records 0.002secs
m35001| Mon Feb 8 18:39:22 repl: main@127.0.0.1:35002
m35002| Mon Feb 8 18:39:22 connection accepted from 127.0.0.1:57854 #2
m35001| Mon Feb 8 18:39:22 repl: applied 1 operations
m35001| Mon Feb 8 18:39:22 repl: end sync_pullOpLog syncedTo: Mon Feb 8 18:39:21 2010 4b70cad9:1
m35001| Mon Feb 8 18:39:22 building new index on

{ _id: 1 }

for local.pair.sync...
m35001| Mon Feb 8 18:39:22 Buildindex local.pair.sync idxNo:0 { name: "id", ns: "local.pair.sync", key:

{ _id: 1 }

}
m35001| Mon Feb 8 18:39:22 done for 0 records 0secs
m35001| Mon Feb 8 18:39:22 repl: sleep 1sec before next pass
m35002| Mon Feb 8 18:39:22 building new index on

{ _id: 1 }

for local.sources...
m35002| Mon Feb 8 18:39:22 Buildindex local.sources idxNo:0 { name: "id", ns: "local.sources", key:

{ _id: 1 }

}
m35002| Mon Feb 8 18:39:22 done for 0 records 0.002secs
m35002| Mon Feb 8 18:39:22 repl: main@127.0.0.1:35001
m35001| Mon Feb 8 18:39:22 connection accepted from 127.0.0.1:33073 #2
m35002| Mon Feb 8 18:39:22 repl: applied 1 operations
m35002| Mon Feb 8 18:39:22 repl: end sync_pullOpLog syncedTo: Mon Feb 8 18:39:21 2010 4b70cad9:1
m35002| Mon Feb 8 18:39:22 building new index on

{ _id: 1 }

for local.pair.sync...
m35002| Mon Feb 8 18:39:22 Buildindex local.pair.sync idxNo:0 { name: "id", ns: "local.pair.sync", key:

{ _id: 1 }

}
m35002| Mon Feb 8 18:39:22 done for 0 records 0secs
m35002| Mon Feb 8 18:39:22 repl: sleep 1sec before next pass
m35001| Mon Feb 8 18:39:23 repl: main@127.0.0.1:35002
m35001| Mon Feb 8 18:39:23 pair: setting master=0 was -1
m35001| Mon Feb 8 18:39:23 repl: sleep 2sec before next pass
m35001| Mon Feb 8 18:39:23 User Exception 10058:not master
m35001| Mon Feb 8 18:39:23 Caught Assertion in insert , continuing
m35001| Mon Feb 8 18:39:23 insert exception userassert:not master 0ms
m35001| Mon Feb 8 18:39:23 User Exception 10054:not master
m35001| Mon Feb 8 18:39:23 Caught Assertion in update , continuing
m35001| Mon Feb 8 18:39:23 update exception userassert:not master 0ms
m35001| Mon Feb 8 18:39:23 User Exception 10056:not master
m35001| Mon Feb 8 18:39:23 Caught Assertion in remove , continuing
m35001| Mon Feb 8 18:39:23 remove exception userassert:not master 0ms
m35002| Mon Feb 8 18:39:23 allocating new datafile /data/db/jstests_pair1test-right/jstests_pair1test.ns, filling with zeroes...
m35002| Mon Feb 8 18:39:23 done allocating datafile /data/db/jstests_pair1test-right/jstests_pair1test.ns, size: 16MB, took 0.025 secs
m35002| Mon Feb 8 18:39:23 allocating new datafile /data/db/jstests_pair1test-right/jstests_pair1test.0, filling with zeroes...
m35002| Mon Feb 8 18:39:23 done allocating datafile /data/db/jstests_pair1test-right/jstests_pair1test.0, size: 16MB, took 0.2 secs
m35002| Mon Feb 8 18:39:23 building new index on

{ _id: 1 }

for jstests_pair1test.z...
m35002| Mon Feb 8 18:39:23 Buildindex jstests_pair1test.z idxNo:0 { name: "id", ns: "jstests_pair1test.z", key:

{ _id: 1 }

}
m35002| Mon Feb 8 18:39:23 done for 0 records 0secs
m35002| Mon Feb 8 18:39:23 update jstests_pair1test.z query:

{ _id: ObjId(4b70cadb5f38ce32d4ee643a) }

307ms
m35002| Mon Feb 8 18:39:23 repl: main@127.0.0.1:35001
m35002| Mon Feb 8 18:39:23 building new index on

{ ns: 1, id: 1 }

for local.temp.replIds...
m35002| Mon Feb 8 18:39:23 Buildindex local.temp.replIds idxNo:0 { name: "setIdx", ns: "local.temp.replIds", key:

{ ns: 1, id: 1 }

, unique: true }
m35002| Mon Feb 8 18:39:23 done for 0 records 0secs
m35002| Mon Feb 8 18:39:23 building new index on

{ ns: 1, id: 1 }

for local.temp.replModIds...
m35002| Mon Feb 8 18:39:23 Buildindex local.temp.replModIds idxNo:0 { name: "setIdx", ns: "local.temp.replModIds", key:

{ ns: 1, id: 1 }

, unique: true }
m35002| Mon Feb 8 18:39:23 done for 0 records 0secs
m35002| Mon Feb 8 18:39:23 repl: sleep 2sec before next pass
m35001| Mon Feb 8 18:39:25 repl: main@127.0.0.1:35002
m35001| Mon Feb 8 18:39:25 resync: dropping database jstests_pair1test
m35001| Mon Feb 8 18:39:25 resync: cloning database jstests_pair1test
m35002| Mon Feb 8 18:39:25 connection accepted from 127.0.0.1:57856 #3
m35001| Mon Feb 8 18:39:25 allocating new datafile /data/db/jstests_pair1test-left/jstests_pair1test.ns, filling with zeroes...
m35001| Mon Feb 8 18:39:25 done allocating datafile /data/db/jstests_pair1test-left/jstests_pair1test.ns, size: 16MB, took 0.282 secs
m35001| Mon Feb 8 18:39:25 allocating new datafile /data/db/jstests_pair1test-left/jstests_pair1test.0, filling with zeroes...
m35002| Mon Feb 8 18:39:25 repl: main@127.0.0.1:35001
m35001| Mon Feb 8 18:39:25 done allocating datafile /data/db/jstests_pair1test-left/jstests_pair1test.0, size: 16MB, took 0.171 secs
m35001| Mon Feb 8 18:39:25 building new index on

{ _id: 1 }

for jstests_pair1test.z...
m35001| Mon Feb 8 18:39:25 Buildindex jstests_pair1test.z idxNo:0 { name: "id", ns: "jstests_pair1test.z", key:

{ _id: 1 }

}
m35001| Mon Feb 8 18:39:25 done for 0 records 0secs
m35002| Mon Feb 8 18:39:25 repl: sleep 2sec before next pass
m35001| Mon Feb 8 18:39:25 end connection 127.0.0.1:33073
m35001| Mon Feb 8 18:39:25 MessagingPort recv() errno:104 Connection reset by peer 127.0.0.1:35002
m35001| Mon Feb 8 18:39:25 jstests_pair1test. Assertion failure c.get() db/cloner.cpp 100
m35002| Mon Feb 8 18:39:25 got kill or ctrl c signal 15 (Terminated), will terminate after current cmd ends
m35002| Mon Feb 8 18:39:25 now exiting
m35002| Mon Feb 8 18:39:25 dbexit:
m35002| Mon Feb 8 18:39:25 shutdown: going to close listening sockets...
m35002| Mon Feb 8 18:39:25 going to close listening socket: 23
m35002| Mon Feb 8 18:39:25 shutdown: going to flush oplog...
m35002| Mon Feb 8 18:39:25 shutdown: going to close sockets...
m35002| Mon Feb 8 18:39:25 MessagingPort recv() errno:9 Bad file descriptor 127.0.0.1:57856
m35002| Mon Feb 8 18:39:25 end connection 127.0.0.1:57856
m35002| Mon Feb 8 18:39:25 shutdown: waiting for fs preallocator...
m35002| Mon Feb 8 18:39:25 shutdown: closing all files...
m35002| Mon Feb 8 18:39:25 closeAllFiles() finished
m35002| Mon Feb 8 18:39:25 shutdown: removing fs lock...
m35002| Mon Feb 8 18:39:25 dbexit: really exiting now
m35001| 0x65c86b 0x65c0a7 0x662078 0x6ea7d3 0x6ec2f1 0x6ee34f 0x6d51a4 0x6d6775 0x6d969e 0x6daf93 0x6dbaa6 0x6dc02d 0x6dc418 0x658d57 0x7fc1587b5130 0x7fc158ed9a04 0x7fc1574aa7bd
m35001| /home/aaron/src/mongo/mongod(_ZN5mongo15printStackTraceERSo+0x27) [0x65c86b]
m35001| /home/aaron/src/mongo/mongod(_ZN5mongo12sayDbContextEPKc+0x5e) [0x65c0a7]
m35001| /home/aaron/src/mongo/mongod(_ZN5mongo8assertedEPKcS1_j+0xf5) [0x662078]
m35001| /home/aaron/src/mongo/mongod(_ZN5mongo6Cloner4copyEPKcS2_bbbbNS_5QueryE+0x20d) [0x6ea7d3]
m35001| /home/aaron/src/mongo/mongod(_ZN5mongo6Cloner2goEPKcRSsRKSsbbbb+0x11a5) [0x6ec2f1]
m35001| /home/aaron/src/mongo/mongod(_ZN5mongo9cloneFromEPKcRSsRKSsbbbb+0x6e) [0x6ee34f]
m35001| /home/aaron/src/mongo/mongod(_ZN5mongo10ReplSource6resyncESs+0x168) [0x6d51a4]
m35001| /home/aaron/src/mongo/mongod(_ZN5mongo10ReplSource29sync_pullOpLog_applyOperationERNS_7BSONObjEPNS_6OpTimeE+0xb15) [0x6d6775]
m35001| /home/aaron/src/mongo/mongod(_ZN5mongo10ReplSource14sync_pullOpLogERi+0x1696) [0x6d969e]
m35001| /home/aaron/src/mongo/mongod(_ZN5mongo10ReplSource4syncERi+0x55d) [0x6daf93]
m35001| /home/aaron/src/mongo/mongod(_ZN5mongo9_replMainERSt6vectorIN5boost10shared_ptrINS_10ReplSourceEEESaIS4_EERi+0xf9) [0x6dbaa6]
m35001| /home/aaron/src/mongo/mongod(_ZN5mongo8replMainEv+0x119) [0x6dc02d]
m35001| /home/aaron/src/mongo/mongod(_ZN5mongo15replSlaveThreadEv+0x152) [0x6dc418]
m35001| /home/aaron/src/mongo/mongod(_ZN5boost6detail11thread_dataIPFvvEE3runEv+0x19) [0x658d57]
m35001| /usr/local/lib/libboost_thread.so.1.41.0(thread_proxy+0x60) [0x7fc1587b5130]
m35001| /lib/libpthread.so.0 [0x7fc158ed9a04]
m35001| /lib/libc.so.6(clone+0x6d) [0x7fc1574aa7bd]
m35001| Mon Feb 8 18:39:25 replMain AssertionException assertion db/cloner.cpp:100
m35001| Mon Feb 8 18:39:25 repl: sleep 60sec before next pass
shell: stopped mongo program on port 35002
assert: assert.soon failed: function () {
return rp.checkSteadyState(state, expectedMasterHost, twoMasterOk, leftValues, rightValues, debug);
}, msg:rp (left: 35001,/data/db/jstests_pair1test-left,127.0.0.1:35002,127.0.0.1:35000 (connected)(slave) right: 35002,/data/db/jstests_pair1test-right,127.0.0.1:35001,127.0.0.1:35000 ) failed to reach expected steady state (1,)
Mon Feb 8 18:39:56 JS Error: uncaught exception: assert.soon failed: function () {
return rp.checkSteadyState(state, expectedMasterHost, twoMasterOk, leftValues, rightValues, debug);
}, msg:rp (left: 35001,/data/db/jstests_pair1test-left,127.0.0.1:35002,127.0.0.1:35000 (connected)(slave) right: 35002,/data/db/jstests_pair1test-right,127.0.0.1:35001,127.0.0.1:35000 ) failed to reach expected steady state (1,)
failed to load: /home/aaron/src/mongo/jstests/repl/pair1.js
m35000| Mon Feb 8 18:39:56 got kill or ctrl c signal 15 (Terminated), will terminate after current cmd ends
m35000| Mon Feb 8 18:39:56 now exiting
m35000| Mon Feb 8 18:39:56 dbexit:
m35000| Mon Feb 8 18:39:56 shutdown: going to close listening sockets...
m35000| Mon Feb 8 18:39:56 going to close listening socket: 15
m35000| Mon Feb 8 18:39:56 shutdown: going to flush oplog...
m35000| Mon Feb 8 18:39:56 shutdown: going to close sockets...
m35000| Mon Feb 8 18:39:56 shutdown: waiting for fs preallocator...
m35000| Mon Feb 8 18:39:56 shutdown: closing all files...
m35000| Mon Feb 8 18:39:56 closeAllFiles() finished
m35000| Mon Feb 8 18:39:56 shutdown: removing fs lock...
m35000| Mon Feb 8 18:39:56 dbexit: really exiting now
m35001| Mon Feb 8 18:39:57 got kill or ctrl c signal 15 (Terminated), will terminate after current cmd ends
m35001| Mon Feb 8 18:39:57 now exiting
m35001| Mon Feb 8 18:39:57 dbexit:
m35001| Mon Feb 8 18:39:57 shutdown: going to close listening sockets...
m35001| Mon Feb 8 18:39:57 going to close listening socket: 20
m35001| Mon Feb 8 18:39:57 shutdown: going to flush oplog...
m35001| Mon Feb 8 18:39:57 shutdown: going to close sockets...
m35001| Mon Feb 8 18:39:57 shutdown: waiting for fs preallocator...
m35001| Mon Feb 8 18:39:57 shutdown: closing all files...
m35001| Mon Feb 8 18:39:57 closeAllFiles() finished
m35001| Mon Feb 8 18:39:57 shutdown: removing fs lock...
m35001| Mon Feb 8 18:39:57 dbexit: really exiting now
scons: *** [smokeRepl] Error 253
scons: building terminated because of errors.

Compilation exited abnormally with code 2 at Mon Feb 8 18:39:59

Comment by Aaron Staple [ 08/Feb/10 ]

<http://buildbot.mongodb.org/builders/Linux%2064-bit%20debug/builds/97/steps/test_2/logs/stdio>

Comment by auto [ 03/Feb/10 ]

Author:

{'login': 'astaple', 'name': 'Aaron', 'email': 'aaron@10gen.com'}

Message: SERVER-507 comments
http://github.com/mongodb/mongo/commit/12f5a46e4d5bf4bac42ba9fd42c44f85d9a9c25e

Comment by Aaron Staple [ 11/Jan/10 ]

sounds like we'll be replacing pairing at some point

Comment by Aaron Staple [ 05/Jan/10 ]

<http://buildbot.mongodb.org/builders/OS%20X%2010.5%2032-bit/builds/495/steps/test_2/logs/stdio>

Comment by Aaron Staple [ 28/Dec/09 ]

<http://buildbot.mongodb.org/builders/OS%20X%2010.5%2032-bit/builds/440/steps/test_2/logs/stdio>

Generated at Thu Feb 08 02:54:20 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.