[SERVER-10420] replmonitor_bad_seed.js fails with auth because it tries to read all user data while a shard is down
Created: 03/Aug/13  Updated: 11/Jul/16  Resolved: 18/Nov/13

Status: Closed
Project: Core Server
Component/s: Security, Testing Infrastructure
Affects Version/s: None
Fix Version/s: 2.5.5

Type: Bug
Priority: Major - P3
Reporter: Matt Kangas
Assignee: Spencer Brody (Inactive)
Resolution: Done
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

OS X 10.5 64-bit DUR OFF
Solaris-SmartOS 64-bit
Linux 64-bit
(probably others)


Operating System: ALL
Participants:

 Description   

This issue just appeared on master for three separate builders:

Linux 64-bit Build #5684 Aug 2 rev f736febe5a0
http://buildlogs.mongodb.org/Linux%2064-bit/builds/5684/test/sharding/replmonitor_bad_seed.js

Solaris-SmartOS 64-bit Build #1192 Aug 2 rev 07faf6eef1
http://buildlogs.mongodb.org/Solaris-SmartOS%2064-bit/builds/1192/test/sharding/replmonitor_bad_seed.js

OS X 10.5 64-bit DUR OFF Build #2527 Aug 2 rev f736febe5a0
http://buildlogs.mongodb.org/OS%20X%2010.5%2064-bit%20DUR%20OFF/builds/2527/test/sharding/replmonitor_bad_seed.js

ReplSetTest stopSet *** Shut down repl set - test worked ****
2013-08-02 20:45:24 EDT	
 m30999| Sat Aug  3 00:45:23.360 [mongosMain] dbexit: received signal 15 rc:0 received signal 15
 m29000| Sat Aug  3 00:45:23.362 [conn3] end connection 10.29.160.141:39516 (4 connections now open)
 m29000| Sat Aug  3 00:45:23.362 [conn4] end connection 10.29.160.141:39517 (3 connections now open)
 m29000| Sat Aug  3 00:45:23.362 [conn5] end connection 10.29.160.141:39518 (2 connections now open)
Sat Aug  3 00:45:24.361 shell: stopped mongo program on port 30999
Sat Aug  3 00:45:24.367 shell: started program /data/buildslaves/Linux_64bit/mongo/mongos --port 30999 --configdb gcov1:29000 --chunkSize 50 --setParameter enableTestCommands=1 --setParameter enableTestCommands=1
2013-08-02 20:45:26 EDT	
 m30999| Sat Aug  3 00:45:24.385 warning: running with 1 config server should be done only for testing purposes and is not recommended for production
 m30999| Sat Aug  3 00:45:24.387 [mongosMain] MongoS version 2.5.2-pre- starting: pid=4588 port=30999 64-bit host=gcov1 (--help for usage)
 m30999| Sat Aug  3 00:45:24.387 [mongosMain] git version: f736febe5a0d6d5a197b012eebad0243161830b6
 m30999| Sat Aug  3 00:45:24.387 [mongosMain] build info: Linux gcov1 3.2.20-1.29.6.amzn1.x86_64 #1 SMP Tue Jun 12 01:19:28 UTC 2012 x86_64 BOOST_LIB_VERSION=1_49
 m30999| Sat Aug  3 00:45:24.387 [mongosMain] options: { chunkSize: 50, configdb: "gcov1:29000", port: 30999, setParameter: [ "enableTestCommands=1", "enableTestCommands=1" ] }
 m29000| Sat Aug  3 00:45:24.388 [initandlisten] connection accepted from 10.29.160.141:39545 #7 (3 connections now open)
 m29000| Sat Aug  3 00:45:24.404 [initandlisten] connection accepted from 10.29.160.141:39546 #8 (4 connections now open)
 m29000| Sat Aug  3 00:45:24.409 [initandlisten] connection accepted from 10.29.160.141:39548 #9 (5 connections now open)
 m30999| Sat Aug  3 00:45:24.410 [mongosMain] ChunkManager: time to load chunks for test.user: 0ms sequenceNumber: 2 version: 1|0||51fc52a0c9a1d525c77b20bb based on: (empty)
 m30999| Sat Aug  3 00:45:24.410 [mongosMain] starting new replica set monitor for replica set test-rs0 with seed of gcov1:31100,gcov1:31101,gcov1:31102
 m30999| Sat Aug  3 00:45:24.410 [mongosMain] error connecting to seed gcov1:31100, err: couldn't connect to server gcov1:31100
 m30999| Sat Aug  3 00:45:24.410 [mongosMain] error connecting to seed gcov1:31101, err: couldn't connect to server gcov1:31101
 m30999| Sat Aug  3 00:45:24.411 [mongosMain] error connecting to seed gcov1:31102, err: couldn't connect to server gcov1:31102
 m30999| Sat Aug  3 00:45:26.411 [mongosMain] warning: No primary detected for set test-rs0
 m30999| Sat Aug  3 00:45:26.411 [mongosMain] All nodes for set test-rs0 are down. This has happened for 1 checks in a row. Polling will stop after 29 more failed checks
 m30999| Sat Aug  3 00:45:26.411 [mongosMain] replica set monitor for replica set test-rs0 started, address is test-rs0/
 m30999| Sat Aug  3 00:45:26.411 [ReplicaSetMonitorWatcher] starting
 m30999| Sat Aug  3 00:45:26.411 [mongosMain] Initializing user data failed: Unknown error code 11002 socket exception [CONNECT_ERROR] server [test-rs0/gcov1:31100,gcov1:31101,gcov1:31102] mongos connectionpool error: connect failed to replica set test-rs0/gcov1:31100,gcov1:31101,gcov1:31102
 m29000| Sat Aug  3 00:45:26.412 [conn8] end connection 10.29.160.141:39546 (4 connections now open)
 m29000| Sat Aug  3 00:45:26.412 [conn7] end connection 10.29.160.141:39545 (4 connections now open)
 m29000| Sat Aug  3 00:45:26.413 [conn9] end connection 10.29.160.141:39548 (2 connections now open)
Could not start mongo program at 30999, process ended
Sat Aug  3 00:45:26.575 TypeError: Cannot call method 'getDB' of null at /data/buildslaves/Linux_64bit/mongo/jstests/sharding/replmonitor_bad_seed.js:35
failed to load: /data/buildslaves/Linux_64bit/mongo/jstests/sharding/replmonitor_bad_seed.js

I can easily reproduce this failure on Linux 64-bit debug. Kicking off a git-bisect run now.



 Comments   
Comment by Githook User [ 18/Nov/13 ]

Author: Spencer T Brody (stbrody) <spencer@10gen.com>

Message: SERVER-10420 Re-enable sharding/replmonitor_bad_seed.js test in auth passthrough mode
Branch: master
https://github.com/mongodb/mongo/commit/cc9756ce74fb334413ca27bd1815313144fd4651

Comment by auto [ 07/Aug/13 ]

Author: Spencer T Brody (stbrody) <spencer@10gen.com>

Message: SERVER-10420 Skip jstests/sharding/replmonitor_bad_seed.js until we no longer initilize all user data at process startup
Branch: master
https://github.com/mongodb/mongo/commit/3e50406c655679a5a3ed52cb64b5750c7518fe6f

Comment by auto [ 07/Aug/13 ]

Author: Spencer T Brody (stbrody) <spencer@10gen.com>

Message: SERVER-9518 SERVER-10420 Only initialize all user data at process startup if we're running in auth mode
Branch: master
https://github.com/mongodb/mongo/commit/b304acfde047511aa5ab148f0dc3706651afbe93

Comment by Spencer Brody (Inactive) [ 07/Aug/13 ]

The problem is that right now we are trying to build a complete representation of all the users in the system at process startup. This test, however, brings up a mongos while a shard is completely down. I think the right way to handle this is to just skip this test when run in auth passthrough for now. Once we've fully moved to the V2 user data format, we won't be reading all the user data at process startup anymore, and this test can then be re-enabled in auth mode.
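
A minimal sketch of the startup behavior described in this comment, written as a plain JavaScript model rather than MongoDB source code. Every name in it (makeShard, readUsers, startEager, startLazy, usersFor) is hypothetical. startEager models reading all user data at process startup, gated on auth mode as in the b304acf commit message. startLazy is only an assumption about what "not reading all the user data at process startup" could look like.

```javascript
// Hypothetical model of a shard; not a MongoDB API.
function makeShard(name, up) {
    return {
        name: name,
        up: up,
        // Stand-in for reading user documents from the shard.
        readUsers: function () {
            if (!this.up) {
                throw new Error("connect failed to replica set " + this.name);
            }
            return ["user@" + this.name];
        }
    };
}

// Eager startup: in auth mode, read user data from every shard first.
// Returns null if any shard is unreachable, like the failed mongos start in the log.
function startEager(shards, authEnabled) {
    if (authEnabled) {
        try {
            shards.forEach(function (s) { s.readUsers(); });
        } catch (e) {
            return null;
        }
    }
    return { started: true };
}

// Lazy startup (assumed design): read a shard's user data only on first request.
function startLazy(shards) {
    var cache = {};
    return {
        started: true,
        usersFor: function (name) {
            if (!(name in cache)) {
                var shard = shards.filter(function (s) { return s.name === name; })[0];
                cache[name] = shard.readUsers(); // throws only now if the shard is down
            }
            return cache[name];
        }
    };
}

var shards = [makeShard("test-rs0", false)];
console.log(startEager(shards, true));            // null
console.log(startEager(shards, false) !== null);  // true
console.log(startLazy(shards).started);           // true
```

In this model a caller that does not check for null after an eager start would fail on its next method call, which is the shape of the TypeError at replmonitor_bad_seed.js:35 in the log above.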

Comment by Tad Marshall [ 04/Aug/13 ]

I reverted 5e9f82f54988c464e6925e48182b909b1b3fe115 in https://github.com/mongodb/mongo/commit/7f239865b136a6aef682cb114fc871b0c81a70b8 to get replmonitor_bad_seed.js to pass.

Comment by Matt Kangas [ 03/Aug/13 ]

git-bisect says:

5e9f82f54988c464e6925e48182b909b1b3fe115 is the first bad commit
commit 5e9f82f54988c464e6925e48182b909b1b3fe115
Author: Spencer T Brody <spencer@10gen.com>
Date:   Wed Jul 31 14:55:21 2013 -0400
 
    SERVER-9518 Initialize user cache on process startup
