[SERVER-12140] ReplSetTest upgradeSet() needs to reauth after reconnecting Created: 17/Dec/13  Updated: 11/Jul/16  Resolved: 30/Dec/13

Status: Closed
Project: Core Server
Component/s: Replication, Security, Testing Infrastructure
Affects Version/s: 2.4.0, 2.4.8, 2.5.4
Fix Version/s: 2.5.5

Type: Bug Priority: Major - P3
Reporter: Valeri Karpov Assignee: Valeri Karpov
Resolution: Done Votes: 0
Labels: 26qa, upgrading
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

OSX, upgrading from 2.4.0 / 2.4.8 to 2.5.4


Attachments: Text File output.txt     File repl_upgrade_normal.js     File x509_repl_upgrade.js    
Issue Links:
Related
Operating System: ALL
Participants:

 Description   

I'm getting some authorization errors when I try to upgrade a replica set from 2.4.0 to 2.5.4 with a keyfile:

2013-12-17T11:40:00.501-0500 shell: started program /Users/vkarpov/qa/QA/QA-424/mongod-26 --oplogSize 40 --keyFile jstests/libs/key1 --port 31001 --noprealloc --smallfiles --rest --replSet rs1 --dbpath /data/db/rs1-1 --clusterAuthMode keyFile
2013-12-17T11:40:00.502-0500 warning: Failed to connect to 127.0.0.1, reason: errno:61 Connection refused
m31001| 2013-12-17T11:40:00.525-0500 ** WARNING: --rest is specified without --httpinterface,
m31001| 2013-12-17T11:40:00.525-0500 ** enabling http interface
m31001| note: noprealloc may hurt performance in many applications
m31001| 2013-12-17T11:40:00.528-0500 [initandlisten] MongoDB starting : pid=2231 port=31001 dbpath=/data/db/rs1-1 64-bit host=specter.local
m31001| 2013-12-17T11:40:00.528-0500 [initandlisten]
m31001| 2013-12-17T11:40:00.528-0500 [initandlisten] ** NOTE: This is a development version (2.5.4) of MongoDB.
m31001| 2013-12-17T11:40:00.528-0500 [initandlisten] ** Not recommended for production.
m31001| 2013-12-17T11:40:00.528-0500 [initandlisten]
m31001| 2013-12-17T11:40:00.528-0500 [initandlisten] ** WARNING: soft rlimits too low. Number of files is 256, should be at least 1000
m31001| 2013-12-17T11:40:00.528-0500 [initandlisten]
m31001| 2013-12-17T11:40:00.528-0500 [initandlisten] db version v2.5.4
m31001| 2013-12-17T11:40:00.528-0500 [initandlisten] git version: ffd52e5f46cf2ba74ba931c78da62d4a7f480d8e
m31001| 2013-12-17T11:40:00.528-0500 [initandlisten] build info: Darwin bs-osx-108-x86-64-2.10gen.cc 12.3.0 Darwin Kernel Version 12.3.0: Sun Jan 6 22:37:10 PST 2013; root:xnu-2050.22.13~1/RELEASE_X86_64 x86_64 BOOST_LIB_VERSION=1_49
m31001| 2013-12-17T11:40:00.528-0500 [initandlisten] allocator: system
m31001| 2013-12-17T11:40:00.528-0500 [initandlisten] options:

{ clusterAuthMode: "keyFile", dbpath: "/data/db/rs1-1", keyFile: "jstests/libs/key1", noprealloc: true, oplogSize: 40, port: 31001, replSet: "rs1", rest: true, smallfiles: true }

...
ReplSetTest waitForIndicator from node connection to specter.local:31001
m31000| Tue Dec 17 11:40:00.706 [conn1] command denied:

{ replSetGetStatus: 1.0 }

ReplSetTest waitForIndicator Initial status ( timeout : 30000 ) :

{ "ok" : 0, "errmsg" : "unauthorized" }

m31000| Tue Dec 17 11:40:00.909 [conn1] command denied:

{ replSetGetStatus: 1.0 }

m31000| Tue Dec 17 11:40:01.110 [conn1] command denied:

{ replSetGetStatus: 1.0 }

m31000| Tue Dec 17 11:40:01.313 [conn1] command denied:

{ replSetGetStatus: 1.0 }

m31000| Tue Dec 17 11:40:01.515 [conn1] command denied:

{ replSetGetStatus: 1.0 }

m31000| Tue Dec 17 11:40:01.718 [conn1] command denied:

{ replSetGetStatus: 1.0 }

These errors start when I restart one node with the 2.5.4 binary. The 2.5.4 command line flags look like what I would expect to work, but yet the upgrade can't continue. I've attached the scripts that I'm using for this.



 Comments   
Comment by Githook User [ 20/Dec/13 ]

Author:

{u'username': u'vkarpov15', u'name': u'Valeri Karpov', u'email': u'valeri.karpov@10gen.com'}

Message: SERVER-12140 add some auth-related options to jstests multi_rs library

Signed-off-by: Matt Kangas <matt.kangas@mongodb.com>
Branch: master
https://github.com/mongodb/mongo/commit/3e6d6e56c0a8ff1db6c1518fd404c0b37d9db0dd

Comment by Spencer Brody (Inactive) [ 17/Dec/13 ]

Problem seems to be that during the upgrade of the replica set the node is restarted but then never re-authenticated to. The fix is to make the upgradeSet function take authentication options and re-authenticate any connections it uses after the nodes get restarted.

Comment by Valeri Karpov [ 17/Dec/13 ]

Anything else I can provide y'all to help debug this? This is blocking me from continuing on replica set UDR

Comment by Valeri Karpov [ 17/Dec/13 ]

Here's where things start going wrong:

m31002| Tue Dec 17 11:52:52.947 [conn4] authenticate db: local

{ authenticate: 1, nonce: "907010cb2735a2e0", user: "__system", key: "3e52944ec90b9e2004fbc1132dce32d4" }

m31000| Tue Dec 17 11:52:52.948 [rsHealthPoll] replSet member specter.local:31002 is now in state SECONDARY
2013-12-17T11:52:53.004-0500 shell: stopped mongo program on port 31001
ReplSetTest n is : connection to specter.local:31001
ReplSetTest n: 1 ports: [ 31000, 31001, 31002 ] 31001 number
{
"useHostName" : true,
"oplogSize" : 40,
"keyFile" : "jstests/libs/key1",
"port" : 31001,
"noprealloc" : "",
"smallfiles" : "",
"rest" : "",
"replSet" : "rs1",
"dbpath" : "$set-$node",
"binVersion" : "26",
"clusterAuthMode" : "keyFile",
"restart" : true,
"pathOpts" :

{ "node" : 1, "set" : "rs1" }

}
ReplSetTest (Re)Starting....
2013-12-17T11:52:53.007-0500 shell: started program /Users/vkarpov/qa/QA/QA-424/mongod-26 --oplogSize 40 --keyFile jstests/libs/key1 --port 31001 --noprealloc --smallfiles --rest --replSet rs1 --dbpath /data/db/rs1-1 --clusterAuthMode keyFile
2013-12-17T11:52:53.007-0500 warning: Failed to connect to 127.0.0.1, reason: errno:61 Connection refused
m31001| 2013-12-17T11:52:53.037-0500 ** WARNING: --rest is specified without --httpinterface,
m31001| 2013-12-17T11:52:53.038-0500 ** enabling http interface
m31001| note: noprealloc may hurt performance in many applications
m31001| 2013-12-17T11:52:53.042-0500 [initandlisten] MongoDB starting : pid=2243 port=31001 dbpath=/data/db/rs1-1 64-bit host=specter.local
m31001| 2013-12-17T11:52:53.042-0500 [initandlisten]
m31001| 2013-12-17T11:52:53.042-0500 [initandlisten] ** NOTE: This is a development version (2.5.4) of MongoDB.
m31001| 2013-12-17T11:52:53.042-0500 [initandlisten] ** Not recommended for production.
m31001| 2013-12-17T11:52:53.042-0500 [initandlisten]
m31001| 2013-12-17T11:52:53.042-0500 [initandlisten] ** WARNING: soft rlimits too low. Number of files is 256, should be at least 1000
m31001| 2013-12-17T11:52:53.042-0500 [initandlisten]
m31001| 2013-12-17T11:52:53.042-0500 [initandlisten] db version v2.5.4
m31001| 2013-12-17T11:52:53.042-0500 [initandlisten] git version: ffd52e5f46cf2ba74ba931c78da62d4a7f480d8e
m31001| 2013-12-17T11:52:53.042-0500 [initandlisten] build info: Darwin bs-osx-108-x86-64-2.10gen.cc 12.3.0 Darwin Kernel Version 12.3.0: Sun Jan 6 22:37:10 PST 2013; root:xnu-2050.22.13~1/RELEASE_X86_64 x86_64 BOOST_LIB_VERSION=1_49
m31001| 2013-12-17T11:52:53.042-0500 [initandlisten] allocator: system
m31001| 2013-12-17T11:52:53.042-0500 [initandlisten] options:

{ clusterAuthMode: "keyFile", dbpath: "/data/db/rs1-1", keyFile: "jstests/libs/key1", noprealloc: true, oplogSize: 40, port: 31001, replSet: "rs1", rest: true, smallfiles: true }

m31001| 2013-12-17T11:52:53.049-0500 [initandlisten] journal dir=/data/db/rs1-1/journal
m31001| 2013-12-17T11:52:53.049-0500 [initandlisten] recover : no journal files present, no recovery needed
m31001| 2013-12-17T11:52:53.108-0500 [initandlisten] waiting for connections on port 31001
m31001| 2013-12-17T11:52:53.109-0500 [websvr] admin web console waiting for connections on port 32001
m31001| 2013-12-17T11:52:53.110-0500 [rsStart] replSet I am specter.local:31001
m31001| 2013-12-17T11:52:53.120-0500 [rsStart] replSet STARTUP2
m31001| 2013-12-17T11:52:53.120-0500 [rsSync] replSet SECONDARY
m31001| 2013-12-17T11:52:53.208-0500 [initandlisten] connection accepted from 127.0.0.1:65287 #1 (1 connection now open)
[
connection to specter.local:31000,
connection to specter.local:31001,
connection to specter.local:31002
]
ReplSetTest waitForIndicator state on connection to specter.local:31001
[ 1, 2, 7 ]
ReplSetTest waitForIndicator from node connection to specter.local:31001
m31000| Tue Dec 17 11:52:53.212 [conn1] command denied:

{ replSetGetStatus: 1.0 }

ReplSetTest waitForIndicator Initial status ( timeout : 30000 ) :

{ "ok" : 0, "errmsg" : "unauthorized" }

m31000| Tue Dec 17 11:52:53.414 [conn1] command denied:

{ replSetGetStatus: 1.0 }

m31002| Tue Dec 17 11:52:53.503 [rsHealthPoll] DBClientCursor::init call() failed
m31002| Tue Dec 17 11:52:53.503 [rsHealthPoll] replset info specter.local:31001 heartbeat failed, retrying
m31001| 2013-12-17T11:52:53.503-0500 [initandlisten] connection accepted from 10.4.101.171:65288 #2 (2 connections now open)
m31001| 2013-12-17T11:52:53.504-0500 [conn2] authenticate db: local

{ authenticate: 1, nonce: "xxx", user: "__system", key: "xxx" }

m31002| Tue Dec 17 11:52:53.504 [rsHealthPoll] replset info specter.local:31001 thinks that we are down
m31000| Tue Dec 17 11:52:53.617 [conn1] command denied:

{ replSetGetStatus: 1.0 }

m31000| Tue Dec 17 11:52:53.820 [conn1] command denied:

{ replSetGetStatus: 1.0 }

m31000| Tue Dec 17 11:52:54.023 [conn1] command denied:

{ replSetGetStatus: 1.0 }

m31000| Tue Dec 17 11:52:54.226 [conn1] command denied:

{ replSetGetStatus: 1.0 }

m31000| Tue Dec 17 11:52:54.429 [conn1] command denied:

{ replSetGetStatus: 1.0 }

m31000| Tue Dec 17 11:52:54.632 [conn1] command denied:

{ replSetGetStatus: 1.0 }

m31000| Tue Dec 17 11:52:54.835 [conn1] command denied:

{ replSetGetStatus: 1.0 }

m31001| 2013-12-17T11:52:54.949-0500 [initandlisten] connection accepted from 10.4.101.171:65289 #3 (3 connections now open)
m31001| 2013-12-17T11:52:54.949-0500 [conn3] authenticate db: local

{ authenticate: 1, nonce: "xxx", user: "__system", key: "xxx" }

m31000| Tue Dec 17 11:52:54.950 [rsHealthPoll] replset info specter.local:31001 thinks that we are down
m31000| Tue Dec 17 11:52:54.950 [rsHealthPoll] replSet member specter.local:31001 is up
m31000| Tue Dec 17 11:52:54.950 [rsHealthPoll] replSet member specter.local:31001 is now in state SECONDARY
m31000| Tue Dec 17 11:52:55.038 [conn1] command denied:

{ replSetGetStatus: 1.0 }

m31000| Tue Dec 17 11:52:55.112 [initandlisten] connection accepted from 10.4.101.171:65291 #14 (7 connections now open)
m31002| Tue Dec 17 11:52:55.112 [initandlisten] connection accepted from 10.4.101.171:65292 #5 (3 connections now open)
m31000| Tue Dec 17 11:52:55.113 [conn14] authenticate db: local

{ authenticate: 1, nonce: "89cb7cf0e00b349b", user: "__system", key: "e86baab1db55669a3c07436c14570e66" }

m31002| Tue Dec 17 11:52:55.113 [conn5] authenticate db: local

{ authenticate: 1, nonce: "8b17459d4e6d6f0d", user: "__system", key: "c4bb188860a17ede1223f46a58951eda" }

m31001| 2013-12-17T11:52:55.113-0500 [rsHealthPoll] replSet member specter.local:31000 is up
m31001| 2013-12-17T11:52:55.113-0500 [rsHealthPoll] replSet member specter.local:31000 is now in state PRIMARY
m31001| 2013-12-17T11:52:55.113-0500 [rsHealthPoll] replSet member specter.local:31002 is up
m31001| 2013-12-17T11:52:55.113-0500 [rsHealthPoll] replSet member specter.local:31002 is now in state SECONDARY
m31000| Tue Dec 17 11:52:55.241 [conn1] command denied:

{ replSetGetStatus: 1.0 }

m31000| Tue Dec 17 11:52:55.443 [conn1] command denied:

{ replSetGetStatus: 1.0 }

m31000| Tue Dec 17 11:52:55.646 [conn1] command denied:

{ replSetGetStatus: 1.0 }

m31000| Tue Dec 17 11:52:55.848 [conn1] command denied:

{ replSetGetStatus: 1.0 }

m31000| Tue Dec 17 11:52:56.051 [conn1] command denied:

{ replSetGetStatus: 1.0 }

m31000| Tue Dec 17 11:52:56.253 [conn1] command denied:

{ replSetGetStatus: 1.0 }

m31000| Tue Dec 17 11:52:56.456 [conn1] command denied:

{ replSetGetStatus: 1.0 }

m31000| Tue Dec 17 11:52:56.660 [conn1] command denied:

{ replSetGetStatus: 1.0 }

m31000| Tue Dec 17 11:52:56.853 [conn10] end connection 10.4.101.171:65280 (6 connections now open)
m31000| Tue Dec 17 11:52:56.862 [conn1] command denied:

{ replSetGetStatus: 1.0 }

m31000| Tue Dec 17 11:52:57.020 [conn11] end connection 10.4.101.171:65281 (5 connections now open)
m31000| Tue Dec 17 11:52:57.064 [conn1] command denied:

{ replSetGetStatus: 1.0 }

m31000| Tue Dec 17 11:52:57.267 [conn1] command denied:

{ replSetGetStatus: 1.0 }

m31000| Tue Dec 17 11:52:57.469 [conn1] command denied:

{ replSetGetStatus: 1.0 }

m31000| Tue Dec 17 11:52:57.672 [conn1] command denied:

{ replSetGetStatus: 1.0 }

m31000| Tue Dec 17 11:52:57.875 [conn1] command denied:

{ replSetGetStatus: 1.0 }

m31000| Tue Dec 17 11:52:58.078 [conn1] command denied:

{ replSetGetStatus: 1.0 }

m31000| Tue Dec 17 11:52:58.281 [conn1] command denied:

{ replSetGetStatus: 1.0 } { "ok" : 0, "errmsg" : "unauthorized" }

m31000| Tue Dec 17 11:52:58.485 [conn1] command denied:

{ replSetGetStatus: 1.0 }

m31000| Tue Dec 17 11:52:58.687 [conn1] command denied:

{ replSetGetStatus: 1.0 }

m31000| Tue Dec 17 11:52:58.890 [conn1] command denied:

{ replSetGetStatus: 1.0 }

m31000| Tue Dec 17 11:52:59.093 [conn1] command denied:

{ replSetGetStatus: 1.0 }

I've attached the full output too

Comment by Spencer Brody (Inactive) [ 17/Dec/13 ]

You cut off the server log in the ticket description - can you post the full log?

Comment by Valeri Karpov [ 17/Dec/13 ]

I'm using the 2.5.4 shell to run this test, so I shouldn't be using the 2.4 ReplSetTest

Comment by Eric Milkie [ 17/Dec/13 ]

I'm not certain, but is the problem that 2.4 ReplSetTest does not support nodes running 2.5+? You may need to modify ReplSetTest to get it to work for this type of testing.

Generated at Thu Feb 08 03:27:42 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.