[SERVER-23522] Replica set recovery doesn't happen immediately at boot with 3.2 as it did with <3.2 Created: 04/Apr/16  Updated: 07/Jun/16  Resolved: 07/Jun/16

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: 3.2.4
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Nathan Neulinger Assignee: Kelsey Schubert
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File log-c-mosaic-arb.txt.gz     File log-c-mosaic-db1.txt.gz     File log-c-mosaic-db2.txt.gz    
Issue Links:
Duplicate
duplicates SERVER-24058 Connection pool asio doesn't honor se... Closed
Related
Operating System: ALL
Steps To Reproduce:

Set up a replica set with 2 nodes and an arbiter. Kill the primary. Start it back up with an empty data directory.
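A sketch of the reproduction as a shell session, assuming three mongod 3.2.x processes on one host; the ports, dbpaths, and set name are illustrative and not taken from the ticket (the priorities mirror the config shown in the attached log):

```shell
# Start two data-bearing members and an arbiter (paths/ports are hypothetical).
mongod --replSet rs0 --port 27017 --dbpath /data/rs0-0 --fork --logpath /data/rs0-0.log
mongod --replSet rs0 --port 27018 --dbpath /data/rs0-1 --fork --logpath /data/rs0-1.log
mongod --replSet rs0 --port 27019 --dbpath /data/rs0-2 --fork --logpath /data/rs0-2.log

# Initiate: two data members plus an arbiter, high priority on the first node.
mongo --port 27017 --eval 'rs.initiate({
  _id: "rs0",
  members: [
    { _id: 0, host: "localhost:27017", priority: 100 },
    { _id: 1, host: "localhost:27018", priority: 50 },
    { _id: 2, host: "localhost:27019", arbiterOnly: true }
  ]
})'

# Kill the primary, wipe its data directory, and restart it.
# With an empty dbpath it should begin an initial sync from the
# surviving data member; the ticket reports a ~3 minute delay first.
kill -9 "$(pgrep -f 'dbpath /data/rs0-0')"
rm -rf /data/rs0-0/*
mongod --replSet rs0 --port 27017 --dbpath /data/rs0-0 --fork --logpath /data/rs0-0.log
```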

Participants:

 Description   

Had to recover an instance today that was previously the primary in a replica set (2 + arbiter). I followed the same procedure I've used in the past with lightly loaded instances: started it with an empty data directory.

In the past, this would immediately start recovering from the standby instance. With this current 3.2.4 deployment, it sat there for a bit over 3 minutes before it started recovery/rebuild process.

Is that expected with 3.2, or is there some new tuning parameter that controls it?



 Comments   
Comment by Kelsey Schubert [ 07/Jun/16 ]

Hi nneul@sevogle.com,

Thank you for providing the additional information. The network issue observed in the logs indicates that this behavior is a result of SERVER-24058, which has been fixed in MongoDB 3.2.7 (released today). Please upgrade to MongoDB 3.2.7 and report back if you encounter this issue again.

Thanks again,
Thomas

Comment by Nathan Neulinger [ 06/Apr/16 ]

Logs attached. Prior to 22:45 I tried several times to bring the node back online, including re-applying (incorrectly, as you'll see) the replica set config to 'kick it'. The 22:4x attempt was the last one; I let it sit and it eventually came online.

I'll see if I can reproduce this symptom.

Comment by Nathan Neulinger [ 06/Apr/16 ]

While gathering logs from the other nodes, I found some log entries from before the event suggesting there may have been an issue with cluster state already.

I'll attach the logs, but this may be a red herring.

Comment by Daniel Pasette (Inactive) [ 06/Apr/16 ]

That is not expected behavior. If you have the full log files from the other members of the replica set, would you be able to compress and attach them to this ticket?
Thanks

Comment by Nathan Neulinger [ 04/Apr/16 ]

Apr  4 22:43:42 c-mosaic-db1 run-mongod: 2016-04-04T22:43:42.845+0000 I FTDC     [initandlisten] Initializing full-time diagnostic data capture with directory '/local/mongo/data/diagnostic.data'
Apr  4 22:43:42 c-mosaic-db1 run-mongod: 2016-04-04T22:43:42.845+0000 I NETWORK  [HostnameCanonicalizationWorker] Starting hostname canonicalization worker
Apr  4 22:43:45 c-mosaic-db1 run-mongod: 2016-04-04T22:43:45.015+0000 I NETWORK  [initandlisten] waiting for connections on port 27017
Apr  4 22:43:51 c-mosaic-db1 run-mongod: 2016-04-04T22:43:51.123+0000 I QUERY    [conn1] assertion 13436 not master or secondary; cannot currently read from this replSet member ns:config.settings query:{}
Apr  4 22:44:01 c-mosaic-db1 rsyslogd: -- MARK --
Apr  4 22:44:46 c-mosaic-db1 run-mongod: 2016-04-04T22:44:46.171+0000 I QUERY    [conn4] assertion 13436 not master or secondary; cannot currently read from this replSet member ns:config.settings query:{}
Apr  4 22:45:01 c-mosaic-db1 rsyslogd: -- MARK --
Apr  4 22:45:41 c-mosaic-db1 run-mongod: 2016-04-04T22:45:41.121+0000 I QUERY    [conn4] assertion 13436 not master or secondary; cannot currently read from this replSet member ns:config.settings query:{}
Apr  4 22:46:01 c-mosaic-db1 rsyslogd: -- MARK --
Apr  4 22:46:36 c-mosaic-db1 run-mongod: 2016-04-04T22:46:36.115+0000 I QUERY    [conn4] assertion 13436 not master or secondary; cannot currently read from this replSet member ns:config.settings query:{}
Apr  4 22:46:44 c-mosaic-db1 run-mongod: 2016-04-04T22:46:44.453+0000 I ASIO     [NetworkInterfaceASIO-Replication-0] Successfully connected to c-mosaic-db2-int.<DOMAIN>:27017
Apr  4 22:46:44 c-mosaic-db1 run-mongod: 2016-04-04T22:46:44.471+0000 I REPL     [replExecDBWorker-2] Starting replication applier threads
Apr  4 22:46:44 c-mosaic-db1 run-mongod: 2016-04-04T22:46:44.471+0000 I REPL     [ReplicationExecutor] New replica set config in use: { _id: "repl-c-mosaic", version: 5, protocolVersion: 1, members: [ { _id: 0, host: "c-mosaic-db1-int.<DOMAIN>:27017", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 100.0, tags: {}, slaveDelay: 0, votes: 1 }, { _id: 1, host: "c-mosaic-arb-int.<DOMAIN>:27017", arbiterOnly: true, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: 0, votes: 1 }, { _id: 2, host: "c-mosaic-db2-int.<DOMAIN>:27017", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 50.0, tags: {}, slaveDelay: 0, votes: 1 } ], settings: { chainingAllowed: true, heartbeatIntervalMillis: 2000, heartbeatTimeoutSecs: 10, electionTimeoutMillis: 10000, getLastErrorModes: {}, getLastErrorDefaults: { w: 1, wtimeout: 0 } } }
Apr  4 22:46:44 c-mosaic-db1 run-mongod: 2016-04-04T22:46:44.471+0000 I REPL     [ReplicationExecutor] This node is c-mosaic-db1-int.<DOMAIN>:27017 in the config
Apr  4 22:46:44 c-mosaic-db1 run-mongod: 2016-04-04T22:46:44.471+0000 I REPL     [ReplicationExecutor] transition to STARTUP2

Generated at Thu Feb 08 04:03:38 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.