[SERVER-27274] db.adminCommand( { setFeatureCompatibilityVersion: "3.4" } ) on replica set primary gives no feedback, appears to hang Created: 05/Dec/16  Updated: 07/Dec/16  Resolved: 07/Dec/16

Status: Closed
Project: Core Server
Component/s: Index Maintenance, Networking, Replication
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Michael Brenden Assignee: Kelsey Schubert
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Linux Debian 7, 8


Issue Links:
Related
is related to DOCS-9595 setFeatureCompatibilityVersion uses w... Closed
Operating System: ALL
Participants:

 Description   

Running db.adminCommand({ setFeatureCompatibilityVersion: "3.4" }) on a replica set primary gives no feedback and appears to hang.

Could this be related to, or the cause of, the replication failure shown in the logs below?
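For reference, a minimal sketch of an invocation that would fail fast rather than hang, assuming the 3.4 server honors a writeConcern option with a wtimeout on this command (an assumption, not verified here):

    // As issued in this report: returns no feedback.
    db.adminCommand( { setFeatureCompatibilityVersion: "3.4" } )

    // Sketch: the same command with a 60-second write-concern timeout,
    // which (if honored) should surface a timeout error instead of
    // blocking indefinitely:
    db.adminCommand( {
        setFeatureCompatibilityVersion: "3.4",
        writeConcern: { wtimeout: 60000 }
    } )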

Entry from the primary's log after replication ceased:

2016-12-04T23:28:22.399-0600 I NETWORK  [conn2985] received client metadata from 10.0.0.10:56185 conn2985: { driver: { name: "NetworkInterfaceASIO-RS", version: "3.4.0" }, os: { type: "Linux", name: "PRETTY_NAME="Debian GNU/Linux 7 (wheezy)"", architecture: "x86_64", version: "Kernel 3.2.0-4-amd64" } }

Corresponding entries from one secondary's log after replication ceased:

2016-12-05T00:35:40.538-0500 I REPL     [replication-24] Restarting oplog query due to error: ExceededTimeLimit: Operation timed out, request was RemoteCommand 380827 -- target:m3.internal.net:27017 db:local expDate:2016-12-05T00:35:40.538-0500 cmd:{ getMore: 35490922038, collection: "oplog.rs", maxTimeMS: 2000 }. Last fetched optime (with hash): { ts: Timestamp 1480915633000|442, t: -1 }[-113046468778785762]. Restarts remaining: 3
 
2016-12-05T00:35:40.538-0500 I REPL     [replication-24] Scheduled new oplog query Fetcher source: m3.internal.net:27017 database: local query: { find: "oplog.rs", filter: { ts: { $gte: Timestamp 1480915633000|442 } }, tailable: true, oplogReplay: true, awaitData: true, maxTimeMS: 60000 } query metadata: { $ssm: { $secondaryOk: 1 } } active: 1 timeout: 10000ms inShutdown: 0 first: 1 firstCommandScheduler: RemoteCommandRetryScheduler request: RemoteCommand 380914 -- target:m3.internal.net:27017 db:local cmd:{ find: "oplog.rs", filter: { ts: { $gte: Timestamp 1480915633000|442 } }, tailable: true, oplogReplay: true, awaitData: true, maxTimeMS: 60000 } active: 1 callbackHandle.valid: 1 callbackHandle.cancelled: 0 attempt: 1 retryPolicy: RetryPolicyImpl maxAttempts: 1 maxTimeMillis: -1ms



 Comments   
Comment by Kelsey Schubert [ 07/Dec/16 ]

Hi michaelbrenden,

Thanks for the update; I'm glad your team was able to identify the root cause of these issues.

Kind regards,
Thomas

Comment by Michael Brenden [ 07/Dec/16 ]

Please scratch this entire thread / bug report. Our network team uncovered a serious network misconfiguration.

Comment by Kelsey Schubert [ 06/Dec/16 ]

Hi michaelbrenden,

I suspect that this behavior is occurring because

db.adminCommand( { setFeatureCompatibilityVersion: "3.4" } )

requires a majority of data-bearing nodes to accept the write.

I've opened DOCS-9595 to clarify this behavior. When you are observing this issue, would you please connect to the mongod and execute rs.status()?
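For example, this sketch prints the fields most relevant here (the member names in your output will differ; the loop itself uses only standard rs.status() fields):

    // Print each member's replication state and health flag.
    // A majority write needs enough data-bearing members in
    // PRIMARY or SECONDARY state to acknowledge it.
    rs.status().members.forEach(function (m) {
        print(m.name + "  " + m.stateStr + "  health=" + m.health);
    });

If fewer than a majority of the data-bearing members report PRIMARY or SECONDARY, the setFeatureCompatibilityVersion write cannot be majority-acknowledged, which would explain the apparent hang.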

Thank you,
Thomas

Comment by Kelsey Schubert [ 05/Dec/16 ]

I've created a secure portal for you to use. Files uploaded to this portal are visible only to MongoDB employees investigating this issue and are routinely deleted after some time.

Comment by Kelsey Schubert [ 05/Dec/16 ]

Hi michaelbrenden,

Would you please upload the complete logs from the nodes in the replica set so we can continue to investigate this issue?

Thank you,
Thomas
