[SERVER-27482] Primary Locked, serverStatus Fails to Return Created: 20/Dec/16  Updated: 29/Dec/16  Resolved: 29/Dec/16

Status: Closed
Project: Core Server
Component/s: MMAPv1, Stability, Write Ops
Affects Version/s: 2.6.10
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Jason Terpko Assignee: Unassigned
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Production - Linux


Operating System: ALL
Participants:

 Description   

Support,

We have encountered an issue where a primary appears to be locked in a state that blocks operations on all databases, including local, and prevents serverStatus from returning. When we retrieved currentOp(), no active operation was holding a global write lock; the only lock-related entry was a database read operation targeting a global read lock and a database read lock:

"locks" : {
	"^" : "r",
	"^mydb" : "R"
}
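
For anyone triaging a similar hang, a minimal sketch of how the lock-related entries could be pulled out of a currentOp() result. The helper name summarizeLocks is hypothetical and not part of the shell; the field names (inprog, opid, op, ns, locks, waitingForLock, secs_running) are assumed to match the 2.6 currentOp output.

```javascript
// Hypothetical helper: keep only operations that report locks or are waiting on one.
// Input is assumed to have the shape returned by db.currentOp(): { inprog: [ ... ] }.
function summarizeLocks(currentOpResult) {
  var ops = (currentOpResult && currentOpResult.inprog) || [];
  return ops
    .filter(function (op) {
      var hasLocks = !!op.locks && Object.keys(op.locks).length > 0;
      return hasLocks || op.waitingForLock === true;
    })
    .map(function (op) {
      // Reduce each entry to the fields relevant to a lock investigation.
      return {
        opid: op.opid,
        op: op.op,
        ns: op.ns,
        locks: op.locks || {},
        waitingForLock: op.waitingForLock === true,
        secsRunning: op.secs_running || 0
      };
    });
}
```

In the mongo shell this could be called as printjson(summarizeLocks(db.currentOp())).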

To resolve the issue, we must kill the primary and force an election. Shortly after the primary enters this state, the secondaries report the following errors but do not trigger an election.

[rsBackgroundSync] Socket recv() timeout  192.168.1.81:27017
[rsBackgroundSync] SocketException: remote: 192.168.1.81:27017 error: 9001 socket exception [RECV_TIMEOUT] server [192.168.1.81:27017]
[rsBackgroundSync] DBClientCursor::init call() failed
[rsBackgroundSync] replSet sync source problem: 10276 DBClientBase::findN: transport error: server1.example.com:27017 ns: local.oplog.rs query: {}
[rsBackgroundSync] replSet syncing to: server1.example.com:27017

Prior to the issue, the mongod was processing deletes using the Bulk() operations builder. The mongod is part of a sharded cluster with 20+ shards running version 2.6.10 on Linux. The shard key is _id: hashed, and these bulk deletes do not filter on the shard key.

Can you please point us to any known bugs with the symptoms above, or advise on how to find the root cause?

Thank you,

Jason



 Comments   
Comment by Kelsey Schubert [ 29/Dec/16 ]

Hi jason_or

Thanks for your report. Please note that the SERVER project is for reporting bugs or feature suggestions for the MongoDB server. For MongoDB-related support discussion, please post on the mongodb-users group or Stack Overflow with the mongodb tag. A question like this, which involves more discussion, would be best posted on the mongodb-users group.

Please note that MongoDB 2.6 is no longer supported. Therefore, I would recommend upgrading to a newer version.

Kind regards,
Thomas

Comment by Jason Terpko [ 21/Dec/16 ]

Note: I see that true wasn't passed to currentOp() at the time of the issue. If the issue recurs, I will pass true to currentOp() so that idle connections and system operations are included in the output.
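
A rough sketch of what that capture might look like, assuming the standard shell behavior where db.currentOp(true) also reports idle connections and system operations. The function name waitingOps is hypothetical; it takes the shell's db object as a parameter and returns the operations waiting for a lock, longest-running first.

```javascript
// Hypothetical helper: capture all operations and list those blocked on a lock.
// "db" is assumed to be the mongo shell database object (anything exposing currentOp).
function waitingOps(db) {
  // Passing true requests idle connections and system operations as well.
  var result = db.currentOp(true);
  var ops = (result && result.inprog) || [];
  return ops
    .filter(function (op) {
      return op.waitingForLock === true;
    })
    .sort(function (a, b) {
      // Longest-running operations first; a missing secs_running counts as 0.
      return (b.secs_running || 0) - (a.secs_running || 0);
    });
}
```

In the shell, printjson(waitingOps(db)) would print the blocked operations; saving the full db.currentOp(true) output alongside it keeps the lock holders available for later analysis.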
