[SERVER-35738] Mongodb bad performance causes heartbeat lose Created: 22/Jun/18 Updated: 04/Sep/18 Resolved: 01/Aug/18 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | 3.2.8 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Zihang Cui | Assignee: | Nick Brewer |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: |
|
| Operating System: | ALL |
| Steps To Reproduce: | We don't know |
| Participants: |
| Description |
|
The secondary became primary through an election after it could no longer find the primary. Before this, many time-consuming operations ran on the primary node, as can be seen in the attached file.

Primary 10.0.208.149, important log:
2018-06-17T04:06:58.242 connection accepted from 10.0.189.100:39064 #206971
2018-06-17T04:06:59.888 [ReplicationExecutor] Member 10.0.189.100:27017 is now in state SECONDARY

Secondary 10.0.189.100, important log:
2018-06-17T04:06:41.460 [ReplicationExecutor] could not find member to sync from
2018-06-17T04:06:58.239+0000 [ReplicationExecutor] Starting an election, since we've seen no PRIMARY in the past 10000ms
2018-06-17T04:06:58.243+0000 I REPL [ReplicationExecutor] VoteRequester: Got no vote from 10.0.208.149:27017 because: candidate's data is staler than mine, resp: { term: 95, voteGranted: false, reason: "candidate's data is staler than mine", ok: 1.0 }
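The two decisions visible in the secondary's log (start an election after no PRIMARY has been seen for 10000ms, and a voter refusing because the candidate's data is staler) can be sketched roughly as below. This is a simplified illustration, not the server's actual implementation: the function names, the `(term, timestamp)` representation of an optime, and the `>=` comparison are assumptions, and the real vote logic checks more conditions than staleness.

```python
from typing import Dict, Tuple

# Matches "we've seen no PRIMARY in the past 10000ms" in the log above.
ELECTION_TIMEOUT_MS = 10_000

# Hypothetical stand-in for a member's last applied operation:
# (term, timestamp), where a larger tuple means newer data.
OpTime = Tuple[int, int]


def should_start_election(last_primary_seen_ms: int, now_ms: int,
                          timeout_ms: int = ELECTION_TIMEOUT_MS) -> bool:
    """Return True once no PRIMARY has been seen for at least timeout_ms."""
    return now_ms - last_primary_seen_ms >= timeout_ms


def vote_response(candidate_last_applied: OpTime,
                  voter_last_applied: OpTime, term: int) -> Dict[str, object]:
    """Deny the vote when the candidate is behind the voter (staleness check only)."""
    if candidate_last_applied < voter_last_applied:
        return {"term": term, "voteGranted": False,
                "reason": "candidate's data is staler than mine"}
    return {"term": term, "voteGranted": True, "reason": ""}
```

Under this model, the refusal from 10.0.208.149 suggests the old primary was still reachable and held newer data than the candidate when the vote request arrived, which would fit heartbeats being delayed rather than the primary being down.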
|
| Comments |
| Comment by Zihang Cui [ 02/Aug/18 ] | |
|
I checked the current and available connections in the mongo shell via db.serverStatus().connections:
/* 1 */
{ "current" : 3664, "available" : 47536, "totalCreated" : NumberLong(2378823) }
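To put these numbers in context, the share of the connection limit in use can be estimated as current / (current + available). The sketch below assumes that `current` plus `available` equals the effective incoming connection limit; the function name is hypothetical and the sample dictionary simply copies the values reported above.

```python
def connection_utilization(connections: dict) -> float:
    """Fraction of the effective connection limit that is currently in use."""
    current = connections["current"]
    available = connections["available"]
    return current / (current + available)


# Values copied from the db.serverStatus().connections output above.
sample = {"current": 3664, "available": 47536, "totalCreated": 2378823}
print(f"{connection_utilization(sample):.1%}")  # prints 7.2%
```

At roughly 7% utilization (3664 of 51200), this particular sample does not show the node close to its connection limit, although it says nothing about the state at the time of the election.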
| |
| Comment by Nick Brewer [ 01/Aug/18 ] | |
|
cuizihang It's difficult to compare the diagnostic data against the logs you've provided, as the dates are much older (from when this ticket was first opened). That said, I am seeing multiple entries similar to this in the primary logs:
There are a few potential causes for this message, but I would start by looking at your operating system limits to see if they are configured according to our production notes. It is possible that your ulimit settings for files are too low.

From your responses, I do not see anything to indicate a bug in the MongoDB server. For MongoDB-related support discussion please post on the mongodb-user group or Stack Overflow with the mongodb tag. A question like this involving more discussion would be best posted on the mongodb-user group.

-Nick | |
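As a rough way to act on the ulimit suggestion, the open-files limit can be read with Python's standard `resource` module (Unix only). The threshold of 64000 is an assumption to be confirmed against the production notes, and the helper names are hypothetical. Note that this reads the limits of the process running the script, so it should be run as the same user and in the same environment as mongod to be meaningful.

```python
import resource

# Assumption: threshold to compare against; confirm in the production notes.
RECOMMENDED_NOFILE = 64000


def nofile_limit_ok(soft_limit: int, recommended: int = RECOMMENDED_NOFILE) -> bool:
    """Return True if the soft open-files limit meets the recommended value."""
    if soft_limit == resource.RLIM_INFINITY:
        return True  # an unlimited soft limit always satisfies the threshold
    return soft_limit >= recommended


def check_current_process() -> bool:
    """Print and evaluate the open-files limits of the current process."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print(f"open files: soft={soft} hard={hard}")
    return nofile_limit_ok(soft)
```

On Linux, the limits of an already running mongod can also be read from /proc/<pid>/limits, which avoids any difference between the shell's limits and the daemon's.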
| Comment by Zihang Cui [ 26/Jul/18 ] | |
|
Would you please archive (tar or zip) the $dbpath/diagnostic.data directory from each node in the replica set, and attach them to this ticket?
--------------------------------------------------------------------------------------------
I hope this leads to some new discoveries. Thank you. | |
| Comment by Zihang Cui [ 26/Jul/18 ] | |
|
Secondary became primary again. Secondary log: | |
| Comment by Nick Brewer [ 25/Jun/18 ] | |
|
cuizihang Sure - we'll await your findings. Regards, | |
| Comment by Zihang Cui [ 25/Jun/18 ] | |
|
$> ls /data/db_dir/mongodb/diagnostic.data/ -lrth
Sorry, the files had been deleted by the mongo daemon on Jun 17. We will attach them to this ticket when the incident happens again. Would you please keep this Jira ticket in “WAITING FOR USER”? Thank you. | |
| Comment by Nick Brewer [ 22/Jun/18 ] | |
|
Would you please archive (tar or zip) the $dbpath/diagnostic.data directory from each node in the replica set, and attach them to this ticket? Thank you, | |
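One possible way to produce the requested archive on each node is sketched below with Python's standard `tarfile` module. The function name and the output file name are hypothetical; `dbpath` must be set to the node's actual data directory.

```python
import tarfile
from pathlib import Path


def archive_diagnostic_data(dbpath: str, out_file: str) -> str:
    """Create a gzip-compressed tar of <dbpath>/diagnostic.data and return its path."""
    src = Path(dbpath) / "diagnostic.data"
    if not src.is_dir():
        raise FileNotFoundError(f"{src} does not exist")
    with tarfile.open(out_file, "w:gz") as tar:
        # Store entries under "diagnostic.data/" rather than the full host path.
        tar.add(str(src), arcname="diagnostic.data")
    return out_file


# Example call (output file name is illustrative only):
# archive_diagnostic_data("/data/db_dir/mongodb", "diagnostic-data-node1.tar.gz")
```

Running this once per replica set member, with a distinct output name for each node, gives one attachment per node.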
| Comment by Zihang Cui [ 22/Jun/18 ] | |
|
The title of this issue, "Mongodb bad performance causes heartbeat lose", is our guess at what caused the election, but we can't be sure.