[SERVER-61803] Multiple Primary node in Mongodb replicaset Created: 27/Nov/21 Updated: 05/Mar/22 Resolved: 03/Mar/22 |
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | 4.4.5 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | sadegh raz | Assignee: | Edwin Zhou |
| Resolution: | Done | Votes: | 0 |
| Labels: | mongodb, replica-set | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
Ubuntu 18.04.4 LTS |
| Description |
| Comments |
| Comment by sadegh raz [ 05/Mar/22 ] | |
Hi Edwin. Two weeks ago we updated the cluster to the latest version, as you suggested. Now I'm waiting to see whether the issue persists. Thank you.
| Comment by Edwin Zhou [ 03/Mar/22 ] | |
We haven’t heard back from you for some time, so I’m going to close this ticket. If this is still an issue for you, please provide additional information and we will reopen the ticket. Best,
| Comment by Edwin Zhou [ 22/Feb/22 ] | |
We still need additional information to diagnose the problem. If this incident occurs again, can you supply the following data from both the former/stalled and current primary, ideally at the time when a new primary is elected?
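To know when to capture this data, one option is to poll `replSetGetStatus` on every member and compare how each node describes itself. A minimal sketch, assuming the command output has already been collected from each node as a dict (for example via `pymongo`'s `client.admin.command("replSetGetStatus")` against each host, not shown). The helper `self_reported_primaries` and the sample host names are hypothetical; the code only reads the `members`, `name`, `self`, and `stateStr` fields of that output.

```python
from typing import Any, Dict, List


def self_reported_primaries(statuses: List[Dict[str, Any]]) -> List[str]:
    """Return the names of nodes that report themselves as PRIMARY.

    `statuses` is assumed to hold one replSetGetStatus document per node,
    each collected by connecting directly to that node.
    """
    primaries = []
    for status in statuses:
        for member in status.get("members", []):
            # Only trust each node's view of itself ("self": true).
            if member.get("self") and member.get("stateStr") == "PRIMARY":
                primaries.append(member["name"])
    return primaries


if __name__ == "__main__":
    # Hypothetical snapshot in which the old primary never stepped down.
    statuses = [
        {"members": [{"name": "db1:27017", "self": True, "stateStr": "PRIMARY"},
                     {"name": "db2:27017", "stateStr": "SECONDARY"}]},
        {"members": [{"name": "db1:27017", "stateStr": "SECONDARY"},
                     {"name": "db2:27017", "self": True, "stateStr": "PRIMARY"}]},
    ]
    found = self_reported_primaries(statuses)
    if len(found) > 1:
        print("multiple self-reported primaries:", found)
```

Comparing each node's self-view, rather than one node's view of the whole set, is what exposes a stalled former primary that still believes it is PRIMARY.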
Best,
| Comment by Edwin Zhou [ 10/Feb/22 ] | |
Thank you for following up. We look forward to analyzing the incident when you attach the diagnostics. Here is a new link to the support uploader as the previous one has expired. Best,
| Comment by sadegh raz [ 02/Feb/22 ] | |
Hi guys. It happened again, even after updating to 4.4.10. Profiling is enabled on all nodes. I will send new statistics the next time it happens.
| Comment by Eric Sedor [ 31/Jan/22 ] | |
Hi razzaghisaa@gmail.com, are you able to provide the information my colleague Edwin requested?
| Comment by Edwin Zhou [ 07/Jan/22 ] | |
Thank you for uploading gdb stacks and diagnostic data. Unfortunately, the diagnostic data, logs, and gdb stacks do not cover the same time period, so it's unclear when the failover happened that left a node incorrectly stuck in the primary state. Additionally, we need the same information from the former/stalled primary that you provided for the current primary. Here's the data you have provided so far:
If this incident occurs again, can you supply the following data from both the former/stalled and current primary, ideally at the time when a new primary is elected?
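Because the diagnostic data, logs, and stacks are only useful together when they span the same period, it can help to check their overlap before uploading. A minimal sketch, assuming you have already read the first and last timestamp of each artifact yourself (for example from the first and last lines of mongod.log); the function `common_window` and the sample ranges are hypothetical and not taken from this ticket.

```python
from datetime import datetime
from typing import Dict, Optional, Tuple

# (first timestamp, last timestamp) covered by one artifact.
Window = Tuple[datetime, datetime]


def common_window(artifacts: Dict[str, Window]) -> Optional[Window]:
    """Return the span covered by every artifact, or None if they never overlap."""
    if not artifacts:
        return None
    start = max(first for first, _ in artifacts.values())
    end = min(last for _, last in artifacts.values())
    return (start, end) if start <= end else None


if __name__ == "__main__":
    # Hypothetical ranges, for illustration only.
    ranges = {
        "mongod.log": (datetime(2022, 1, 1, 8, 0), datetime(2022, 1, 2, 9, 0)),
        "diagnostic.data": (datetime(2022, 1, 2, 2, 0), datetime(2022, 1, 2, 12, 0)),
        "gdb stacks": (datetime(2022, 1, 2, 14, 5), datetime(2022, 1, 2, 14, 6)),
    }
    print(common_window(ranges))  # None: the gdb capture falls outside the other two
```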
I've also noticed that this repeat occurrence happened on MongoDB v4.4.5, which is exposed to … Can you confirm whether you were profiling the database at the time the former primary stalled? Best,
| Comment by sadegh raz [ 28/Dec/21 ] | |
Hi @edwin.zhou, the problem recurred, so I uploaded the diagnostic logs as well. Regards.
| Comment by sadegh raz [ 26/Dec/21 ] | |
Hi Edwin,
Sorry for the late answer; we needed some time to prepare the logs. I uploaded the logs and gdb files. Unfortunately, diagnostic.data was deleted because of our retention policy, but I can send it right after the issue happens again.
Thanks.
| Comment by Edwin Zhou [ 22/Dec/21 ] | |
We still need additional information to diagnose the problem. If this is still an issue for you, would you please let us know whether you've encountered it since upgrading to the latest version of 4.4? If you've seen a repeat occurrence, can you please provide stack traces from the stalled primary? Best,
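For reference, stack traces of every thread in a running `mongod` are commonly captured with gdb in batch mode using the `thread apply all bt` command. A minimal Python wrapper, assuming `gdb` is installed, you are allowed to attach to the process (typically as root), and you already know the `mongod` PID; the function names and the output file name are hypothetical. Attaching pauses the process briefly while the backtraces are taken.

```python
import subprocess
import sys
from datetime import datetime, timezone
from typing import List


def gdb_stack_command(pid: int) -> List[str]:
    """Build a gdb invocation that dumps backtraces of all threads of `pid`."""
    return [
        "gdb",
        "-p", str(pid),              # attach to the running mongod
        "-batch",                    # run the -ex commands, then detach and exit
        "-ex", "thread apply all bt",
    ]


def collect_stacks(pid: int) -> str:
    """Run gdb once and write its output to a UTC-timestamped file."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    out_path = f"gdb_{pid}_{stamp}.txt"
    with open(out_path, "w") as out:
        subprocess.run(gdb_stack_command(pid), stdout=out,
                       stderr=subprocess.STDOUT, check=False)
    return out_path


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python collect_stacks.py <mongod-pid>
    print(collect_stacks(int(sys.argv[1])))
```

The UTC timestamp in the file name is a design choice meant to make it easier to line the stacks up with mongod.log and diagnostic.data afterwards.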
| Comment by Edwin Zhou [ 06/Dec/21 ] | |
Thank you for your report. I'd first like to note that MongoDB version 4.4.5 is not recommended for production use due to critical issues, and I strongly advise that you upgrade to the latest version of MongoDB v4.4.x and run validate() on all collections.
Would you please archive (tar or zip) the mongod.log files and the $dbpath/diagnostic.data directory (the contents are described here) and upload them to this support uploader location? Files uploaded to this portal are visible only to MongoDB employees and are routinely deleted after some time.
I suspect that this issue may be an occurrence of …
If this behavior happens again, can you collect stack traces on the stalled nodes, specifically the primary that is stalled and unable to step down?
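The archive step can be scripted as well. A minimal sketch using Python's standard `tarfile` module, assuming the logs live in one directory as `mongod.log` plus rotated copies whose names start with `mongod.log`; the function `archive_diagnostics` and the output file name are hypothetical, and uploading the result is still a manual step.

```python
import tarfile
from pathlib import Path


def archive_diagnostics(dbpath: str, log_dir: str, out_file: str) -> Path:
    """Pack mongod.log* files and <dbpath>/diagnostic.data into one .tar.gz."""
    out = Path(out_file)
    with tarfile.open(str(out), "w:gz") as tar:
        # Rotated logs keep the "mongod.log" prefix, so match on it.
        for log in sorted(Path(log_dir).glob("mongod.log*")):
            tar.add(str(log), arcname=f"logs/{log.name}")
        diag = Path(dbpath) / "diagnostic.data"
        if diag.is_dir():
            # tar.add recurses into the directory by default.
            tar.add(str(diag), arcname="diagnostic.data")
    return out
```

For example, `archive_diagnostics("/var/lib/mongodb", "/var/log/mongodb", "mongodb-diagnostics.tar.gz")` would match the default paths of the Ubuntu packages; adjust both to your `storage.dbPath` and `systemLog.path` settings.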
Best,