Type: Bug
Resolution: Done
Priority: Major - P3
Fix Version/s: None
Affects Version/s: 4.2.17
Component/s: None
Labels: None
Operating System: ALL
CAR Domain/s: None
Aha! Reference: None
Tracking Level: None
Risk Status: None
Exec Notes: None
Goal Name(s): None
Goal Link: None

We have a replica set of 5 nodes.

According to the logs, the primary node was handling an aggregation that drove CPU utilization to 100% within a short time. After 10 seconds without a response from the primary, an election started and a new primary was elected. The election result was sent to the previous primary, and its log showed that it stepped down and changed its state to secondary. However, for an unknown reason the state did not actually change: when we run the `rs.status()` command on any node in the cluster, we see two primary nodes at the same time (although the other 3 secondary nodes sync from the new primary).
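The two-primary condition described above can also be checked from a script rather than by reading `rs.status()` output by hand. The following is a minimal sketch, not the reporter's tooling: it assumes a status document shaped like the output of the `replSetGetStatus` command, with a `members` array whose entries carry `name`, `state`, and `stateStr`. The function names `find_primaries` and `has_multiple_primaries` are hypothetical helpers introduced for illustration.

```python
from typing import Any, Dict, List

# Numeric member state that replSetGetStatus reports for a PRIMARY.
PRIMARY_STATE = 1


def find_primaries(status: Dict[str, Any]) -> List[str]:
    """Return the names of all members that the status document lists as PRIMARY."""
    return [
        member.get("name", "<unknown>")
        for member in status.get("members", [])
        if member.get("state") == PRIMARY_STATE
        or member.get("stateStr") == "PRIMARY"
    ]


def has_multiple_primaries(status: Dict[str, Any]) -> bool:
    """True when more than one member is reported as PRIMARY at the same time."""
    return len(find_primaries(status)) > 1


# Possible usage with PyMongo (assumption: the host name below is a placeholder):
#   from pymongo import MongoClient
#   client = MongoClient("mongodb://node1.example:27017", directConnection=True)
#   status = client.admin.command("replSetGetStatus")
#   if has_multiple_primaries(status):
#       print("Members claiming PRIMARY:", find_primaries(status))
```

Running the check against each node separately may help, since each member reports its own view of the replica set and the views can disagree while a stale primary has not yet stepped down.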

As a result, some users connecting to the cluster with PyMongo ran into connection issues while others did not. I suspect this is because some users connected to the wrong primary node (the previous one).

We tried removing the previous primary from the replica set and adding it back, but there were still two primary nodes. We had to reboot the previous primary and add it back to the cluster; this time it entered the rollback state and, after several minutes, became a secondary.

Assignee: Edwin Zhou
Reporter: Zijun Tian
Participants: Edwin Zhou, Zijun Tian
Votes: 0
Watchers: 1

Created: Nov 24 2021 02:31:31 AM UTC
Updated: Jun 10 2022 01:07:24 PM UTC
Resolved: Dec 22 2021 04:58:33 PM UTC
