[SERVER-50019] Mongo fail-over primary take some time. Created: 30/Jul/20  Updated: 17/Aug/20  Resolved: 17/Aug/20

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 3.6.17
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Vasanth M.Vasanth Assignee: Dmitry Agranat
Resolution: Incomplete Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Operating System: ALL
Participants:

 Description   

Hi Team,

When I reboot the mongo01 node, mongo02 takes over as primary. After the reboot, mongo01 comes back in the STARTUP state and, after some time (about 20 seconds, per the log timestamps below), moves to the PRIMARY state. Please suggest why this happened on my machine.

Previously this did not happen.

 

2020-07-29T14:51:50.696+0530 I REPL [replexec-244] Member mongo01:27717 is now in state STARTUP

2020-07-29T14:52:10.703+0530 I REPL [replexec-249] Member mongo01:27717 is now in state PRIMARY
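
For reference, the gap between the two state changes above can be computed directly from the log timestamps. This is only a minimal sketch: the helper name seconds_between and the format string are illustrative assumptions, not part of any MongoDB tooling, and it assumes the timestamp is the first whitespace-separated field of each line.

```python
from datetime import datetime

# Assumed timestamp layout of the log lines quoted above (ISO-8601 with
# milliseconds and a numeric UTC offset such as +0530).
LOG_TS_FORMAT = "%Y-%m-%dT%H:%M:%S.%f%z"


def seconds_between(first_line, second_line):
    """Return the elapsed seconds between the timestamps of two log lines."""
    t1 = datetime.strptime(first_line.split()[0], LOG_TS_FORMAT)
    t2 = datetime.strptime(second_line.split()[0], LOG_TS_FORMAT)
    return (t2 - t1).total_seconds()


startup = ("2020-07-29T14:51:50.696+0530 I REPL [replexec-244] "
           "Member mongo01:27717 is now in state STARTUP")
primary = ("2020-07-29T14:52:10.703+0530 I REPL [replexec-249] "
           "Member mongo01:27717 is now in state PRIMARY")

elapsed = seconds_between(startup, primary)
print(f"{elapsed:.3f} s")  # prints "20.007 s"
```

So the transition from STARTUP to PRIMARY took roughly 20 seconds, not milliseconds.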

 

2020-07-29T14:51:52.803+0530 I REPL [replication-1] ******
2020-07-29T14:51:52.803+0530 I STORAGE [replication-1] dropAllDatabasesExceptLocal 1
2020-07-29T14:51:52.803+0530 I ASIO [NetworkInterfaceASIO-RS-0] Connecting to mongo02:27717
2020-07-29T14:51:52.804+0530 I ASIO [NetworkInterfaceASIO-RS-0] Successfully connected to mongo02:27717, took 1ms (1 connections now open to mongo02:27717)
2020-07-29T14:51:52.805+0530 I ASIO [NetworkInterfaceASIO-RS-0] Connecting to mongo02:27717
2020-07-29T14:51:52.821+0530 I ASIO [NetworkInterfaceASIO-RS-0] Successfully connected to mongo02:27717, took 16ms (2 connections now open to mongo02:27717)
2020-07-29T14:51:52.827+0530 I REPL [replication-1] CollectionCloner::start called, on ns:admin.system.users

 

Thanks,

Vasanth



 Comments   
Comment by Dmitry Agranat [ 17/Aug/20 ]

Hi vasanth3g@gmail.com,

We haven’t heard back from you for some time, so I’m going to mark this ticket as resolved. If this is still an issue for you, please provide additional information and we will reopen the ticket.

Regards,
Dima

Comment by Dmitry Agranat [ 09/Aug/20 ]

Hi vasanth3g@gmail.com,

We still need additional information as requested in the last comment to diagnose the problem.

Thanks,
Dima

Comment by Jonathan Streets (Inactive) [ 31/Jul/20 ]

Hi vasanth3g@gmail.com,
Could you please clarify the issue you are seeing?

  • what used to happen
  • what changed (e.g. software upgrade, configuration change ?)
  • what happens now (e.g. rebooting primary causes it to go through STARTUP state, then becomes PRIMARY ?)

This will help us understand more about the issue you are seeing.
Regards,
Jon

 

Comment by Vasanth M.Vasanth [ 30/Jul/20 ]

While running sk_cache, a replica set step down for 20 seconds was executed. I don't know why.

2020-07-29T14:52:05.903+0530 I COMMAND  [conn750502] command admin.$cmd command: replSetStepDown { replSetStepDown: 20, secondaryCatchUpPeriodSecs: 1, $clusterTime: { clusterTime: Timestamp(1596014523, 1), signature: { hash: BinData(0, 0000000000000000000000000000000000000000), keyId: 0 } }, $db: "admin" } numYields:0 reslen:369 locks:{ Global: { acquireCount: { r: 2, W: 2 }, acquireWaitCount: { W: 1 }, timeAcquiringMicros: { W: 1785 } } } protocol:op_msg 1165ms
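
For reference, the step-down parameters can be pulled out of a log line like the one above with a small script. In the replSetStepDown command, the numeric value is the number of seconds the node stays ineligible to become primary, and secondaryCatchUpPeriodSecs is how long it waits for an electable secondary to catch up. The sketch below is only an illustration: the regular expressions and the function name parse_stepdown are assumptions based on the single line quoted here, not a general parser for mongod logs.

```python
import re

# Hypothetical patterns, written against the one log line quoted in this ticket.
STEPDOWN_RE = re.compile(
    r"replSetStepDown:\s*(\d+),\s*secondaryCatchUpPeriodSecs:\s*(\d+)"
)
DURATION_RE = re.compile(r"(\d+)ms\s*$")


def parse_stepdown(line):
    """Extract step-down settings from a COMMAND log line, or return None."""
    match = STEPDOWN_RE.search(line)
    if match is None:
        return None
    duration = DURATION_RE.search(line)
    return {
        "step_down_secs": int(match.group(1)),
        "catch_up_secs": int(match.group(2)),
        # Time the command itself took, taken from the trailing "<n>ms" field.
        "command_ms": int(duration.group(1)) if duration else None,
    }


line = (
    '2020-07-29T14:52:05.903+0530 I COMMAND  [conn750502] command admin.$cmd '
    'command: replSetStepDown { replSetStepDown: 20, secondaryCatchUpPeriodSecs: 1, '
    '$clusterTime: { clusterTime: Timestamp(1596014523, 1), signature: '
    '{ hash: BinData(0, 0000000000000000000000000000000000000000), keyId: 0 } }, '
    '$db: "admin" } numYields:0 reslen:369 locks:{ Global: { acquireCount: '
    '{ r: 2, W: 2 }, acquireWaitCount: { W: 1 }, timeAcquiringMicros: { W: 1785 } } } '
    'protocol:op_msg 1165ms'
)

info = parse_stepdown(line)
print(info)
```

For this line the step-down window is 20 seconds with a 1 second catch-up period, and the command itself took 1165 ms.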
