[SERVER-36045] unexpected "re-evaluating sync source" Created: 10/Jul/18 Updated: 06/Dec/22 Resolved: 16/Jul/18 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | 3.2.10 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | zhenyipeng | Assignee: | Backlog - Triage Team |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Assigned Teams: |
Server Triage
|
| Operating System: | ALL |
| Steps To Reproduce: |
|
| Participants: |
| Description |
|
I got this warning in the log file: "Fri Jun 29 22:00:02.585 I REPL [ReplicationExecutor] re-evaluating sync source because our current sync source's most recent OpTime is (term: 2, timestamp: Jun 29 21:50:07:1) which is more than 30s behind member xxx.xxx.xxx.xxx:7005 whose most recent OpTime is (term: 2, timestamp: Jun 29 22:00:01:1)" The write majority request is hung up and timeout when the two secondaries are both doing "re-evaluating sync source" The current sync source is primary node, so I think its "OpTime" should not behind the secondary nodes.
|
| Comments |
| Comment by Ramon Fernandez Marina [ 16/Jul/18 ] |
|
Thanks for your report zhenyipeng, sync source reevaluation is expected behavior and not indicative of a bug in and on itself. Please note that the SERVER project is for reporting bugs or feature suggestions for the MongoDB server. For MongoDB-related support discussion please post on the mongodb-user group or Stack Overflow with the mongodb tag, where your question will reach a larger audience.]] For questions about developing MongoDB please post on the mongodb-dev group. See also our Technical Support page for additional support resources. Regards, |
| Comment by zhenyipeng [ 10/Jul/18 ] |
|
I read the code of TopologyCoordinatorImpl::shouldChangeSyncSource function in src/mongo/db/repl/topology_coordinator_impl.cpp The currentSourceOpTime is the max value between "syncSourceLastOpTime" and "_hbdata[currentSourceIndex].getAppliedOpTime()" But the "syncSourceLastOpTime" is always (term: -1, timestamp: Jan 1 08:00:00:0), so it is not effective. So the the OpTime comparison between current sync source node and the other node depends on OpTime in heartbeat time. But the heartbeat time is not a real-time value as it updates every 2 seconds, so I think here is the problem. I replace the "syncSourceLastOpTime" by "_lastOpTimeFetched" to solve this problem temporarily. Can I do something to make "syncSourceLastOpTime" effective? |