[SERVER-17297] Sync target occasionally changes incorrectly (or not?) Created: 16/Feb/15 Updated: 12/Jun/15 Resolved: 12/Jun/15 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | 2.6.7 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor - P4 |
| Reporter: | Aristarkh Zagorodnikov | Assignee: | Sam Kleinman (Inactive) |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: |
|
| Issue Links: |
|
| Operating System: | ALL | ||||||||||||
| Participants: | |||||||||||||
| Description |
|
While examining our logs I've found some messages like these:
At first I suspected a time synchronization issue, but after checking twice I found no problems (ntpd is up and running, clock deviation is under one second, and it was not adjusting the clock when the problem occurred). Since this only happens at night (local time is GMT+3) and this replica set has a very light write load, I suspect the problem may be that no writes occur for extended periods (more than 30 seconds) and background sync complains about this erroneously. |
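To make the hypothesis above concrete, here is a toy model. It is not MongoDB's code: `Member`, `lag_vs_wall_clock`, `lag_vs_peer`, `should_switch` and the `MAX_LAG_SECS` constant are all hypothetical. It contrasts two ways a "more than 30 seconds behind" check could be computed. Only a check that compares a member's last write time against the wall clock would misfire during a long write-free period. A comparison between two members' optimes stays at zero when nothing is written.

```python
from dataclasses import dataclass

# Hypothetical threshold mirroring the ">30 seconds" mentioned in the report.
MAX_LAG_SECS = 30


@dataclass
class Member:
    name: str
    last_optime: float  # seconds since epoch of the member's newest oplog entry


def lag_vs_wall_clock(target: Member, now: float) -> float:
    # Naive check: how long ago did the sync target apply its last write?
    # On an idle set this grows without bound even though nobody is behind.
    return now - target.last_optime


def lag_vs_peer(target: Member, peer: Member) -> float:
    # Peer comparison: how far is the sync target behind another member?
    # With no writes, all members converge on the same optime, so this is 0.
    return peer.last_optime - target.last_optime


def should_switch(lag: float) -> bool:
    # Flag the sync target as stale only when the lag exceeds the threshold.
    return lag > MAX_LAG_SECS
```

For example, after an hour with no writes, `lag_vs_wall_clock` reports 3600 seconds for the primary, while `lag_vs_peer` between the primary and an up-to-date secondary reports 0.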
| Comments |
| Comment by Sam Kleinman (Inactive) [ 12/Jun/15 ] | |||||||||||||||||||||||||||||||
|
I've looked over this case again, and it seems that the sync target selection logic has the following rule: "When chainingAllowed is false, a member will refuse to sync from a member that isn't primary from its perspective, even if the other member has more recent oplog entries than itself." This could explain the message you're seeing. See the following document on chained replication. I'm going to go ahead and close this ticket for now, and sorry again about the confusion. Regards, | |||||||||||||||||||||||||||||||
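The quoted rule can be illustrated with a simplified model. It is not MongoDB's actual selection algorithm: the `Candidate` type, the `choose_sync_source` function, and the "pick the most up-to-date remaining candidate" tie-breaker are hypothetical. The real server also weighs other factors, such as ping time. The model shows only the filtering step. With chaining disallowed, a secondary that is further ahead is discarded, and only the primary, as seen by the selecting member, remains eligible.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Candidate:
    name: str
    is_primary: bool    # as seen from the selecting member's perspective
    last_optime: float  # timestamp of the candidate's newest oplog entry


def choose_sync_source(
    self_optime: float,
    candidates: Sequence[Candidate],
    chaining_allowed: bool,
) -> Optional[Candidate]:
    # Only members that are ahead of us are useful sync sources.
    ahead = [c for c in candidates if c.last_optime > self_optime]
    if not chaining_allowed:
        # The quoted rule: refuse any non-primary, even if it is further ahead.
        ahead = [c for c in ahead if c.is_primary]
    if not ahead:
        return None
    # Hypothetical tie-breaker: prefer the most up-to-date remaining candidate.
    return max(ahead, key=lambda c: c.last_optime)
```

With a primary at optime 100 and a secondary at 120, a member at 90 picks the primary when chaining is disallowed and the secondary when it is allowed.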
| Comment by Aristarkh Zagorodnikov [ 29/May/15 ] | |||||||||||||||||||||||||||||||
|
There are backups, but they run about 2 hours later than the problem usually manifests itself. Also, the backups are written to different disks. Besides, the network and CPU are never saturated (the weekly minimum idle CPU is ~63%, the maximum 1-minute load average is 0.63, and network traffic sometimes peaks at 800Mbps, but when these events happen utilization is about 150Mbps at most). | |||||||||||||||||||||||||||||||
| Comment by Sam Kleinman (Inactive) [ 27/May/15 ] | |||||||||||||||||||||||||||||||
|
Thanks for this information. I wanted to ask a couple of follow-up questions:
Regards, | |||||||||||||||||||||||||||||||
| Comment by Aristarkh Zagorodnikov [ 18/May/15 ] | |||||||||||||||||||||||||||||||
| |||||||||||||||||||||||||||||||
| Comment by Aristarkh Zagorodnikov [ 18/May/15 ] | |||||||||||||||||||||||||||||||
|
Logs from all 3 members of the replica set covering a couple of days; look for "changing sync target". | |||||||||||||||||||||||||||||||
| Comment by Aristarkh Zagorodnikov [ 18/May/15 ] | |||||||||||||||||||||||||||||||
|
No problem, Ramón. I understand that the team has to prioritize problems to make real progress. Unfortunately, the logs still contain these messages (in a new format). I will upload new logs within the next few hours. | |||||||||||||||||||||||||||||||
| Comment by Ramon Fernandez Marina [ 15/May/15 ] | |||||||||||||||||||||||||||||||
|
Apologies for the long delay in getting back to you, onyxmaster. We made some improvements to the log messages in Thanks, |