[SERVER-21806] Chained Replication problem when middle secondary under heavy read load
Created: 09/Dec/15 Updated: 11/Jan/16 Resolved: 11/Jan/16 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | 3.0.5 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | ma6174 | Assignee: | Kelsey Schubert |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Operating System: | ALL |
| Participants: |
| Description |
|
We have a replica set with three data nodes, A (primary), B (secondary) and C (secondary), on a LAN with no network problems. The replication chain is A -> B -> C.

When B is under heavy read load, C fails to sync the oplog from B with the error "socket exception RECV_TIMEOUT". B still syncs from A normally, so B's data is up to date. After some time we found that C lagged far behind the other nodes. We then used "rs.syncFrom(A)" to make C sync the oplog from A, and after a few seconds C had caught up with the primary.

I expect that when C fails to sync the oplog from B, or fails several times, it will switch to the primary automatically, even if B is up to date. We will disable chained replication in the replica set config until this problem is solved.

Some info:
1. db version: 3.0.5
4. some sync log
|
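The workaround mentioned in the description, disabling chained replication, corresponds to the `settings.chainingAllowed` field of the replica set configuration. The following is a minimal sketch of that change, not the reporter's actual procedure: `withChainingDisabled` is a hypothetical helper, and the set name and host names in the sample document are made up for illustration. It is written in plain ES5 so it can be pasted into the mongo shell.

```javascript
// Hypothetical helper (not part of MongoDB): returns a copy of a replica set
// config document with chained replication turned off.
function withChainingDisabled(cfg) {
  var out = {};
  var settings = {};
  var old = cfg.settings || {};
  var key;

  // Shallow-copy the top-level config so the caller's object is not mutated.
  for (key in cfg) {
    if (Object.prototype.hasOwnProperty.call(cfg, key)) {
      out[key] = cfg[key];
    }
  }

  // Copy any existing settings, then disable chaining.
  for (key in old) {
    if (Object.prototype.hasOwnProperty.call(old, key)) {
      settings[key] = old[key];
    }
  }
  settings.chainingAllowed = false;

  out.settings = settings;
  return out;
}

// Made-up sample config, only so the helper can be tried outside a live set.
var sampleConfig = {
  _id: "rs0",
  version: 3,
  members: [
    { _id: 0, host: "A:27017" },
    { _id: 1, host: "B:27017" },
    { _id: 2, host: "C:27017" }
  ]
};

var updatedConfig = withChainingDisabled(sampleConfig);
```

In the mongo shell, connected to the primary, the equivalent call would be `rs.reconfig(withChainingDisabled(rs.conf()))`. With chaining disabled, secondaries sync from the primary rather than from another secondary; `rs.syncFrom()` remains available as a temporary per-member override.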
| Comments |
| Comment by Kelsey Schubert [ 11/Jan/16 ] |
|
Hi ma6174, Socket exception RECV_TIMEOUT likely indicates inadequate system resources for your specified configuration settings. A replica set member will choose a new sync source if any of the following happen:
In your replica set, C is consistently able to open new connections to B, and B does not lag significantly behind A. Therefore, C continues to sync from B.

For MongoDB-related support discussion please post on the mongodb-users group or Stack Overflow with the mongodb tag. Questions about improving how your system handles heavy load would be best posted on the mongodb-users group.

Kind regards, |
| Comment by Kelsey Schubert [ 23/Dec/15 ] |
|
Hi ma6174, Sorry for the delay in responding publicly. So we can continue to investigate, please upload the following:
Thank you, |