-
Type:
Bug
-
Resolution: Done
-
Priority:
Major - P3
-
None
-
Affects Version/s: 3.0.5
-
Component/s: Replication
-
None
-
ALL
-
None
-
3
-
None
-
None
-
None
-
None
-
None
-
None
we have a replset with three data node: A(master) B(slave) C(slave) in LAN, no network problem, the replset chain is: A -> B -> C. when B is under heavy read load, C sync oplog from B failed with error "socket exception RECV_TIMEOUT", but B sync from A is normal, so node B's data is update to date. after some time we find that C lag from other a lot. then we use "rs.syncFrom(A)" to let C sync oplog from A, after a few second C is update with master.
I expect that when C sync oplog from B failed, or failed some times, will switch to master node automaticly even if B is update to date. we will disable chained replication in replset config before this problem is solved.
some info:
1. db version: 3.0.5
2. chainingAllowed: true(default)
3. SlaveReplicationInfo
repl1:SECONDARY> db.printSlaveReplicationInfo() source: NODE_B:2706 syncedTo: Wed Dec 09 2015 10:43:22 GMT+0800 (CST) -1 secs (0 hrs) behind the primary source: NODE_C:2706 syncedTo: Wed Dec 09 2015 10:34:35 GMT+0800 (CST) 526 secs (0.15 hrs) behind the primary
4. some sync log
2015-12-09T10:29:01.877+0800 I NETWORK [rsBackgroundSync] Socket recv() timeout NODE_B:2706 2015-12-09T10:29:01.905+0800 I NETWORK [rsBackgroundSync] SocketException: remote: NODE_B:2706 error: 9001 socket exception [RECV_TIMEOUT] server [NODE_B:2706] 2015-12-09T10:29:01.946+0800 E REPL [rsBackgroundSync] sync producer problem: 10278 dbclient error communicating with server: NODE_B:2706 2015-12-09T10:29:01.950+0800 I REPL [ReplicationExecutor] syncing from: NODE_B:2706 2015-12-09T10:34:42.673+0800 I NETWORK [rsBackgroundSync] Socket recv() timeout NODE_B:2706 2015-12-09T10:34:42.696+0800 I NETWORK [rsBackgroundSync] SocketException: remote: NODE_B:2706 error: 9001 socket exception [RECV_TIMEOUT] server [NODE_B:2706] 2015-12-09T10:34:42.717+0800 E REPL [rsBackgroundSync] sync producer problem: 10278 dbclient error communicating with server: NODE_B:2706 2015-12-09T10:34:42.718+0800 I REPL [ReplicationExecutor] syncing from: NODE_B:2706 2015-12-09T10:34:44.641+0800 I REPL [rsBackgroundSync] replSet syncTail condition 1 2015-12-09T10:34:45.656+0800 I REPL [ReplicationExecutor] syncing from: NODE_B:2706 2015-12-09T10:35:45.853+0800 I NETWORK [rsBackgroundSync] Socket recv() timeout NODE_B:2706 2015-12-09T10:35:45.853+0800 I NETWORK [rsBackgroundSync] SocketException: remote: NODE_B:2706 error: 9001 socket exception [RECV_TIMEOUT] server [NODE_B:2706] 2015-12-09T10:35:45.853+0800 E REPL [rsBackgroundSync] sync producer problem: 10278 dbclient error communicating with server: NODE_B:2706 2015-12-09T10:35:45.853+0800 I REPL [ReplicationExecutor] syncing from: NODE_B:2706 2015-12-09T10:36:16.721+0800 I NETWORK [rsBackgroundSync] Socket recv() timeout NODE_B:2706 2015-12-09T10:36:16.721+0800 I NETWORK [rsBackgroundSync] SocketException: remote: NODE_B:2706 error: 9001 socket exception [RECV_TIMEOUT] server [NODE_B:2706] 2015-12-09T10:36:16.721+0800 E REPL [rsBackgroundSync] sync producer problem: 10278 dbclient error communicating with server: NODE_B:2706 2015-12-09T10:36:16.721+0800 I REPL [ReplicationExecutor] syncing from: NODE_B:2706 2015-12-09T10:37:16.933+0800 I NETWORK [rsBackgroundSync] Socket recv() timeout NODE_B:2706 2015-12-09T10:37:16.933+0800 I NETWORK [rsBackgroundSync] SocketException: remote: NODE_B:2706 error: 9001 socket exception [RECV_TIMEOUT] server [NODE_B:2706] 2015-12-09T10:37:16.933+0800 E REPL [rsBackgroundSync] sync producer problem: 10278 dbclient error communicating with server: NODE_B:2706 2015-12-09T10:37:16.933+0800 I REPL [ReplicationExecutor] syncing from: NODE_B:2706 2015-12-09T10:37:47.189+0800 I NETWORK [rsBackgroundSync] Socket recv() timeout NODE_B:2706 2015-12-09T10:37:47.189+0800 I NETWORK [rsBackgroundSync] SocketException: remote: NODE_B:2706 error: 9001 socket exception [RECV_TIMEOUT] server [NODE_B:2706] 2015-12-09T10:37:47.189+0800 E REPL [rsBackgroundSync] sync producer problem: 10278 dbclient error communicating with server: NODE_B:2706 2015-12-09T10:37:47.189+0800 I REPL [ReplicationExecutor] syncing from: NODE_B:2706 2015-12-09T10:37:49.456+0800 I REPL [rsBackgroundSync] replSet syncTail condition 1 2015-12-09T10:37:50.457+0800 I REPL [ReplicationExecutor] syncing from: NODE_B:2706 2015-12-09T10:43:44.201+0800 I NETWORK [rsBackgroundSync] Socket recv() timeout NODE_B:2706 2015-12-09T10:43:44.201+0800 I NETWORK [rsBackgroundSync] SocketException: remote: NODE_B:2706 error: 9001 socket exception [RECV_TIMEOUT] server [NODE_B:2706] 2015-12-09T10:43:44.201+0800 E REPL [rsBackgroundSync] sync producer problem: 10278 dbclient error communicating with server: NODE_B:2706 2015-12-09T10:43:44.201+0800 I REPL [ReplicationExecutor] syncing from: NODE_B:2706 2015-12-09T10:44:07.570+0800 I REPL [ReplicationExecutor] syncing from: NODE_A:2706 by request 2015-12-09T10:44:07.580+0800 I REPL [SyncSourceFeedback] replset setting syncSourceFeedback to NODE_A:2706