[SERVER-9351] 3 node replica set fresh config - failure after initial mongoimport Created: 15/Apr/13 Updated: 16/Apr/13 Resolved: 16/Apr/13 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | 2.4.1 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | David Sobon | Assignee: | Unassigned |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: | Ubuntu 12.04.2 LTS 3.2.0-40-virtual, 64-bit, hosted on AWS EC2 |
| Operating System: | ALL |
| Steps To Reproduce: | 1) create and initiate a clean 3-node replica set cluster (2 nodes in one availability zone, 1 node in another, connected via VPN).
2) perform an initial mongoimport against the primary ({NODE 1}).
3) observe the replication errors and lag on {NODE 3} described below (a reproduction sketch follows the description).
|
| Participants: |
| Description |
|
After setting up replication per the "Geographically Distributed Sets" architecture design pattern (2 nodes in one AZ, 1 node in another AZ, connected via VPN, per the Amazon-recommended design), performing a fresh import from the mongoimport client on NODE 1 to the mongod on NODE 1 triggers replication issues.

NODE 1 - primary, AZ2 (availability zone)

PROBLEM: the mongo client on {NODE 3} responds very slowly (up to 30 seconds of lag), even when pressing enter with no command.

Error logs:
-------------
Mon Apr 15 08:02:55.026 [rsBackgroundSync] Socket recv() timeout {NODE 1}
Mon Apr 15 08:02:55.026 [rsBackgroundSync] SocketException: remote: {NODE 1} error: 9001 socket exception [3] server [{NODE 1}]
Mon Apr 15 08:02:55.026 [rsBackgroundSync] replSet db exception in producer: 10278 dbclient error communicating with server: {NODE 1}
Mon Apr 15 08:02:56.050 [rsSyncNotifier] Socket recv() timeout {NODE 1}
Mon Apr 15 08:02:56.050 [rsSyncNotifier] SocketException: remote: {NODE 1} error: 9001 socket exception [3] server [{NODE 1}]
Mon Apr 15 08:02:56.050 [rsSyncNotifier] DBClientCursor::init call() failed
Mon Apr 15 08:02:57.050 [rsSyncNotifier] replset tracking exception: exception: 9001 socket exception [FAILED_STATE] for {NODE 1}
Mon Apr 15 08:02:58.051 [rsSyncNotifier] replset setting oplog notifier to {NODE 1}

Replication status:
--------------------
{NODE 1} state - PRIMARY, optime - 1366013200
{NODE 2} state - SECONDARY, optime - 1366013200
{NODE 3} state - SECONDARY, optime - 1366012945 |
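The ticket does not include the exact commands used, so the following is only a minimal reproduction-and-diagnosis sketch under assumptions: the hostnames (node1/node2/node3.example.com), database/collection names (mydb/mycoll) and dump file (dump.json) are placeholders, and the AZ placement of members other than {NODE 1} (the primary, in AZ2) is not stated in the report.

```bash
# Sketch only; hostnames, database/collection names and the dump file are hypothetical.
PRIMARY=node1.example.com

# 1) Initiate the 3-node replica set from the intended primary.
mongo --host "$PRIMARY" --eval 'rs.initiate({
  _id: "rs0",
  members: [
    { _id: 0, host: "node1.example.com:27017" },  // per the report: primary, in AZ2
    { _id: 1, host: "node2.example.com:27017" },
    { _id: 2, host: "node3.example.com:27017" }   // the member that lags and responds slowly
  ]
})'

# 2) Perform the initial import against the primary.
mongoimport --host "$PRIMARY" --db mydb --collection mycoll --file dump.json

# 3) Compare member optimes; in the status above {NODE 3}'s optime (1366012945)
#    trails the primary's (1366013200) by roughly 255 seconds.
mongo --host "$PRIMARY" --quiet --eval 'printjson(rs.status().members.map(function (m) {
  return { name: m.name, state: m.stateStr, optime: m.optime };
}))'
```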
| Comments |
| Comment by David Sobon [ 16/Apr/13 ] |
|
Please mark this problem as INVALID. The issue ended up being the cross-availability-zone VPN connection: the TCP connections did not have the TCP MSS set properly. The solution, applied on both ends of the VPN link, was:

iptables -I FORWARD -p tcp --syn -s {saddr}/24 -d {daddr}/24 -j TCPMSS --set-mss 1356 |
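For context on the fix above (not part of the original comment): the iptables TCPMSS target rewrites the MSS option advertised in TCP SYN packets so that segments fit through a tunnel with a reduced MTU. Below is a hedged sketch with placeholder subnets and interface name, showing both the fixed-MSS form the reporter used and the more common clamp-to-path-MTU form, plus one way to verify the advertised MSS:

```bash
# Placeholder subnets and interface; run on both VPN endpoints (sketch only).

# Fixed-MSS variant, as in the reporter's fix (rule placed in the mangle table):
iptables -t mangle -A FORWARD -p tcp --syn -s 10.0.1.0/24 -d 10.0.2.0/24 \
         -j TCPMSS --set-mss 1356

# Alternative: clamp the MSS to the discovered path MTU instead of a fixed value.
iptables -t mangle -A FORWARD -p tcp --tcp-flags SYN,RST SYN \
         -j TCPMSS --clamp-mss-to-pmtu

# Verify the MSS advertised in SYN packets crossing the tunnel interface.
tcpdump -ni tun0 'tcp[tcpflags] & tcp-syn != 0'
```

With the MSS held to 1356, full-sized segments plus IP/TCP headers stay within the tunnel's effective MTU, which would explain why the rsBackgroundSync recv() timeouts on the secondary disappeared once the rule was in place.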