Type: Bug
Resolution: Cannot Reproduce
Priority: Minor - P4
Affects Version/s: 2.6.5
Component/s: Replication
One of the secondary members of our replica set had to be taken offline for a prolonged period. Unfortunately, the oplog on the primary did not reach back far enough for a proper recovery. Instead of logging the usual error message (about the oplog being too short to recover), the secondary started opening connections to the primary at a rate of about 300 per second until it exhausted its local ports (about 30K connections to the same remote port, all hanging in TIME_WAIT). It then started losing heartbeats because it could no longer connect to the primary at all, and filled the log with these messages:
2014-10-24T17:07:51.063+0400 [rsBackgroundSync] warning: Failed to connect to 10.3.1.12:27032, reason: errno:99 Cannot assign requested address
2014-10-24T17:07:51.064+0400 [rsBackgroundSync] repl: couldn't connect to server d1.s2.fs-temp.drive.bru:27032 (10.3.1.12), connection attempt failed
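
One way to confirm this kind of local port exhaustion is to count the sockets sitting in TIME_WAIT toward the primary's port. The following is only a minimal sketch, assuming a Linux host (errno 99 suggests one, but the report does not say so) and the IPv4 table format of `/proc/net/tcp`, where the state column value `06` means TIME_WAIT and addresses are written as hex `ADDRESS:PORT`. The function name `count_time_wait` is hypothetical and not part of any MongoDB tooling.

```python
from typing import Iterable

TIME_WAIT = "06"  # TCP state code for TIME_WAIT in /proc/net/tcp


def count_time_wait(lines: Iterable[str], remote_port: int) -> int:
    """Count IPv4 sockets in TIME_WAIT whose remote port is remote_port.

    `lines` is expected to hold rows in /proc/net/tcp format (assumption).
    """
    count = 0
    for line in lines:
        fields = line.split()
        # Skip the header row and anything too short to be a socket entry.
        if len(fields) < 4 or ":" not in fields[2]:
            continue
        # fields[2] is the remote address, e.g. "0C01030A:6998".
        _, port_hex = fields[2].rsplit(":", 1)
        try:
            port = int(port_hex, 16)
        except ValueError:
            continue
        # fields[3] is the socket state as a two-digit hex code.
        if fields[3] == TIME_WAIT and port == remote_port:
            count += 1
    return count
```

On a live host the rows could be read from `open("/proc/net/tcp")` and passed in with `remote_port=27032`, the port shown in the log lines above. A count close to the size of the local ephemeral port range would be consistent with the "Cannot assign requested address" failures.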