Type: Bug
Resolution: Cannot Reproduce
Priority: Minor - P4
Fix Version/s: None
Affects Version/s: 2.6.5
Component/s: Replication
Labels: None
Operating System: ALL
CAR Domain/s: None
Aha! Reference: None
Tracking Level: None
Risk Status: None
Exec Notes: None
Goal Name(s): None
Goal Link: None

One of the secondary members of our replica set had to be taken offline for a prolonged period. Unfortunately, the oplog on the primary did not cover a long enough window for the secondary to catch up. Instead of logging the usual error (that the oplog is too short to recover from), the secondary started opening connections to the primary at a rate of about 300 per second until it exhausted its local ports (about 30K connections to the same remote port were hanging in TIME_WAIT). It then started losing heartbeats because it could not connect to the primary at all, and filled its log with these messages:

2014-10-24T17:07:51.063+0400 [rsBackgroundSync] warning: Failed to connect to 10.3.1.12:27032, reason: errno:99 Cannot assign requested address
2014-10-24T17:07:51.064+0400 [rsBackgroundSync] repl: couldn't connect to server d1.s2.fs-temp.drive.bru:27032 (10.3.1.12), connection attempt failed
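
For anyone trying to confirm the same symptom, the TIME_WAIT buildup described above can be checked by counting sockets per remote endpoint. The following is a minimal sketch, not part of the original report: it assumes `netstat -ant`-style text as input, the function name `count_time_wait` is hypothetical, and the sample lines (including the local address 10.3.1.13) are made up for illustration.

```python
from collections import Counter


def count_time_wait(netstat_lines):
    """Count TIME_WAIT sockets per remote endpoint.

    Assumes `netstat -ant`-style columns:
    proto, recv-q, send-q, local address, foreign address, state.
    """
    counts = Counter()
    for line in netstat_lines:
        fields = line.split()
        # Skip headers and anything that is not a TCP socket line.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        if fields[5] == "TIME_WAIT":
            counts[fields[4]] += 1
    return counts


# Made-up sample input; on a live host the lines would come from
# the output of `netstat -ant`.
sample = [
    "Proto Recv-Q Send-Q Local Address Foreign Address State",
    "tcp 0 0 10.3.1.13:40001 10.3.1.12:27032 TIME_WAIT",
    "tcp 0 0 10.3.1.13:40002 10.3.1.12:27032 TIME_WAIT",
    "tcp 0 0 10.3.1.13:40003 10.3.1.12:27032 ESTABLISHED",
    "tcp 0 0 10.3.1.13:22 10.3.1.99:51000 ESTABLISHED",
]

for endpoint, n in count_time_wait(sample).most_common():
    print(endpoint, n)
```

A count for the primary's address and port that approaches the size of the local ephemeral port range would be consistent with the errno:99 failures in the log lines above.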

Attachments:
primary.log.bz2 (42.27 MB, Oct 24 2014 02:44:03 PM UTC)
secondary.log.bz2 (310 kB, Oct 24 2014 02:44:03 PM UTC)

Assignee: Matt Dannenberg (Inactive)
Reporter: Aristarkh Zagorodnikov
Participants: Aristarkh Zagorodnikov, Matt Dannenberg, Ramon Fernandez
Votes: 0
Watchers: 6

Created: Oct 24 2014 01:25:56 PM UTC
Updated: Apr 14 2015 08:59:35 PM UTC
Resolved: Apr 14 2015 08:59:35 PM UTC
