Type: Bug
Resolution: Won't Fix
Priority: Major - P3
Fix Version/s: None
Affects Version/s: None
Component/s: Replication
Labels: None

Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: Repl 2017-02-13, Repl 2017-04-17, Repl 2017-05-08
Linked BF Score: 0
CAR Domain/s: None

Aha! Reference: None
Tracking Level: None
Risk Status: None
Exec Notes: None
Goal Name(s): None
Goal Link: None

In OplogFetcher::_callback(), we check whether the query response actually contains metadata and set a boolean, hasMetadata, accordingly.
However, later in the function we unconditionally call shouldStopFetching() and pass it the metadata, regardless of the value of hasMetadata.
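
The missing guard can be illustrated with a minimal, self-contained sketch. This is not the server code: ReplMetadata, the two-argument shouldStopFetching(), and shouldStopFetchingGuarded() are hypothetical stand-ins, and the choice to keep fetching when metadata is absent is an assumption made only for illustration.

```cpp
#include <optional>

// Hypothetical stand-in for the replication metadata attached to a query response.
struct ReplMetadata {
    long long configVersion;
};

// Hypothetical stand-in for the real check; only the config version is compared here.
bool shouldStopFetching(const ReplMetadata& metadata, long long currentConfigVersion) {
    return metadata.configVersion != currentConfigVersion;
}

// Consult the metadata only when the response actually carried it.
bool shouldStopFetchingGuarded(const std::optional<ReplMetadata>& metadata,
                               long long currentConfigVersion) {
    if (!metadata) {
        // Assumption: absent metadata is not, by itself, a reason to stop fetching.
        return false;
    }
    return shouldStopFetching(*metadata, currentConfigVersion);
}
```

Modelling the metadata as std::optional makes the "no metadata" case explicit at the call site instead of relying on a separate boolean that can be forgotten.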

This results in erroneous behavior. In particular, a node can get stuck in a tight CPU loop when chaining is turned off: the node chooses what it believes to be a primary, but immediately afterwards TopologyCoordinatorImpl::shouldChangeSyncSource() returns true, because the metadata config version (which is null) does not match the current config version. The node then chooses a sync source again, and the cycle repeats in a tight loop.
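
The loop described above can be modelled with a small simulation. Everything here is a simplified assumption: kUnsetConfigVersion is a hypothetical sentinel standing in for the null config version, shouldChangeSyncSource() is reduced to a bare version comparison, and countSyncSourceChanges() is an invented helper that counts how often the node would drop its sync source over a fixed number of rounds.

```cpp
// Hypothetical sentinel standing in for a "null" config version.
constexpr long long kUnsetConfigVersion = -1;

// Hypothetical stand-in for the response metadata; unset unless filled in.
struct ReplMetadata {
    long long configVersion = kUnsetConfigVersion;
};

// Simplified stand-in: a version mismatch alone triggers a sync source change.
bool shouldChangeSyncSource(const ReplMetadata& metadata, long long currentConfigVersion) {
    return metadata.configVersion != currentConfigVersion;
}

// Counts how many rounds end with the node dropping its sync source.
int countSyncSourceChanges(bool responseHasMetadata,
                           bool guardOnHasMetadata,
                           long long currentConfigVersion,
                           int maxRounds) {
    int changes = 0;
    for (int round = 0; round < maxRounds; ++round) {
        ReplMetadata metadata;  // stays unset when the response has no metadata
        if (responseHasMetadata) {
            metadata.configVersion = currentConfigVersion;
        }
        if (guardOnHasMetadata && !responseHasMetadata) {
            continue;  // guarded path: skip the check when there is nothing to compare
        }
        if (shouldChangeSyncSource(metadata, currentConfigVersion)) {
            ++changes;  // unguarded path: every round re-selects the sync source
        }
    }
    return changes;
}
```

In this model, an unguarded check against unset metadata changes the sync source on every round, while either present metadata or a hasMetadata guard yields no changes.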

You can see the effects of this in the log here:
https://logkeeper.mongodb.org/build/7419231f517600f1d972ad9bc50cb45b/test/582a071abe07c472fe0b4a36
This tight loop appears to be part of the reason why the test suite failed (a replica set lost quorum due to slow heartbeats).

related to: SERVER-26528 Add additional logging when sync source is changed or cleared (Closed)

Assignee: Siyuan Zhou
Reporter: Eric Milkie
Participants: Eric Milkie, Judah Schvimer, Siyuan Zhou
Votes: 0
Watchers: 5

Created: Nov 30 2016 09:03:43 PM UTC
Updated: Feb 04 2018 07:08:44 AM UTC
Resolved: Apr 18 2017 04:55:26 PM UTC
