Type: Bug
Resolution: Unresolved
Priority: Major - P3
Fix Version/s: None
Affects Version/s: None
Component/s: Replication
Labels:
- lowcontext
- sa-backlog

Assigned Teams:

Server Programmability
Operating System:
ALL
Steps To Reproduce:
This is reproducible by running the “too_stale_secondary.js” test after applying the following patch:

diff --git a/jstests/replsets/too_stale_secondary.js b/jstests/replsets/too_stale_secondary.js
index 1ee6400ebc4..d8bc2c0f11e 100644
--- a/jstests/replsets/too_stale_secondary.js
+++ b/jstests/replsets/too_stale_secondary.js
@@ -93,8 +93,8 @@ replTest.initiate({
     _id: testName,
     members: [
         {_id: 0, host: nodes[0].host},
-        {_id: 1, host: nodes[1].host, priority: 0},
-        {_id: 2, host: nodes[2].host, priority: 0}
+        {_id: 1, host: nodes[1].host, priority: 1},
+        {_id: 2, host: nodes[2].host, priority: 1}
     ]
 });
 
@@ -139,6 +139,17 @@ assert.soon(() => myState(replTest.nodes[2]) === ReplSetTest.State.RECOVERING,
 // This waits for the state as indicated by the primary node.
 replTest.waitForState(replTest.nodes[2], ReplSetTest.State.RECOVERING);
 
+jsTestLog("Begin test");
+assert.commandWorked(
+    replTest.getPrimary().adminCommand({setParameter: 1, mirrorReads: {samplingRate: 1.0}}));
+
+for (var i = 0; i < 100; ++i) {
+    primaryTestDB.runCommand({find: collName, filter: {}});
+}
+jsTestLog("Mid test");
+replTest.nodes[2].getDB('test').runCommand({find: 'test', filter: {}});
+jsTestLog("End test");
+
 jsTestLog("7: Stop and restart Node 2.");
 replTest.stop(2);
 replTest.restart(2, {
Sprint:
Service Arch 2023-07-24, Service Arch 2023-08-07, Service Arch 2023-08-21, Service Arch 2023-09-04
Confidence Status:
None
Work Order:
3
CAR Domain/s:
None

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name(s):
None
Goal Link:
None

Currently, recovering nodes with a non-zero priority are considered electable (see here and here). This implies that nodes that are neither secondary nor primary will show up in the hosts section of hello responses.

Mirrored reads rely on the members in the hosts section to choose mirroring targets (here). Because recovering nodes can appear in that section, reads may be mirrored to non-secondary nodes (see ~~SERVER-60553~~). As part of this ticket, we should decide which of the following holds:

1. Recovering nodes should not be considered electable, and this is an issue with the implementation of the hello command that we need to fix.
2. This is an issue with mirrored reads, and we need to change the underlying mechanism that selects mirroring targets (e.g., using RSM).

Starting with the replication team to evaluate the first option. If this is not an issue with the implementation of hello, feel free to reassign to ServiceArch.
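
As a rough illustration of the second option, the sketch below filters the hosts list down to members that actually report the SECONDARY state before using them as mirroring targets. It is not the server's implementation: `selectMirroringTargets` is a hypothetical helper, the host names are made up, and the inputs are plain objects shaped like `hello` (the `hosts` field) and `replSetGetStatus` (the `members[].name` and `members[].stateStr` fields) responses.

```javascript
// Hypothetical helper, for illustration only; not part of the server code base.
// helloReply:  object shaped like a `hello` response (only `hosts` is used).
// statusReply: object shaped like a `replSetGetStatus` response
//              (only `members[].name` and `members[].stateStr` are used).
function selectMirroringTargets(helloReply, statusReply) {
    // Collect the names of members that currently report SECONDARY.
    const secondaries = new Set(
        (statusReply.members || [])
            .filter((member) => member.stateStr === "SECONDARY")
            .map((member) => member.name));

    // Keep only advertised hosts that are known secondaries. This drops the
    // primary as well as RECOVERING (or otherwise non-secondary) members.
    return (helloReply.hosts || []).filter((host) => secondaries.has(host));
}

// Made-up three-node set mirroring the repro: node 2 is RECOVERING but,
// having a non-zero priority, still appears in `hosts`.
const helloReply = {hosts: ["n0:27017", "n1:27017", "n2:27017"]};
const statusReply = {
    members: [
        {name: "n0:27017", stateStr: "PRIMARY"},
        {name: "n1:27017", stateStr: "SECONDARY"},
        {name: "n2:27017", stateStr: "RECOVERING"},
    ],
};

const targets = selectMirroringTargets(helloReply, statusReply);
console.log(targets);
```

With the sample data, only the node reporting SECONDARY remains a candidate. Whether the member state should come from replSetGetStatus, from RSM, or from somewhere else is exactly the design question this ticket leaves open.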

related to

SERVER-60553 Secondary replicaset initial sync errors with "NotWritablePrimary: Not-primary error while processing 'find' operation on 'database_production' database via fire-and-forget command execution."

Closed

SERVER-79329 Exclude lagged secondaries when selecting hosts for mirrored reads

Backlog

Assignee: Unassigned
Reporter: Amirsaman Memaripour
Participants: Amirsaman Memaripour, Jason Chan
Votes: 0
Watchers: 10

Created: Oct 12 2022 11:04:30 PM UTC
Updated: Oct 23 2024 03:48:34 PM UTC
