Core Server / SERVER-70510

Avoid considering recovering nodes as electable

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major - P3
    • None
    • Affects Version/s: None
    • Component/s: Replication
    • Labels:
    • Service Arch
    • ALL

      This is reproducible by running the “too_stale_secondary.js” test after applying the following patch:

      diff --git a/jstests/replsets/too_stale_secondary.js b/jstests/replsets/too_stale_secondary.js
      index 1ee6400ebc4..d8bc2c0f11e 100644
      --- a/jstests/replsets/too_stale_secondary.js
      +++ b/jstests/replsets/too_stale_secondary.js
      @@ -93,8 +93,8 @@ replTest.initiate({
           _id: testName,
           members: [
               {_id: 0, host: nodes[0].host},
      -        {_id: 1, host: nodes[1].host, priority: 0},
      -        {_id: 2, host: nodes[2].host, priority: 0}
      +        {_id: 1, host: nodes[1].host, priority: 1},
      +        {_id: 2, host: nodes[2].host, priority: 1}
           ]
       });
       
      @@ -139,6 +139,17 @@ assert.soon(() => myState(replTest.nodes[2]) === ReplSetTest.State.RECOVERING,
       // This waits for the state as indicated by the primary node.
       replTest.waitForState(replTest.nodes[2], ReplSetTest.State.RECOVERING);
       
      +jsTestLog("Begin test");
      +assert.commandWorked(
      +    replTest.getPrimary().adminCommand({setParameter: 1, mirrorReads: {samplingRate: 1.0}}));
      +
      +for (var i = 0; i < 100; ++i) {
      +    primaryTestDB.runCommand({find: collName, filter: {}});
      +}
      +jsTestLog("Mid test");
      +replTest.nodes[2].getDB('test').runCommand({find: 'test', filter: {}});
      +jsTestLog("End test");
      +
       jsTestLog("7: Stop and restart Node 2.");
       replTest.stop(2);
       replTest.restart(2, {
      
    • Service Arch 2023-07-24, Service Arch 2023-08-07, Service Arch 2023-08-21, Service Arch 2023-09-04

      Currently, recovering nodes with a non-zero priority are considered electable (see here and here). As a result, nodes that are neither secondary nor primary can appear in the hosts section of hello responses.
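      To make the chain of effects concrete, here is a minimal JavaScript sketch, assuming (as described above) that hello lists a member in hosts whenever it is considered electable, and that electability depends only on a non-zero priority. The member objects, host names, and the helloHosts function are hypothetical and only model the described behavior; they are not the server's code.

```javascript
// Hypothetical model: each member carries only host, state, and priority.
// This follows the ticket's description, not the actual server implementation.
const members = [
    {host: "node0:27017", state: "PRIMARY", priority: 1},
    {host: "node1:27017", state: "SECONDARY", priority: 1},
    {host: "node2:27017", state: "RECOVERING", priority: 1},
];

// Assumed current rule: a member counts as electable (and is listed in
// hosts) whenever its priority is non-zero, regardless of its state.
function helloHosts(members) {
    return members.filter((m) => m.priority > 0).map((m) => m.host);
}

const hosts = helloHosts(members);
// The RECOVERING member node2 is listed alongside the primary and the secondary.
console.log(hosts.join(", ")); // node0:27017, node1:27017, node2:27017
```

      Under these assumptions, node2 ends up in hosts even though it is RECOVERING, and hosts is the list mirrored reads draws its targets from.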

      Mirrored reads choose their targets from the members listed in the hosts section (here), so because of this issue they may mirror reads to non-secondary nodes (see SERVER-60553). As part of this ticket, we should decide whether:

      • Recovering nodes should not be considered electable, in which case this is a bug in the implementation of the hello command that we need to fix.
      • This is an issue with mirrored reads, in which case we need to change the underlying mechanism that selects mirroring targets (e.g., by using the RSM).

      We are starting with the Replication team to evaluate the first option. If this is not an issue with the implementation of hello, feel free to reassign to ServiceArch.
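      The two options can be contrasted with a similarly hypothetical sketch: option 1 narrows what hello reports in hosts, while option 2 leaves hosts alone and has mirrored reads filter targets by observed member state (roughly what an RSM-style view of the set would provide). The member objects and the helloHostsStrict and mirroringTargetsByState functions are illustrative assumptions, not existing server functions.

```javascript
// Hypothetical member list; the shape {host, state, priority} is an assumption.
const members = [
    {host: "node0:27017", state: "PRIMARY", priority: 1},
    {host: "node1:27017", state: "SECONDARY", priority: 1},
    {host: "node2:27017", state: "RECOVERING", priority: 1},
];

// Option 1 (sketch): tighten electability so that only PRIMARY or SECONDARY
// members with a non-zero priority are listed in hosts.
function helloHostsStrict(members) {
    return members
        .filter((m) => m.priority > 0 && (m.state === "PRIMARY" || m.state === "SECONDARY"))
        .map((m) => m.host);
}

// Option 2 (sketch): keep hosts unchanged, but have mirrored reads pick only
// members observed as SECONDARY, excluding the mirroring node itself.
function mirroringTargetsByState(members, selfHost) {
    return members
        .filter((m) => m.host !== selfHost && m.state === "SECONDARY")
        .map((m) => m.host);
}

console.log(helloHostsStrict(members).join(", ")); // node0:27017, node1:27017
console.log(mirroringTargetsByState(members, "node0:27017").join(", ")); // node1:27017
```

      In this model both options keep the RECOVERING node out of the mirroring targets; the difference is that option 1 changes what every hello client sees, whereas option 2 confines the change to mirrored reads.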

            Assignee:
            backlog-server-servicearch Backlog - Service Architecture
            Reporter:
            amirsaman.memaripour@mongodb.com Amirsaman Memaripour
            Votes:
            0
            Watchers:
            9
