[SERVER-39596] While a node is not in primary/secondary state, dbStats/collStats should not hang
Created: 15/Feb/19  Updated: 06/May/20  Resolved: 06/May/20

Status: Closed
Project: Core Server
Component/s: Diagnostics
Affects Version/s: None
Fix Version/s: None

Type: New Feature
Priority: Major - P3
Reporter: Linda Qin
Assignee: Gregory Wlodarek
Resolution: Duplicate
Votes: 2
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Documented
is documented by DOCS-13315 Investigate changes in SERVER-39596: ... Closed
Duplicate
is duplicated by SERVER-45610 Some reads work while system is RECOV... Closed
Backport Requested:
v4.2, v4.0, v3.6
Sprint: Execution Team 2019-12-16, Execution Team 2020-02-10, Execution Team 2019-12-30, Execution Team 2020-05-18
Linked BF Score: 47

 Description   

Currently, when a node is in initial sync (STARTUP2), running a query on a collection (other than collections in the local database) returns the error "NotMasterOrSecondary":

> db.docs.find()
Error: error: {
    "operationTime" : Timestamp(0, 0),
    "ok" : 0,
    "errmsg" : "node is not in primary or recovering state",
    "code" : 13436,
    "codeName" : "NotMasterOrSecondary",
    "$clusterTime" : {
        "clusterTime" : Timestamp(1550198882, 1),
        "signature" : {
            "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
            "keyId" : NumberLong(0)
        }
    }
}

However, when we run the dbStats or collStats commands (on the collection that is syncing), the commands hang while waiting for locks. Since a node in STARTUP2 state is not ready for reads, could we consider returning the "NotMasterOrSecondary" error for dbStats, collStats, listDatabases, and similar commands, the same as for the find command?
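For illustration only, the requested behaviour can be approximated on the client side. The sketch below assumes the caller first reads the node's state from the myState field of replSetGetStatus; the helper names canServeStats and guardedStats are hypothetical and are not part of the server or the shell. The error document it returns simply mirrors the one shown for find above.

```javascript
// Replica set member states as reported by replSetGetStatus in myState.
const MemberState = Object.freeze({
  STARTUP: 0,
  PRIMARY: 1,
  SECONDARY: 2,
  RECOVERING: 3,
  STARTUP2: 5,
});

// Hypothetical helper: only PRIMARY and SECONDARY nodes are treated as readable.
function canServeStats(myState) {
  return myState === MemberState.PRIMARY || myState === MemberState.SECONDARY;
}

// Hypothetical wrapper: runStats is any function that issues dbStats/collStats.
// When the node is not readable, return an error shaped like the find error
// instead of calling runStats (which could block on locks).
function guardedStats(myState, runStats) {
  if (!canServeStats(myState)) {
    return {
      ok: 0,
      errmsg: "node is not in primary or recovering state",
      code: 13436,
      codeName: "NotMasterOrSecondary",
    };
  }
  return runStats();
}

// Possible use in the shell (not executed here):
//   var state = db.adminCommand({ replSetGetStatus: 1 }).myState;
//   guardedStats(state, function () { return db.runCommand({ dbStats: 1 }); });
```

This is a client-side workaround sketch only; the ticket itself asks for the check to happen inside the server.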



 Comments   
Comment by Gregory Wlodarek [ 06/May/20 ]

I'm closing this ticket as a duplicate of SERVER-45610, as that ticket implemented the idea we were planning to move forward with here. SERVER-45610 prevents the dbStats and collStats commands from running on nodes that are in a recovering state.

I'm going to request that we consider backporting SERVER-45610 to v3.6, v4.0, and v4.2.

Comment by Githook User [ 28/Jan/20 ]

Author:

{'email': 'gregory.wlodarek@mongodb.com', 'username': 'GWlodarek', 'name': 'Gregory Wlodarek'}

Message: Revert "SERVER-39596 While a node is not in primary/secondary state, dbStats/collStats should not hang"

This reverts commit 0e079cef6ba967a3cc930c6fb7960a9125a387ad.

delete mode 100644 jstests/replsets/initial_sync_does_not_block_commands.js
Branch: master
https://github.com/mongodb/mongo/commit/aea2622937550adae02f2e374398dca8cd5003dd

Comment by Githook User [ 28/Jan/20 ]

Author:

{'email': 'gregory.wlodarek@mongodb.com', 'name': 'Gregory Wlodarek', 'username': 'GWlodarek'}

Message: Revert "SERVER-39596 Blacklist initial_sync_does_not_block_commands.js on ephemeral storage engine."

This reverts commit 8f3a768c61a9ca3c46739c5639584e738774666b.
Branch: master
https://github.com/mongodb/mongo/commit/4fe19f8ab5388e61928fcf961502a45213445dd0

Comment by Githook User [ 26/Dec/19 ]

Author:

{'name': 'Suganthi Mani', 'email': 'suganthi.mani@mongodb.com', 'username': 'smani87'}

Message: SERVER-39596 Blacklist initial_sync_does_not_block_commands.js on ephemeral storage engine.
Branch: master
https://github.com/mongodb/mongo/commit/8f3a768c61a9ca3c46739c5639584e738774666b

Comment by Githook User [ 20/Dec/19 ]

Author:

{'name': 'Gregory Wlodarek', 'email': 'gregory.wlodarek@mongodb.com', 'username': 'GWlodarek'}

Message: SERVER-39596 While a node is not in primary/secondary state, dbStats/collStats should not hang
Branch: master
https://github.com/mongodb/mongo/commit/0e079cef6ba967a3cc930c6fb7960a9125a387ad

Comment by Gregory Wlodarek [ 20/Dec/19 ]

Rather than disallowing these commands on secondaries during the initial sync phase, we've managed to make them not hang. The cause was an internal locking issue: an exclusive collection lock was held throughout the entire collection cloning phase.
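As a rough illustration of why a lock held for the whole cloning phase makes a stats command appear to hang, here is a toy model. It is not the server's lock manager, and the function name batchesWaitedByReader is hypothetical. This comment does not say how the fix changed the locking, so releasing the lock between batches is only an assumed alternative used for contrast.

```javascript
// Toy model: the cloner copies a collection in batches under an exclusive lock.
// A stats reader arrives during batch `readerArrival` and needs the lock too.
// Returns how many batches the reader waits through before it is served.
function batchesWaitedByReader(totalBatches, readerArrival, holdAcrossBatches) {
  let exclusiveHeld = false;
  let readerServedAt = null;

  for (let batch = 0; batch < totalBatches; batch++) {
    exclusiveHeld = true; // cloner takes (or keeps) the exclusive lock
    // ... the batch would be inserted here ...
    if (!holdAcrossBatches) {
      exclusiveHeld = false; // assumed alternative: release after every batch
    }
    if (readerServedAt === null && batch >= readerArrival && !exclusiveHeld) {
      readerServedAt = batch; // reader gets in through the gap after this batch
    }
  }

  // If the lock was never released mid-clone, the reader is served only
  // once the final batch has completed.
  if (readerServedAt === null) {
    readerServedAt = totalBatches - 1;
  }
  return readerServedAt - readerArrival + 1;
}
```

With 100 batches and a reader arriving during batch 10, batchesWaitedByReader(100, 10, true) returns 90, while batchesWaitedByReader(100, 10, false) returns 1.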
