[SERVER-32066] Inserting document to drop pending collection using UUID with applyOps can cause primary to dassert Created: 22/Nov/17  Updated: 27/Oct/23  Resolved: 13/Sep/19

Status: Closed
Project: Core Server
Component/s: Replication, Storage
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: William Schultz (Inactive) Assignee: Evgeni Dobranov
Resolution: Gone away Votes: 0
Labels: rbfz
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: apply_ops_uuid_auth.js
Issue Links:
Depends
Related
related to SERVER-32098 Disallow operations on drop-pending c... Closed
Operating System: ALL
Steps To Reproduce:

 
load("jstests/libs/check_log.js");  // For 'checkLog'.
function pauseOplogApplication(node) {
    assert.commandWorked(node.adminCommand(
        {configureFailPoint: "rsSyncApplyStop", mode: "alwaysOn"}));
    checkLog.contains(node, "rsSyncApplyStop fail point enabled");
}
 
function resumeOplogApplication(node) {
    assert.commandWorked(
        node.adminCommand({configureFailPoint: "rsSyncApplyStop", mode: "off"}));
}
 
 
let replTest = new ReplSetTest({name: "applyOpsTest", nodes: 2});
 
replTest.startSet();
replTest.initiate();
replTest.awaitReplication();
 
// Pause oplog application so collection drop doesn't commit.
pauseOplogApplication(replTest.getSecondary());
 
// Get connections and collection.
let primary = replTest.getPrimary();
let pdb = primary.getDB("test");
 
// Create collection.
pdb["coll"].insert({x:1});
 
// Remember the UUID of the original collection before it is dropped.
let uuid = pdb.getCollectionInfos({name: "coll"})[0].info.uuid;
 
// Drop and re-create collection.
pdb["coll"].drop();
pdb["coll"].insert({x:1});
 
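// Build an insert that names the re-created collection's namespace but carries
// the UUID of the dropped collection.
let ops = [{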
    "op": "i",
    "ns": "test.coll",
    "ui": uuid,
    "o": {"_id": 0}
}];
 
jsTestLog("Doing 'applyOps' command.");
assert.commandWorked(pdb.adminCommand({applyOps: ops, allowAtomic: false}));
 
resumeOplogApplication(replTest.getSecondary());

Sprint: Execution Team 2019-09-23
Participants:

 Description   

Consider the following sequence of collection operations that occur on a primary:

create "test.coll", UUID=0
drop   "test.coll", UUID=0
create "test.coll", UUID=1
insert "test.coll", UUID=0, {x:1}

If we use the applyOps command to insert directly into the collection with namespace "test.coll" and UUID=0 before the drop of the UUID=0 collection has committed on the primary, we hit this dassert:

[js_test:apply_ops_repro] 2017-11-27T11:45:16.568-0500 d50410| 2017-11-27T11:45:16.568-0500 F -        [conn1] Invariant failure opCtx->lockState()->isCollectionLockedForMode( requestNss.ns(), supportsDocLocking() ? MODE_IX : MODE_X) src/mongo/db/repl/oplog.cpp 1040

The problem seems to stem partly from the fact that we obtain locks by namespace, rather than by UUID, before applying an operation through applyOps.
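The mismatch described above can be illustrated with a toy model in plain JavaScript. This is not MongoDB server code: the map, both functions, and the placeholder drop-pending namespace are hypothetical, and the sketch assumes that a dropped collection stays reachable by UUID under a different namespace until the drop commits.

```javascript
// Toy model in plain JavaScript, not MongoDB server code. All names are hypothetical.
// UUID -> namespace the collection is currently registered under.
const namespaceByUuid = new Map([
    [0, "test.<drop-pending>.coll"],  // placeholder name for the dropped collection
    [1, "test.coll"],                 // the re-created collection
]);

// Lock the namespace named in the op, then resolve the real target from the UUID.
function lockByNamespace(op) {
    const lockedNamespaces = new Set([op.ns]);
    const targetNs = namespaceByUuid.get(op.ui);
    return {targetNs, lockHeld: lockedNamespaces.has(targetNs)};
}

// Resolve the target from the UUID first, then lock whatever it resolves to.
function lockByUuid(op) {
    const targetNs = namespaceByUuid.get(op.ui);
    const lockedNamespaces = new Set([targetNs]);
    return {targetNs, lockHeld: lockedNamespaces.has(targetNs)};
}

const op = {op: "i", ns: "test.coll", ui: 0, o: {_id: 0}};
console.log(lockByNamespace(op).lockHeld);  // false
console.log(lockByUuid(op).lockHeld);       // true
```

In this model, locking by namespace leaves the collection that the UUID resolves to unlocked, which is the condition the invariant checks for.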

One solution may be to disallow inserts into collections whose UUIDs are in the drop-pending state. See the repro script. Note that the issue can be reproduced even without pausing secondary oplog application, since the offending insert is very likely to execute before the preceding collection drop has committed.
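A minimal sketch of the proposed rule, again as a plain JavaScript toy rather than server code. The `insertByUuid` function, the `dropPending` flag, and the shape of the collection records are all assumptions made for illustration.

```javascript
// Hypothetical sketch in plain JavaScript; none of these names are MongoDB APIs.
function insertByUuid(collections, uuid, doc) {
    const coll = collections.get(uuid);
    if (coll === undefined) {
        throw new Error("no collection with UUID " + uuid);
    }
    if (coll.dropPending) {
        // Reject instead of writing to a collection that is about to disappear.
        throw new Error("collection with UUID " + uuid + " is drop-pending");
    }
    coll.docs.push(doc);
    return coll.docs.length;
}

const collections = new Map([
    [0, {dropPending: true, docs: [{x: 1}]}],   // dropped, drop not yet committed
    [1, {dropPending: false, docs: [{x: 1}]}],  // re-created collection
]);
```

With this rule, an insert addressed to UUID 0 fails with an error, while an insert addressed to UUID 1 proceeds as usual.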



 Comments   
Comment by Evgeni Dobranov [ 12/Sep/19 ]

This doesn't appear to be an issue anymore. We now fail the uassert here, which gives the expected behavior:

[jsTest] ----
[jsTest] Doing 'applyOps' command.
[jsTest] ----
assert: command failed: {
    "applied" : 1,
    "code" : 26,
    "codeName" : "NamespaceNotFound",
    "errmsg" : "Failed to apply operation due to missing collection
...

A change was made so that collections are deregistered earlier, which is likely why the uassert now fails before the dassert is reached.
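The likely effect of earlier deregistration can be sketched with a toy model in plain JavaScript. Only the error code and code name are taken from the output above; the function, the registry, and the `ok` field are hypothetical, and the sketch assumes the drop removes the UUID from the registry immediately.

```javascript
// Toy model; code 26 and "NamespaceNotFound" mirror the output above,
// everything else is hypothetical.
function applyInsert(registeredUuids, op) {
    if (!registeredUuids.has(op.ui)) {
        // User-facing failure, analogous to the uassert.
        return {ok: 0, code: 26, codeName: "NamespaceNotFound"};
    }
    // Only reached for registered collections; the lock check (the dassert)
    // would follow here.
    return {ok: 1};
}

// Assumption: the drop removed UUID 0 from the registry right away.
const registeredUuids = new Set([1]);
```

Under that assumption, an operation carrying the dropped UUID never reaches the lock check, so the command returns an error instead of tripping the invariant.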

 

Comment by William Schultz (Inactive) [ 27/Nov/17 ]

As far as I can tell, a user with the root role can still trigger this. apply_ops_uuid_auth.js

Comment by Spencer Brody (Inactive) [ 27/Nov/17 ]

This is a direct consequence of collection catalog versioning; assigning to storage to triage.

Comment by Spencer Brody (Inactive) [ 27/Nov/17 ]

This sounds like it may be a more general problem with any operation that accepts both a UUID and a namespace. We need some policy or technique to double-check that the namespace/UUID mapping is correct, and a plan for how to handle the case when it is not.
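One possible shape for such a check, sketched in plain JavaScript. The `checkMapping` function and its return shape are hypothetical, and rejecting on mismatch is only one candidate policy, not something this ticket settled on.

```javascript
// Hypothetical cross-check for requests that carry both a namespace and a UUID.
function checkMapping(uuidByNamespace, request) {
    const {ns, ui} = request;
    if (ns === undefined || ui === undefined) {
        // Only one identifier was supplied, so there is nothing to cross-check.
        return {consistent: true};
    }
    const currentUuid = uuidByNamespace.get(ns);
    if (currentUuid !== ui) {
        return {
            consistent: false,
            reason: "namespace " + ns + " does not currently map to UUID " + ui,
        };
    }
    return {consistent: true};
}

// State after the drop and re-create from the description: "test.coll" now has UUID 1.
const uuidByNamespace = new Map([["test.coll", 1]]);
console.log(checkMapping(uuidByNamespace, {ns: "test.coll", ui: 0}).consistent);  // false
console.log(checkMapping(uuidByNamespace, {ns: "test.coll", ui: 1}).consistent);  // true
```

The caller would still need to decide what to do with an inconsistent request, for example failing the command or preferring one identifier over the other.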

Comment by Spencer Brody (Inactive) [ 27/Nov/17 ]

This seems worrisome for more than the dassert. In production builds, could this lead to modifying collection contents without a database lock?

Comment by Spencer Brody (Inactive) [ 27/Nov/17 ]

william.schultz, does the access control system prevent this?
