[SERVER-68198] The `captrunc` test command doesn't atomically delete data and indexes Created: 21/Jul/22  Updated: 05/Dec/22  Resolved: 01/Nov/22

Status: Closed
Project: Core Server
Component/s: Storage
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Kaloian Manassiev Assignee: Backlog - Storage Execution Team
Resolution: Won't Fix Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Assigned Teams:
Storage Execution
Operating System: ALL
Sprint: Execution Team 2022-10-17, Execution Team 2022-10-31
Participants:

 Description   

The captrunc test command invokes cappedTruncateAfter without a WUOW (WriteUnitOfWork). Based on the implementation of WiredTigerRecordStore::doCappedTruncateAfter, this means that the index entries for the deleted records are removed first, and the actual data is truncated later, in a separate WUOW.

If the server crashes between the two steps, the indexes and the data are left inconsistent.

Absent a server crash, this is not a problem, because the whole command runs under a collection X-lock.
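
To make the failure mode concrete, here is a minimal sketch in plain JavaScript. It is a toy model only: ToyCollection, truncateAfterNonAtomic and truncateAfterAtomic are hypothetical names, plain Maps stand in for the record store and a single index, and none of this is server code or a MongoDB API. It only illustrates why committing the two steps separately leaves an inconsistent state if the process dies in between, whereas publishing both steps together does not.

```javascript
// Toy model only: plain Maps stand in for a record store and one index.
// All names here are hypothetical; none of them are MongoDB APIs.
class ToyCollection {
    constructor(docs) {
        this.records = new Map();  // recordId -> document
        this.index = new Map();    // indexed key -> recordId
        docs.forEach((doc, i) => {
            this.records.set(i + 1, doc);
            this.index.set(doc.k, i + 1);
        });
    }

    // Counts index entries pointing at missing records ("dangling") and
    // records that have no index entry ("unindexed").
    inconsistencies() {
        const dangling = [...this.index].filter(([, id]) => !this.records.has(id));
        const unindexed = [...this.records].filter(([, doc]) => !this.index.has(doc.k));
        return {dangling: dangling.length, unindexed: unindexed.length};
    }
}

// Two-step truncate: the index removal and the data removal take effect separately.
function truncateAfterNonAtomic(coll, lastKeptId, crashBetweenSteps) {
    for (const [id, doc] of coll.records) {
        if (id > lastKeptId) coll.index.delete(doc.k);  // step 1: takes effect on its own
    }
    if (crashBetweenSteps) return;  // simulated crash: step 2 never runs
    for (const id of [...coll.records.keys()]) {
        if (id > lastKeptId) coll.records.delete(id);  // step 2: remove the data
    }
}

// Atomic variant: apply both steps to copies, then publish both or neither.
function truncateAfterAtomic(coll, lastKeptId, crashBeforeCommit) {
    const records = new Map(coll.records);
    const index = new Map(coll.index);
    for (const [id, doc] of coll.records) {
        if (id > lastKeptId) {
            index.delete(doc.k);
            records.delete(id);
        }
    }
    if (crashBeforeCommit) return;  // simulated crash: nothing was published
    coll.records = records;
    coll.index = index;
}

const demoDocs = [{k: "a"}, {k: "b"}, {k: "c"}, {k: "d"}];

const demoCrashed = new ToyCollection(demoDocs);
truncateAfterNonAtomic(demoCrashed, 2, true);
console.log(demoCrashed.inconsistencies());  // { dangling: 0, unindexed: 2 }

const demoAtomic = new ToyCollection(demoDocs);
truncateAfterAtomic(demoAtomic, 2, true);
console.log(demoAtomic.inconsistencies());  // { dangling: 0, unindexed: 0 }
```

In the toy model, the crashed non-atomic run leaves two records that are still present but no longer indexed, which mirrors the order described above (index entries first, data later). The atomic variant either applies both changes or neither.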



 Comments   
Comment by Matthew Saltz (Inactive) [ 03/Oct/22 ]

Jotting down some initial investigation

Uses of the command in jstests

Generic mentions of the command

These can be removed easily:

jstests/core/views/views_all_commands.js:    captrunc: {
jstests/core/views/views_all_commands.js:        command: {captrunc: "view", n: 2, inc: false},
jstests/libs/override_methods/network_error_and_txn_override.js:    "captrunc",
jstests/libs/override_methods/read_and_write_concern_helpers.js:    "captrunc",
jstests/noPassthroughWithMongod/testing_only_commands.js:    'captrunc',
jstests/replsets/all_commands_downgrading_to_upgraded.js:    captrunc: {
jstests/replsets/all_commands_downgrading_to_upgraded.js:        // command: {captrunc: "capped_truncate", n: 5, inc: false},
jstests/replsets/db_reads_while_recovering_all_commands.js:    captrunc: {skip: isPrimaryOnly},
jstests/replsets/tenant_migration_concurrent_writes_on_donor_util.js:    captrunc: {
jstests/replsets/tenant_migration_concurrent_writes_on_donor_util.js:            return {captrunc: collName, n: 1};
jstests/sharding/read_write_concern_defaults_application.js:    captrunc: {skip: "test command"},
jstests/sharding/safe_secondary_reads_drop_recreate.js:    captrunc: {skip: "primary only"},
jstests/sharding/safe_secondary_reads_single_migration_suspend_range_deletion.js:    captrunc: {skip: "primary only"},
jstests/sharding/safe_secondary_reads_single_migration_waitForDelete.js:    captrunc: {skip: "primary only"},

Explicit tests of the command itself

These should be easy to remove. The problem is that there seem to be a few actual usages of cappedTruncateAfter in the server code, which the captrunc command was used to test:

src/mongo/db/repl/replication_recovery.cpp:        // oplog's `cappedTruncateAfter` method was a convenient location for this logic, which,
src/mongo/db/repl/replication_recovery.cpp:    oplogCollection->getRecordStore()->cappedTruncateAfter(
src/mongo/db/repl/rs_rollback.cpp:                                            "cappedTruncateAfter",
src/mongo/db/repl/rs_rollback.cpp:                                                collection_internal::cappedTruncateAfter(
src/mongo/db/repl/rs_rollback.cpp:        oplogCollection->getRecordStore()->cappedTruncateAfter(

So we have to determine whether cappedTruncateAfter is sufficiently tested elsewhere in unit tests and, if not, how difficult it would be to port the existing jstests to unit tests. These are the jstests:

jstests/noPassthroughWithMongod/capped6.js: * Tests Collection::cappedTruncateAfter() via the "captrunc" command. This is a test-only command
jstests/noPassthroughWithMongod/capped6.js: * 2. Remove all but one documents via one or more "captrunc" requests.
jstests/noPassthroughWithMongod/capped6.js:    // If n <= 0, no documents are removed by captrunc.
jstests/noPassthroughWithMongod/capped6.js:    // Number of times to call "captrunc" so that (count - 1) documents are removed
jstests/noPassthroughWithMongod/capped6.js:        assert.commandWorked(db.runCommand({captrunc: "capped6", n: n, inc: inc}));
jstests/noPassthroughWithMongod/capped_truncate.js: * Test running the 'captrunc' command on various kinds of collections:
jstests/noPassthroughWithMongod/capped_truncate.js:assert.commandFailed(db.runCommand({captrunc: "capped_truncate", n: -1}),
jstests/noPassthroughWithMongod/capped_truncate.js:                     "captrunc didn't return an error when attempting to remove a negative " +
jstests/noPassthroughWithMongod/capped_truncate.js:assert.commandFailed(db.runCommand({captrunc: "capped_truncate", n: 0}),
jstests/noPassthroughWithMongod/capped_truncate.js:                     "captrunc didn't return an error when attempting to remove 0 documents");
jstests/noPassthroughWithMongod/capped_truncate.js:assert.commandFailed(db.runCommand({captrunc: "capped_truncate", n: 20}),
jstests/noPassthroughWithMongod/capped_truncate.js:                     "captrunc didn't return an error when attempting to remove more" +
jstests/noPassthroughWithMongod/capped_truncate.js:assert.commandWorked(db.runCommand({captrunc: "capped_truncate", n: 5, inc: false}));
jstests/noPassthroughWithMongod/capped_truncate.js:// It is an error to run the captrunc command on a nonexistent collection.
jstests/noPassthroughWithMongod/capped_truncate.js:assert.commandFailed(db.runCommand({captrunc: "nonexistent", n: 1}),
jstests/noPassthroughWithMongod/capped_truncate.js:                     "captrunc didn't return an error for a nonexistent collection");
jstests/noPassthroughWithMongod/capped_truncate.js:// It is an error to run the captrunc command on a non-capped collection.
jstests/noPassthroughWithMongod/capped_truncate.js:assert.commandFailed(db.runCommand({captrunc: collName, n: 5}),
jstests/noPassthroughWithMongod/capped_truncate.js:                     "captrunc didn't return an error for a non-capped collection");
jstests/noPassthroughWithMongod/captrunc_cursor_invalidation.js:const coll = db.captrunc_cursor_invalidation;
jstests/noPassthroughWithMongod/captrunc_cursor_invalidation.js:assert.commandWorked(db.runCommand({captrunc: coll.getName(), n: 2}));

Tests using the command as a utility

The comments say:

 
We use the captrunc command as a catalog operation that requires a MODE_X lock on the
collection. This ensures we aren't having the dbHash command queue up behind it on a
database-level lock. The collection isn't capped so it'll fail with an
IllegalOperation error response.

So I assume this usage is replaceable by another operation that takes a MODE_X lock on the collection.

jstests/replsets/dbhash_lock_acquisition.js:        // We use the captrunc command as a catalog operation that requires a MODE_X lock on the
jstests/replsets/dbhash_lock_acquisition.js:        assert.commandFailedWithCode(db.runCommand({captrunc: "mycoll", n: 1}),
jstests/replsets/dbhash_lock_acquisition.js:    const ops = db.currentOp({"command.captrunc": "mycoll", waitingForLock: true}).inprog;

Comment by Connie Chen [ 26/Jul/22 ]

We're in favor of deleting the captrunc test command. Part of this ticket is to review the potential fallout if we move forward with deleting it.

Generated at Thu Feb 08 06:10:10 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.