- Type: Bug
- Resolution: Unresolved
- Priority: Major - P3
- Affects Version/s: None
- Component/s: None
- Replication
- ALL
- Repl 2025-10-13, Repl 2025-10-27

In the past few months we've had various AFs (ex: AF-1462) because a customer is running killOp() commands that kill internal replication operations, which leads to a crash of the mongod process.
We recently tried to improve logging when this happens (see: SERVER-101858), but that change does not fix the issue: it catches the exception and then calls fassert().
When reproducing this bug in a test we get logs like this:
[js_test:killOp_against_repl_threads] d20041| {"t":{"$date":"2025-10-09T23:32:04.950+00:00"},"s":"I", "c":"COMMAND", "id":558700, "ctx":"conn1","msg":"Successful killOp","attr":{"remote":"127.0.0.1:60526","metadata":{"application":{"name":"MongoDB Shell"},"driver":{"name":"MongoDB Internal Client","version":"8.3.0-alpha0"},"os":{"type":"Linux","name":"Ubuntu","architecture":"aarch64","version":"22.04"}},"db":"admin","command":{"killOp":1,"op":38914,"lsid":{"id":{"$uuid":"ee5c83c6-52b8-4f0e-9ef4-5b212507a834"}},"$clusterTime":{"clusterTime":{"$timestamp":{"t":1760052722,"i":2}},"signature":{"hash":{"$binary":{"base64":"AAAAAAAAAAAAAAAAAAAAAAAAAAA=","subType":"0"}},"keyId":0}},"$readPreference":{"mode":"secondaryPreferred"},"$db":"admin"}}}
...
[js_test:killOp_against_repl_threads] d20041| {"t":{"$date":"2025-10-09T23:32:05.004+00:00"},"s":"I", "c":"REPL", "id":10185800,"ctx":"OplogWriter-0","msg":"OplogWriter threw a DBException","attr":{"what":"operation was interrupted","exception":"Interrupted: operation was interrupted"}}
[js_test:killOp_against_repl_threads] d20041| {"t":{"$date":"2025-10-09T23:32:05.004+00:00"},"s":"F", "c":"ASSERT", "id":23089, "ctx":"OplogWriter-0","msg":"Fatal assertion","attr":{"msgid":10185801,"location":"src/mongo/db/repl/oplog_writer.cpp:62:31:auto mongo::repl::OplogWriter::startup()::(anonymous class)::operator()(const executor::TaskExecutor::CallbackArgs &)"}}
[js_test:killOp_against_repl_threads] d20041| {"t":{"$date":"2025-10-09T23:32:05.004+00:00"},"s":"F", "c":"ASSERT", "id":23090, "ctx":"OplogWriter-0","msg":"\n\n***aborting after fassert() failure\n\n"}
[js_test:killOp_against_repl_threads] d20041| {"t":{"$date":"2025-10-09T23:32:05.004+00:00"},"s":"F", "c":"CONTROL", "id":6384300, "ctx":"OplogWriter-0","msg":"Writing fatal message","attr":{"message":"Got signal: 6 (Aborted).\n"}}
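For triage, the sequence in these logs (a "Successful killOp" entry followed by a "Fatal assertion" entry) could be detected with a small script. The following is only a rough sketch, not an existing tool: the function names are made up for illustration, it assumes one JSON log entry per line after the test-runner prefix, and pairing a fatal assertion with the most recent killOp is a heuristic rather than proof of cause. The two log ids are taken from the excerpt above.

```python
import json

KILLOP_LOG_ID = 558700        # "Successful killOp" entry in the excerpt above
FATAL_ASSERT_LOG_ID = 23089   # "Fatal assertion" entry in the excerpt above


def parse_entry(line):
    """Return the JSON log entry embedded in a test-runner line, or None."""
    start = line.find("{")
    if start == -1:
        return None
    try:
        return json.loads(line[start:])
    except json.JSONDecodeError:
        # Not a single well-formed JSON entry (for example the "..." separator).
        return None


def find_killop_crashes(lines):
    """Pair each fatal assertion with the most recent successful killOp before it."""
    last_killed_op = None
    crashes = []
    for line in lines:
        entry = parse_entry(line)
        if entry is None:
            continue
        attr = entry.get("attr", {})
        if entry.get("id") == KILLOP_LOG_ID:
            last_killed_op = attr.get("command", {}).get("op")
        elif entry.get("id") == FATAL_ASSERT_LOG_ID and last_killed_op is not None:
            crashes.append({
                "killed_op": last_killed_op,
                "thread": entry.get("ctx"),
                "location": attr.get("location"),
            })
    return crashes
```

Passing the log lines of a failed run to find_killop_crashes() returns one record per fatal assertion that follows a killOp, including the killed op id and the thread ("ctx") that aborted.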
Our current documentation states the user should not try to kill internal DB operations, but this is not foolproof.
I suggest we take one of the following options to prevent this crash:
- [ Preferred ] Prevent the killOp() command from killing internal Repl operations.
- Only allow users with internal privilege (on top of killop) to kill an internal operation.
The first option is preferred since I could not find any valid reason for a user or an operator to kill an internal Repl operation.
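To make the difference between the two options concrete, here is a minimal model of the permission check each one implies. This is a sketch under stated assumptions, not server code: the Operation and User types, the is_internal_repl flag, and the privilege strings "killop" and "internal" are hypothetical stand-ins chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Operation:
    op_id: int
    # Hypothetical marker for operations owned by internal replication threads.
    is_internal_repl: bool = False


@dataclass(frozen=True)
class User:
    name: str
    # Hypothetical privilege names; real authorization checks are more involved.
    privileges: frozenset = field(default_factory=frozenset)


def may_kill_option_1(user, op):
    """Option 1: internal replication operations can never be killed via killOp."""
    return "killop" in user.privileges and not op.is_internal_repl


def may_kill_option_2(user, op):
    """Option 2: killing an internal operation also requires the internal privilege."""
    if "killop" not in user.privileges:
        return False
    return not op.is_internal_repl or "internal" in user.privileges
```

Under option 1 no caller can interrupt the OplogWriter operation, whatever their privileges. Under option 2 a caller holding both privileges still can, so the crash path remains reachable for that caller.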
- is related to: SERVER-101858 OplogWriterImpl does not handle interruptions gracefully (Closed)