[SERVER-67924] 4.2 secondary crashes on fassert when replicating a deprecated collMod command option originating from a 4.0 primary
Created: 08/Jul/22  Updated: 12/Jul/22  Resolved: 12/Jul/22

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 4.2.21
Fix Version/s: None

Type: Bug
Priority: Major - P3
Reporter: Edwin Zhou
Assignee: Edwin Zhou
Resolution: Won't Fix
Votes: 1
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
Operating System: ALL
Steps To Reproduce:

This issue can be reproduced by running

db.getSiblingDB("test").getCollection("test_col").insert({a:1})
db.getSiblingDB("test").runCommand({ collMod: "test_col", usePowerOf2Sizes: true });

on a 4.0 primary with one 4.2 secondary.

Participants:

 Description   

During an upgrade from 4.0 to 4.2, applications that run the following command, which a 4.0 primary still accepts,

db.getSiblingDB("test").runCommand({ collMod: "test_col", usePowerOf2Sizes: true });

will crash any 4.2 secondaries that attempt to replicate this command.

2022-07-08T12:34:52.641-0400 E  REPL     [repl-writer-worker-0] Failed command { collMod: "test_col", usePowerOf2Sizes: true } on test with status InvalidOptions: unknown option to collMod: usePowerOf2Sizes during oplog application
2022-07-08T12:34:52.660-0400 F  REPL     [repl-writer-worker-0] Error applying operation ({ op: "c", ns: "test.$cmd", ui: UUID("9cdb54d2-a0a7-4f68-9d94-ac0e6622b191"), o: { collMod: "test_col", usePowerOf2Sizes: true }, o2: { collectionOptions_old: { uuid: UUID("9cdb54d2-a0a7-4f68-9d94-ac0e6622b191"), flags: 1 } }, ts: Timestamp(1657298092, 1), t: 1, h: -8911072701486057981, v: 2, wall: new Date(1657298092618) }):  :: caused by :: InvalidOptions: unknown option to collMod: usePowerOf2Sizes
2022-07-08T12:34:52.660-0400 F  REPL     [rsSync-0] Failed to apply batch of operations. Number of operations in batch: 1. First operation: { op: "c", ns: "test.$cmd", ui: UUID("9cdb54d2-a0a7-4f68-9d94-ac0e6622b191"), o: { collMod: "test_col", usePowerOf2Sizes: true }, o2: { collectionOptions_old: { uuid: UUID("9cdb54d2-a0a7-4f68-9d94-ac0e6622b191"), flags: 1 } }, ts: Timestamp(1657298092, 1), t: 1, h: -8911072701486057981, v: 2, wall: new Date(1657298092618) }. Last operation: { op: "c", ns: "test.$cmd", ui: UUID("9cdb54d2-a0a7-4f68-9d94-ac0e6622b191"), o: { collMod: "test_col", usePowerOf2Sizes: true }, o2: { collectionOptions_old: { uuid: UUID("9cdb54d2-a0a7-4f68-9d94-ac0e6622b191"), flags: 1 } }, ts: Timestamp(1657298092, 1), t: 1, h: -8911072701486057981, v: 2, wall: new Date(1657298092618) }. Oplog application failed in writer thread 4: InvalidOptions: unknown option to collMod: usePowerOf2Sizes
2022-07-08T12:34:52.660-0400 F  -        [rsSync-0] Fatal assertion 34437 InvalidOptions: unknown option to collMod: usePowerOf2Sizes at src/mongo/db/repl/sync_tail.cpp 851
2022-07-08T12:34:52.660-0400 F  -        [rsSync-0] \n\n***aborting after fassert() failure\n\n

This makes for a poor upgrade experience: 4.2 secondary nodes can crash suddenly while the upgrade procedure is being followed.

MongoDB should handle this failed batch replication more gracefully, for example by ignoring the option, by logging that this collMod option is no longer supported, or both.



 Comments   
Comment by Edwin Zhou [ 12/Jul/22 ]

Because 4.0 is already EOL, and any fix would be non-trivial because it would require significant changes to the way we apply collMod operations on 4.2, we are closing this ticket as "Won't Fix".

Users can recover from this by removing any invocations of collMod on 4.0 nodes that use MMAPv1 options deprecated since 3.0 (such as usePowerOf2Sizes), waiting a bit, and then retrying the upgrade.
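
One way to check for such invocations before retrying is to look for collMod entries in the oplog that still carry a deprecated option. The following is only a rough sketch, not an official tool: the helper name deprecatedCollModOptions and the option list are assumptions made for illustration. Only usePowerOf2Sizes is confirmed by this ticket; noPadding is included on the assumption that it is another MMAPv1-only collMod option.

```javascript
// Assumed list of deprecated MMAPv1 collMod options. Only usePowerOf2Sizes is
// confirmed by this ticket; noPadding is an assumption.
const DEPRECATED_COLLMOD_OPTIONS = ["usePowerOf2Sizes", "noPadding"];

// Hypothetical helper: returns the deprecated options found in one oplog entry,
// or an empty array when the entry is not a collMod command.
function deprecatedCollModOptions(entry) {
  // Command entries have op "c" and carry the command document in the o field.
  const cmd = entry && entry.op === "c" ? entry.o : null;
  if (!cmd || typeof cmd.collMod !== "string") {
    return [];
  }
  return DEPRECATED_COLLMOD_OPTIONS.filter(
    (name) => Object.prototype.hasOwnProperty.call(cmd, name)
  );
}

// Possible use in the mongo shell on the 4.0 primary (left as a comment so the
// helper above stays runnable outside the shell):
//
// db.getSiblingDB("local").oplog.rs
//   .find({ op: "c", "o.collMod": { $exists: true } })
//   .forEach(function (entry) {
//     var found = deprecatedCollModOptions(entry);
//     if (found.length > 0) {
//       print(tojson(entry.ts) + " " + entry.o.collMod + ": " + found.join(", "));
//     }
//   });
```

If the scan prints nothing, no collMod entry with one of the listed options is currently in the oplog. It cannot tell whether an application will issue such a command again, so the application code still needs to be checked.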

We will be updating our documentation to warn users about deprecated collMod options. That work is tracked in DOCS-15477.
