[SERVER-31471] Certain applyOps and convertToCapped commands can cause secondaries to fassert Created: 09/Oct/17  Updated: 29/Jan/18  Resolved: 20/Nov/17

Status: Closed
Project: Core Server
Component/s: Replication, Storage
Affects Version/s: 3.5.13
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Robert Guo (Inactive) Assignee: Xiangyu Yao (Inactive)
Resolution: Duplicate Votes: 0
Labels: rbfz
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File nsnotfound.js     Text File nsnotfound.log     Text File oplog.log    
Issue Links:
Duplicate
duplicates SERVER-30830 applyOps should return NamespaceNotFo... Closed
Related
Backwards Compatibility: Fully Compatible
Operating System: ALL
Steps To Reproduce:

See nsnotfound.js

Sprint: Storage 2017-12-04
Participants:

 Description   

A certain sequence of applyOps and convertToCapped operations issued through the mongo shell can cause secondaries to fassert because an insert command with an invalid UUID is replicated.

The repro is attached and can be run through the mongo shell: ./mongo --nodb nsnotfound.js Most of the repro script is boiler plate code, the commands themselves are on lines 98 to 122.

The content of the primary's oplog is also attached.

[js_test:nsnotfound] 2017-10-09T16:26:27.089-0400 d20014| 2017-10-09T16:26:27.089-0400 I STORAGE  [repl writer worker 13] createCollection: vest.four with UUID: c8b153d4-0bc0-4b15-9d66-89bccfb51312
[js_test:nsnotfound] 2017-10-09T16:26:27.125-0400 d20014| 2017-10-09T16:26:27.125-0400 I COMMAND  [repl writer worker 15] CMD: dropIndexes vest.two
[js_test:nsnotfound] 2017-10-09T16:26:27.138-0400 d20014| 2017-10-09T16:26:27.138-0400 I COMMAND  [repl writer worker 1] CMD: dropIndexes vest.two
[js_test:nsnotfound] 2017-10-09T16:26:27.158-0400 d20014| 2017-10-09T16:26:27.157-0400 I STORAGE  [repl writer worker 5] createCollection: vest.tmpCYoIr.convertToCapped.four with UUID: c02c8409-e28b-460f-84b5-2a47ccf5478a
[js_test:nsnotfound] 2017-10-09T16:26:27.192-0400 d20014| 2017-10-09T16:26:27.192-0400 I STORAGE  [repl writer worker 7] dropCollection: vest.four - renaming to drop-pending collection: vest.system.drop.1507580787i8t1.four with drop optime { ts: Timestamp 1507580787000|8, t: 1 }
[js_test:nsnotfound] 2017-10-09T16:26:27.216-0400 d20014| 2017-10-09T16:26:27.215-0400 I REPL     [replication-2] Completing collection drop for vest.system.drop.1507580787i8t1.four with drop optime { ts: Timestamp 1507580787000|8, t: 1 } (notification optime: { ts: Timestamp 1507580787000|8, t: 1 })
[js_test:nsnotfound] 2017-10-09T16:26:27.240-0400 d20014| 2017-10-09T16:26:27.239-0400 I STORAGE  [repl writer worker 13] createCollection: test.three with UUID: 7623929c-e8ad-47cd-849d-895f6f57175e
[js_test:nsnotfound] 2017-10-09T16:26:27.275-0400 d20014| 2017-10-09T16:26:27.275-0400 I STORAGE  [repl writer worker 15] createCollection: vest.one with UUID: d3268af0-49a2-4818-a839-eafc20fbeb7a
[js_test:nsnotfound] 2017-10-09T16:26:27.316-0400 d20014| 2017-10-09T16:26:27.310-0400 F REPL     [repl writer worker 3] writer worker caught exception: NamespaceNotFound: Failed to apply operation due to missing collection (c8b153d4-0bc0-4b15-9d66-89bccfb51312): { ts: Timestamp 1507580787000|14, t: 1, h: -9013766507876488705, v: 2, op: "i", ns: "vest.four", ui: BinData(4, C8B153D40BC04B159D6689BCCFB51312), wall: new Date(1507580787298), o: { _id: "htH0MldB", d: "You must be the change you wish to see in the world.", a: { c: "Asking a working writer what he thinks about critics is like asking a lamppost how it feels about dogs." } } } on: { ts: Timestamp 1507580787000|14, t: 1, h: -9013766507876488705, v: 2, op: "i", ns: "vest.four", ui: BinData(4, C8B153D40BC04B159D6689BCCFB51312), wall: new Date(1507580787298), o: { _id: "htH0MldB", d: "You must be the change you wish to see in the world.", a: { c: "Asking a working writer what he thinks about critics is like asking a lamppost how it feels about dogs." } } }
[js_test:nsnotfound] 2017-10-09T16:26:27.316-0400 d20014| 2017-10-09T16:26:27.310-0400 F -        [repl writer worker 3] Fatal assertion 16359 NamespaceNotFound: Failed to apply operation due to missing collection (c8b153d4-0bc0-4b15-9d66-89bccfb51312): { ts: Timestamp 1507580787000|14, t: 1, h: -9013766507876488705, v: 2, op: "i", ns: "vest.four", ui: BinData(4, C8B153D40BC04B159D6689BCCFB51312), wall: new Date(1507580787298), o: { _id: "htH0MldB", d: "You must be the change you wish to see in the world.", a: { c: "Asking a working writer what he thinks about critics is like asking a lamppost how it feels about dogs." } } } at src/mongo/db/repl/sync_tail.cpp 1132
[js_test:nsnotfound] 2017-10-09T16:26:27.317-0400 d20014| 2017-10-09T16:26:27.310-0400 F -        [repl writer worker 3]
[js_test:nsnotfound] 2017-10-09T16:26:27.317-0400 d20014|
[js_test:nsnotfound] 2017-10-09T16:26:27.317-0400 d20014| ***aborting after fassert() failure



 Comments   
Comment by Robert Guo (Inactive) [ 20/Nov/17 ]

Indeed it has gone away. I confirmed that it repros on 3.5.13, but not rc0.

Based on the error message, it looks jstestfuzz caught this problem as well which was fixed in SERVER-30830.

Closing this ticket and marking it as a dupe of SERVER-30830.

Comment by Xiangyu Yao (Inactive) [ 18/Nov/17 ]

robert.guo I failed to reproduce this on my local machine. Could it be that the issue is gone?

Comment by Ian Whalen (Inactive) [ 25/Oct/17 ]

Bumping from 3.6 Required to 3.7 Desired as per input from Dan P.

Comment by Robert Guo (Inactive) [ 10/Oct/17 ]

Attached the full logs of this repro. it has the same fassert but with a different UUID. It also prints out the ops passed into applyOps. They're towards the end of the log and can be found by searching for "Printing the list of applyOps commands"

Generated at Thu Feb 08 04:27:10 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.