[SERVER-31543] rollback uses non-system temporary namespaces for working collections when handling rename conflicts Created: 13/Oct/17  Updated: 30/Oct/23  Resolved: 25/Oct/17

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: 3.6.0-rc0
Fix Version/s: 3.6.0-rc2

Type: Bug Priority: Major - P3
Reporter: Robert Guo (Inactive) Assignee: Benety Goh
Resolution: Fixed Votes: 0
Labels: rbfz
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File rollback_test_1507824735_000.js    
Issue Links:
Related
is related to SERVER-31478 Certain unrecoverable errors don't th... Closed
is related to SERVER-31488 dropping a drop-pending collection vi... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: Repl 2017-11-13
Participants:

 Description   

When rolling back collection rename operations, the rollback algorithm may need to use temporary namespaces to resolve conflicts with existing collections. Currently, we use temporary <db>.system.tmpXXXXX namespaces for this purpose. This can cause issues for collections with validation enabled because the server internally disallows validation on system namespaces. The proposed fix is to use temporary namespaces in the <db>.rollback.tmpXXXXX namespace.

--------------

When rolling back a collMod on a collection coll1 that will later be renamed to coll2 can get stuck when coll2 is a temporary collection, which can happen when coll2 is in a drop-pending state.

Here's the log from the rollback process, which contains the 3 oplog entries that are rolled back.

d20003| 2017-10-12T21:12:47.280-0400 I ROLLBACK [rsBackgroundSync] Finding the Common Point
d20003| 2017-10-12T21:12:47.281-0400 I ROLLBACK [rsBackgroundSync] our last optime:   Oct 12 21:12:37:5
d20003| 2017-10-12T21:12:47.281-0400 I ROLLBACK [rsBackgroundSync] their last optime: Oct 12 21:12:41:2
d20003| 2017-10-12T21:12:47.281-0400 I ROLLBACK [rsBackgroundSync] diff in end of log times: -4 seconds
d20003| 2017-10-12T21:12:47.281-0400 D ROLLBACK [rsBackgroundSync] Updating rollback FixUpInfo for local oplog entry: { ts: Timestamp 1507857157000|5, t: 1, h: -7181224202081172679, v: 2, op: "c", ns: "db_0.$cmd", ui: BinData(4, AA1EAED7602F4A26A40D5C8CDDFF522C), o: { renameCollection: "db_0.two", to: "db_0.three", stayTemp: false, dropTarget: false }, wall: new Date(1507857157681) }
d20003| 2017-10-12T21:12:47.281-0400 D ROLLBACK [rsBackgroundSync] Updating rollback FixUpInfo for local oplog entry: { ts: Timestamp 1507857157000|4, t: 1, h: -444184082177170960, v: 2, op: "c", ns: "db_0.$cmd", ui: BinData(4, AA1EAED7602F4A26A40D5C8CDDFF522C), o: { collMod: "two", validator: { e: { $ne: 100.0 } }, validationLevel: "moderate", validationAction: "warn" }, o2: { collectionOptions_old: { uuid: BinData(4, AA1EAED7602F4A26A40D5C8CDDFF522C) } }, wall: new Date(1507857157680) }
d20003| 2017-10-12T21:12:47.281-0400 D ROLLBACK [rsBackgroundSync] Updating rollback FixUpInfo for local oplog entry: { ts: Timestamp 1507857157000|3, t: 1, h: 7609240345702257709, v: 2, op: "c", ns: "db_0.$cmd", ui: BinData(4, C739BDEA66BF4A84887A717DAF27A6E5), o: { drop: "three" }, wall: new Date(1507857157679) }
d20003| 2017-10-12T21:12:47.281-0400 I ROLLBACK [rsBackgroundSync] Rollback common point is { ts: Timestamp 1507857157000|2, t: 1 }
d20003| 2017-10-12T21:12:47.281-0400 I ROLLBACK [rsBackgroundSync] Starting refetching documents
d20003| 2017-10-12T21:12:47.281-0400 I ROLLBACK [rsBackgroundSync] Finished refetching documents. Total size of documents refetched: 0
d20003| 2017-10-12T21:12:47.281-0400 I ROLLBACK [rsBackgroundSync] Checking the RollbackID and updating the MinValid if necessary
d20003| 2017-10-12T21:12:47.281-0400 I ROLLBACK [rsBackgroundSync] Setting minvalid to { ts: Timestamp 1507857161000|2, t: 2 }
d20003| 2017-10-12T21:12:47.281-0400 I ROLLBACK [rsBackgroundSync] Dropping collections to roll back create operations
d20003| 2017-10-12T21:12:47.281-0400 I ROLLBACK [rsBackgroundSync] Rolling back renameCollection commands and collection drop commands.
d20003| 2017-10-12T21:12:47.281-0400 D ROLLBACK [rsBackgroundSync] Attempted to rename collection from db_0.system.drop.1507857157i3t1.three to db_0.three but db_0.three exists already. Temporarily renaming collection db_0.three out of the way to db_0.system.tmpUUNYo
d20003| 2017-10-12T21:12:47.310-0400 W ROLLBACK [rsBackgroundSync] Rollback cannot complete at this time (retrying later): InvalidOptions: Document validators not allowed on system collections. appliedThrough= { ts: Timestamp 1507857157000|5, t: 1 } minvalid= { ts: Timestamp 1507857161000|2, t: 2 }

This bug seems to also be the issue underlying SERVER-31478, where if you try to access a drop-pending collection, lookupNSSByUUID returns an empty string, causing the error shown in that ticket's description.

The repro has two lines commented out (lines 49 and 50). If those lines are added back in, we get the error message from SERVER-31478. Here's stack trace with oplog entries for reference.

d20003| 2017-10-12T21:02:49.455-0400 I REPL     [rsBackgroundSync] transition to ROLLBACK
d20003| 2017-10-12T21:02:49.455-0400 I ROLLBACK [rsBackgroundSync] Starting rollback. Sync source: robert-macbook-pro.local:20001
d20003| 2017-10-12T21:02:49.457-0400 I ROLLBACK [rsBackgroundSync] Finding the Common Point
d20003| 2017-10-12T21:02:49.457-0400 I ROLLBACK [rsBackgroundSync] our last optime:   Oct 12 21:02:29:7
d20003| 2017-10-12T21:02:49.457-0400 I ROLLBACK [rsBackgroundSync] their last optime: Oct 12 21:02:33:2
d20003| 2017-10-12T21:02:49.457-0400 I ROLLBACK [rsBackgroundSync] diff in end of log times: -4 seconds
d20003| 2017-10-12T21:02:49.457-0400 D ROLLBACK [rsBackgroundSync] Updating rollback FixUpInfo for local oplog entry: { ts: Timestamp 1507856549000|7, t: 1, h: 3861500656895800221, v: 2, op: "c", ns: "db_0.$cmd", ui: BinData(4, 57BC97DB6F264A5A891590A2D2E3EFCE), o: { renameCollection: "db_0.two", to: "db_0.three", stayTemp: false, dropTarget: false }, wall: new Date(1507856549782) }
d20003| 2017-10-12T21:02:49.458-0400 D ROLLBACK [rsBackgroundSync] Updating rollback FixUpInfo for local oplog entry: { ts: Timestamp 1507856549000|6, t: 1, h: 2189257796218325053, v: 2, op: "c", ns: "db_0.$cmd", ui: BinData(4, CDD4727C308942C392482D85F70AD907), o: { drop: "three" }, wall: new Date(1507856549781) }
d20003| 2017-10-12T21:02:49.458-0400 D ROLLBACK [rsBackgroundSync] Updating rollback FixUpInfo for local oplog entry: { ts: Timestamp 1507856549000|5, t: 1, h: 6563969530498729032, v: 2, op: "c", ns: "db_0.$cmd", ui: BinData(4, CDD4727C308942C392482D85F70AD907), o: { create: "three", flags: 3, validationLevel: "strict", validationAction: "warn", idIndex: { v: 2, key: { _id: 1 }, name: "_id_", ns: "db_0.three" } }, wall: new Date(1507856549781) }
d20003| 2017-10-12T21:02:49.458-0400 D ROLLBACK [rsBackgroundSync] Updating rollback FixUpInfo for local oplog entry: { ts: Timestamp 1507856549000|4, t: 1, h: -2445838164785118562, v: 2, op: "c", ns: "db_0.$cmd", ui: BinData(4, 57BC97DB6F264A5A891590A2D2E3EFCE), o: { collMod: "two", validator: { e: { $ne: 100.0 } }, validationLevel: "moderate", validationAction: "warn" }, o2: { collectionOptions_old: { uuid: BinData(4, 57BC97DB6F264A5A891590A2D2E3EFCE) } }, wall: new Date(1507856549780) }
d20003| 2017-10-12T21:02:49.458-0400 D ROLLBACK [rsBackgroundSync] Updating rollback FixUpInfo for local oplog entry: { ts: Timestamp 1507856549000|3, t: 1, h: 2320841867216137252, v: 2, op: "c", ns: "db_0.$cmd", ui: BinData(4, 9E0EC528AE3F414DAD8D4294A253D63A), o: { drop: "three" }, wall: new Date(1507856549779) }
d20003| 2017-10-12T21:02:49.458-0400 I ROLLBACK [rsBackgroundSync] Rollback common point is { ts: Timestamp 1507856549000|2, t: 1 }
d20003| 2017-10-12T21:02:49.458-0400 I ROLLBACK [rsBackgroundSync] Starting refetching documents
d20003| 2017-10-12T21:02:49.458-0400 I ROLLBACK [rsBackgroundSync] Finished refetching documents. Total size of documents refetched: 0
d20003| 2017-10-12T21:02:49.458-0400 I ROLLBACK [rsBackgroundSync] Checking the RollbackID and updating the MinValid if necessary
d20003| 2017-10-12T21:02:49.458-0400 I ROLLBACK [rsBackgroundSync] Setting minvalid to { ts: Timestamp 1507856553000|2, t: 2 }
d20003| 2017-10-12T21:02:49.458-0400 I ROLLBACK [rsBackgroundSync] Dropping collections to roll back create operations
d20003| 2017-10-12T21:02:49.458-0400 E -        [rsBackgroundSync] Assertion: 28539:need a valid database name src/mongo/db/concurrency/d_concurrency.cpp 198



 Comments   
Comment by William Schultz (Inactive) [ 15/Nov/17 ]

robert.guo Are there things that need to be re-enabled in the rollback fuzzer since this ticket has been resolved?

Comment by Githook User [ 25/Oct/17 ]

Author:

{'email': 'benety@mongodb.com', 'name': 'Benety Goh', 'username': 'benety'}

Message: SERVER-31543 rollback uses non-system temporary namespaces for working collections when handling rename conflicts
Branch: master
https://github.com/mongodb/mongo/commit/a6c3184ab7638c734b5736b5ccea4e47d5ae80a5

Comment by Githook User [ 25/Oct/17 ]

Author:

{'email': 'benety@mongodb.com', 'name': 'Benety Goh', 'username': 'benety'}

Message: SERVER-31543 extract test function _testRollbackRenamingCollectionsToEachOther
Branch: master
https://github.com/mongodb/mongo/commit/bc63217d7a78f02ee3fe9210fda0ba3e829a0787

Comment by Githook User [ 25/Oct/17 ]

Author:

{'email': 'benety@mongodb.com', 'name': 'Benety Goh', 'username': 'benety'}

Message: SERVER-31543 add collection name and UUID to 'Document validators not allowed' error messages
Branch: master
https://github.com/mongodb/mongo/commit/5b9b9a9f04b06109b77b5522f7318c366deecf6f

Generated at Thu Feb 08 04:27:24 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.