[SERVER-39587] Include the final collection name in each oplog entry for commands using temporary collections
Created: 14/Feb/19  Updated: 06/Dec/22  Resolved: 12/May/20

Status: Closed
Project: Core Server
Component/s: Storage
Affects Version/s: None
Fix Version/s: None

Type: Improvement
Priority: Major - P3
Reporter: Kaitlin Mahar
Assignee: Backlog - Storage Execution Team
Resolution: Won't Do
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
is related to SERVER-30371 Separate renameCollection across DB c... Closed
is related to SERVER-48165 Add tests to ensure temporary collect... Closed
Assigned Teams:
Storage Execution
Participants:

 Description   

Cross-database renameCollection, mapReduce, and aggregate with $out all write to temporary collections that are renamed to the final name once all the data has been inserted.

For example, as of SERVER-30371, a collection rename across databases generates these oplog entries: 

  • a create on the final database for a temporary collection name
  • inserts into the temporary collection for all the documents
  • a renameCollection from the temp collection to the final name
  • a drop for the original collection name 

Looking at the create and insert operations, you cannot tell what the collection's final name will be.

The oplog entries look similar for mapReduce and for aggregation with $out.

It would help mongomirror if the oplog entries included the final namespace, because MGOMIRROR-37 introduces the ability to migrate/sync only a subset of databases and/or collections.

From the create and insert oplog entries on the temporary collection alone, we cannot tell whether the final namespace matches the user-provided filter, and therefore whether we should apply the ops.
 
For now, we can handle this by always applying entries for temporary collections on any database that is fully or partially included in the filter, and then dropping the temporary namespace if, when we reach the renameCollection op, the final name turns out not to be one we are mirroring. But that is a lot of extra work for mongomirror. Additionally, it requires us to hard-code in what format temporary collection names are in, which seems risky if there is any chance those could change in future server versions. If the create and insert entries had a field indicating the final namespace, that would:

a) tell us that the op is definitely on a temp collection (this would continue to work even if the temp name format changed), and

b) allow us to determine whether we care about the final namespace, and therefore whether to apply the op or skip it.
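The workaround and the proposed field can be contrasted in a minimal Python sketch. It assumes a heavily simplified entry model (a hypothetical Op class with kind, ns and to) rather than the real oplog document shape; final_ns is the hypothetical field this ticket proposes, and looks_temp is a hypothetical stand-in for the hard-coded temporary-name check, since the actual server format is not given here. Replaying the four-entry cross-database rename with a filter that includes database b but not b.dst shows the speculative create/insert/drop the workaround performs, which the proposed field avoids.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Set, Tuple


@dataclass
class Op:
    """Simplified stand-in for an oplog entry; not the real oplog document shape."""
    kind: str                       # "create", "insert", "rename", or "drop"
    ns: str                         # "db.collection" the op acts on
    to: Optional[str] = None        # rename target, only set when kind == "rename"
    final_ns: Optional[str] = None  # HYPOTHETICAL field proposed by this ticket


def db_of(ns: str) -> str:
    return ns.split(".", 1)[0]


class Mirror:
    def __init__(self, wanted: Set[str], is_temp: Callable[[str], bool]):
        self.wanted = wanted                            # namespaces to mirror
        self.wanted_dbs = {db_of(ns) for ns in wanted}  # fully/partially included DBs
        self.is_temp = is_temp                          # hard-coded temp-name guess
        self.applied: List[Op] = []                     # ops sent to the destination

    def handle_workaround(self, op: Op) -> None:
        """Current workaround: ignores final_ns, applies temp ops speculatively."""
        if op.kind == "rename" and self.is_temp(op.ns):
            if db_of(op.ns) not in self.wanted_dbs:
                return  # the temp collection was never applied
            if op.to in self.wanted:
                self.applied.append(op)
            else:
                self.applied.append(Op("drop", op.ns))  # discard speculative work
            return
        if op.ns in self.wanted or (self.is_temp(op.ns) and db_of(op.ns) in self.wanted_dbs):
            self.applied.append(op)

    def handle_with_final_ns(self, op: Op) -> None:
        """Proposed approach: decide from the final namespace, no name-format guess."""
        target = op.final_ns or (op.to if op.kind == "rename" else op.ns)
        if target in self.wanted:
            self.applied.append(op)


def replay(ops: List[Op], wanted: Set[str], is_temp: Callable[[str], bool],
           use_final_ns: bool) -> List[Tuple[str, str]]:
    m = Mirror(wanted, is_temp)
    handle = m.handle_with_final_ns if use_final_ns else m.handle_workaround
    for op in ops:
        handle(op)
    return [(o.kind, o.ns) for o in m.applied]


def looks_temp(ns: str) -> bool:
    # HYPOTHETICAL temp-name format; the real server format is not specified here.
    return ns.split(".", 1)[1].startswith("tmp")


# A cross-database rename a.src -> b.dst through a hypothetical temp "b.tmp1".
rename_ops = [
    Op("create", "b.tmp1", final_ns="b.dst"),
    Op("insert", "b.tmp1", final_ns="b.dst"),
    Op("rename", "b.tmp1", to="b.dst"),
    Op("drop", "a.src"),
]
print(replay(rename_ops, {"b.other"}, looks_temp, use_final_ns=False))
# [('create', 'b.tmp1'), ('insert', 'b.tmp1'), ('drop', 'b.tmp1')]
print(replay(rename_ops, {"b.other"}, looks_temp, use_final_ns=True))
# []
```

With a filter of {"b.dst"} both handlers apply the same create, insert and rename; the difference lies only in the entries the workaround applies and then throws away, and in its dependence on looks_temp matching the server's naming.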



 Comments   
Comment by Eric Milkie [ 12/May/20 ]

As Geert mentioned, we now replicate based on collection UUIDs, and the ns field is just informational. It sounds like we should switch MongoMirror to filter replication by UUID, since UUIDs do not change on rename. This would solve the $out temp collection rename issue described above.
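A minimal Python sketch of UUID-based filtering, under simplifying assumptions: entries are modeled by a hypothetical Entry class whose uuid is a plain string, and the user's namespace filter is resolved to UUIDs when sync starts (UuidFilter and name_filter are illustrative names, not mongomirror code). It shows the property relied on here: because a rename keeps the UUID, writes after Geert's example rename of sales.report to sales.report2020Q2 stay mirrored, while a filter that trusts the ns field drops them. It does not model how $out replaces its target collection.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set


@dataclass
class Entry:
    """Simplified oplog entry: the filter reads uuid, ns is informational."""
    kind: str                 # "create", "insert", "rename", or "drop"
    ns: str                   # collection name at the time of the op
    uuid: str                 # collection UUID, modeled here as a plain string
    to: Optional[str] = None  # rename target namespace


class UuidFilter:
    def __init__(self, uuids_by_ns: Dict[str, str], wanted_ns: Set[str]):
        self.wanted_ns = wanted_ns
        # Resolve the user's namespace filter to UUIDs once, when sync starts.
        self.mirrored: Set[str] = {uuids_by_ns[ns] for ns in wanted_ns if ns in uuids_by_ns}

    def should_apply(self, e: Entry) -> bool:
        if e.kind == "create" and e.ns in self.wanted_ns:
            # A wanted name created after sync started: track its new UUID too.
            self.mirrored.add(e.uuid)
        # Renames do not change the UUID, so membership follows the collection.
        return e.uuid in self.mirrored


def name_filter(wanted_ns: Set[str], e: Entry) -> bool:
    """Name-based filtering for comparison: trusts the informational ns field."""
    return e.ns in wanted_ns


# The user renames a mirrored collection, then keeps writing to it.
entries = [
    Entry("rename", "sales.report", "u1", to="sales.report2020Q2"),
    Entry("insert", "sales.report2020Q2", "u1"),
    Entry("insert", "sales.other", "u2"),
]
f = UuidFilter({"sales.report": "u1", "sales.other": "u2"}, {"sales.report"})
print([f.should_apply(e) for e in entries])                # [True, True, False]
print([name_filter({"sales.report"}, e) for e in entries])  # [True, False, False]
```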

Comment by Geert Bosch [ 12/May/20 ]

Currently, the only notable property of temporary collections is that they are deleted on restart or replica-set failover. As it is our clear objective to make such events unnoticeable, the entire concept of temporary collections will go away at some point.

Their main current use is to approximate atomically replacing a collection (using the aggregate command with $out, for example), but the isolation is fragile because it depends on other readers and writers cooperating by treating these collection names as special and hidden. Furthermore, there is nothing special about the rename done at the end of such an aggregate command. A user might well output to a collection sales.report and then rename that collection to sales.report2020Q2, so the proposed addition would not work in general. Or, to reverse the logic: any solution that can deal with a user renaming an existing collection should also be able to deal with renaming temporary collections.

Since MongoDB 3.6, replication has been based on collection UUIDs rather than collection names, because idempotency issues made it impossible to correctly sync nodes in the presence of concurrent renames. So, while each oplog entry still has an ns field with the collection name, that name is merely informational at this point and is kept for backward compatibility with existing tools that may read the oplog. Adding more special handling for temporary collections and collection names seems a step in the wrong direction.

Comment by David Golden [ 27/Apr/20 ]

FYI, kaitlin.mahar is not working on mongomirror anymore.  The current lead is ryan.chipman@mongodb.com.

I'd like to have this reconsidered.  In the original ticket description it says:

it requires us to hard-code in what format temporary collection names are in, which seems risky if there is any chance those could change in future server versions

We have tightly coupled behavior between the server and mongomirror based on a collection name format. If the decision is not to change the oplog per this ticket, does the server have tests to verify that the temporary collection name format remains unchanged? If not, it would be a good idea to add such tests.

Comment by Asya Kamsky [ 22/Mar/19 ]

Is this specific to cross-database rename/output? Aggregation $out does not support specifying a different database, at least through 4.0 (though there's a plan to support that in the future).

Comment by Eric Milkie [ 21/Feb/19 ]

Also, what will you do if a multi-document transaction contains operations that are partially filtered?

Comment by Eric Milkie [ 21/Feb/19 ]

How are you planning on handling a renameCollection within the same database, if one collection is in the filter and one is out?

Comment by Kaitlin Mahar [ 15/Feb/19 ]

schwerin: Ah, I didn't realize those both use temporary collections - thanks for pointing that out! The same issue applies in those cases as well. Updating title/description accordingly.

Comment by Andy Schwerin [ 15/Feb/19 ]

Is this different from an aggregation or mapreduce that outputs to a temporary collection and then renames it? Is rename special?

Generated at Thu Feb 08 04:52:29 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.