[SERVER-39587] Include the final collection name in each oplog entry for commands using temporary collections
| Created: 14/Feb/19 | Updated: 06/Dec/22 | Resolved: 12/May/20 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Storage |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Kaitlin Mahar | Assignee: | Backlog - Storage Execution Team |
| Resolution: | Won't Do | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
| Assigned Teams: | Storage Execution |
| Participants: |
| Description |
|
Cross-DB renameCollection, mapReduce, and aggregate with $out all write to temporary collections, which are renamed once all the data has been inserted. For example, as of
Looking at the create and insert operations, you cannot tell what the collection's final name will be. The oplog entries look similar for mapReduce and aggregation with $out.
It would be helpful to mongomirror if the oplog entries included the final namespace. The reason is that MGOMIRROR-37 introduces the ability to migrate/sync only a subset of DBs and/or collections. Looking at the create and insert oplog entries on the temporary collection, we cannot tell whether the final namespace matches the user-provided filter, and therefore whether we should apply the ops.
Including the final namespace would:
a) tell us that the op is definitely on a temp collection (this would continue to work even if the temp name format changed), and
b) allow us to determine whether we care about the final namespace and should apply the op or skip it |
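To illustrate the problem described above, here is a minimal Python sketch of a namespace-based filter. The entry shapes are simplified, the temporary collection name `tmp_hypothetical` and the helper `filter_by_ns` are made up for illustration, and none of this reflects mongomirror's actual code.

```python
def filter_by_ns(entries, allowed_namespaces):
    """Keep only entries whose ns field is in the user-provided allow-list."""
    return [e for e in entries if e["ns"] in allowed_namespaces]


# Simplified, hypothetical oplog entries: an insert into a temporary
# collection followed by the rename to the final name.
entries = [
    {"op": "i", "ns": "target.tmp_hypothetical", "o": {"_id": 1}},
    {
        "op": "c",
        "ns": "target.$cmd",
        "o": {"renameCollection": "target.tmp_hypothetical", "to": "target.report"},
    },
]

# The user asked to sync only target.report.
kept = filter_by_ns(entries, {"target.report"})
print(kept)  # prints []
```

Nothing is kept: the insert carries only the temporary name, so a filter on the final namespace cannot match it, even though the data ends up in `target.report`.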
| Comments |
| Comment by Eric Milkie [ 12/May/20 ] |
|
As Geert mentioned, we are replicating based on collection UUIDs now and the ns field is just informational. It sounds like we should switch MongoMirror to do replication filtering based on UUID, since those do not change on rename. This would solve the $out temp collection rename issue described above. |
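A rough Python sketch of what UUID-based filtering could look like, since the UUID stays the same across a rename. It assumes each entry carries the collection UUID in a `ui` field and that rename entries carry the UUID of the renamed collection. The entry shapes, the collection name `tmp_hypothetical`, and both helper functions are hypothetical, and the sketch scans a complete list of entries rather than a live stream.

```python
import uuid


def final_names_by_uuid(entries):
    """Map each collection UUID to the last name it is known by."""
    names = {}
    for e in entries:
        ui = e.get("ui")
        if ui is None:
            continue
        if e["op"] == "c" and "renameCollection" in e["o"]:
            # A rename changes the name but not the UUID.
            names[ui] = e["o"]["to"]
        elif e["op"] == "c" and "create" in e["o"]:
            db = e["ns"].split(".", 1)[0]
            names.setdefault(ui, f"{db}.{e['o']['create']}")
        else:
            names.setdefault(ui, e["ns"])
    return names


def filter_by_uuid(entries, allowed_namespaces):
    """Keep entries whose collection UUID ends up under an allowed name."""
    names = final_names_by_uuid(entries)
    return [e for e in entries if names.get(e.get("ui")) in allowed_namespaces]


tmp_uuid = uuid.UUID(int=1)
other_uuid = uuid.UUID(int=2)
entries = [
    {"op": "c", "ns": "target.$cmd", "ui": tmp_uuid, "o": {"create": "tmp_hypothetical"}},
    {"op": "i", "ns": "target.tmp_hypothetical", "ui": tmp_uuid, "o": {"_id": 1}},
    {"op": "c", "ns": "target.$cmd", "ui": tmp_uuid,
     "o": {"renameCollection": "target.tmp_hypothetical", "to": "target.report"}},
    {"op": "i", "ns": "target.other", "ui": other_uuid, "o": {"_id": 2}},
]

kept = filter_by_uuid(entries, {"target.report"})
```

Here the create, insert, and rename on the temporary collection are all kept, because they share one UUID whose final name matches the filter. A tool tailing the oplog live does not see the rename in advance, so it would need some way to defer or revisit the decision; this sketch does not address that.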
| Comment by Geert Bosch [ 12/May/20 ] |
|
Currently, the only notable property of temporary collections is that they are deleted on restart or replica-set failover. As it is our clear objective to make such events unnoticeable, the entire concept of temporary collections will go away at some point. Their main current use is as an approximation of atomically replacing a collection (using the aggregate command with $out, for example), but the isolation is fragile, as it depends on other readers and writers cooperating by treating these collection names as special and hidden.
Furthermore, there is nothing special about the rename done at the end of such an aggregate command. A user might well output to a collection sales.report and then rename that collection to sales.report2020Q2, so the proposed addition would not work in general. Or to reverse the logic: any solution that can deal with a user renaming an existing collection should also be able to deal with renaming temporary collections.
Since MongoDB 3.6, replication has been based on collection UUIDs rather than collection names, as it proved impossible to correctly sync nodes in the presence of concurrent renames due to idempotency issues. So, while each oplog entry currently still has an ns field with the collection name, that field is merely informational at this point and is kept for backward compatibility with existing tools and possibly for reading the oplog. Adding more special handling for temporary collections and collection names seems like a step in the wrong direction. |
| Comment by David Golden [ 27/Apr/20 ] |
|
FYI, kaitlin.mahar is not working on mongomirror anymore. The current lead is ryan.chipman@mongodb.com. I'd like to have this reconsidered. In the original ticket description it says:
We have tightly coupled behavior between the server and mongomirror based on a collection name format. If the decision is not to change the oplog per this ticket, does the server have tests to verify that the temporary collection name format remains unchanged? If not, it would be a good idea to add such tests. |
| Comment by Asya Kamsky [ 22/Mar/19 ] |
|
Is this specific to cross database rename/output? Aggregation $out does not support specifying a different database at least up through 4.0 (though there's a plan to support that in the future). |
| Comment by Eric Milkie [ 21/Feb/19 ] |
|
Also, what will you do if a multi-document transaction contains operations that are partially filtered? |
| Comment by Eric Milkie [ 21/Feb/19 ] |
|
How are you planning on handling a renameCollection within the same database, if one collection is in the filter and one is out? |
| Comment by Kaitlin Mahar [ 15/Feb/19 ] |
|
schwerin: Ah, I didn't realize those both use temporary collections - thanks for pointing that out! The same issue applies in those cases as well. Updating title/description accordingly. |
| Comment by Andy Schwerin [ 15/Feb/19 ] |
|
Is this different from an aggregation or mapreduce that outputs to a temporary collection and then renames it? Is rename special? |