[SERVER-32206] Catalog change to declare an index as multikey must be timestamped. Created: 07/Dec/17 Updated: 30/Oct/23 Resolved: 02/Feb/18
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Storage |
| Affects Version/s: | None |
| Fix Version/s: | 3.7.2 |
| Type: | Task | Priority: | Major - P3 |
| Reporter: | Daniel Gottlieb (Inactive) | Assignee: | Judah Schvimer |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | rollback-functional |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Backwards Compatibility: | Fully Compatible |
| Sprint: | Repl 2018-01-15, Repl 2018-01-29, Repl 2018-02-12 |
| Description |
There are two cases to resolve. The first is easier. The second is a little trickier: when a document is inserted into or updated in a collection and requires multiple entries in an index (e.g., the value of an indexed field is an array), the index's "multikey" field must be set to true. This update is currently done in a side transaction to avoid write conflicts. Because it runs as a side transaction, it cannot inherit a timestamp from the insert/update request. Proposed solutions can be classified as:
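A toy model may help show why a side transaction loses the timestamp. This is not MongoDB code: `Store`, `WriteUnitOfWork`, `insert_doc`, the index name `a_1`, and the catalog key layout are all hypothetical, and treating untimestamped writes as visible at every read timestamp is a simplifying assumption of the model. The sketch contrasts writing the multikey flag in its own untimestamped unit of work with writing it inside the insert's unit of work, where it inherits the insert's commit timestamp.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple

UNTIMESTAMPED: Optional[int] = None  # marker for a write committed without a timestamp

@dataclass
class Store:
    """Toy MVCC store: each key keeps (commit_ts, value) versions in commit order."""
    versions: Dict[Any, List[Tuple[Optional[int], Any]]] = field(default_factory=dict)

    def commit(self, writes: Dict[Any, Any], ts: Optional[int]) -> None:
        for key, value in writes.items():
            self.versions.setdefault(key, []).append((ts, value))

    def read(self, key: Any, read_ts: int, default: Any = None) -> Any:
        # Toy assumption: untimestamped versions are visible at every read timestamp.
        seen = [v for ts, v in self.versions.get(key, [])
                if ts is UNTIMESTAMPED or ts <= read_ts]
        return seen[-1] if seen else default

class WriteUnitOfWork:
    """Hypothetical unit of work: all buffered writes commit at one timestamp."""
    def __init__(self, store: Store, ts: Optional[int]) -> None:
        self.store, self.ts, self.writes = store, ts, {}

    def write(self, key: Any, value: Any) -> None:
        self.writes[key] = value

    def commit(self) -> None:
        self.store.commit(self.writes, self.ts)

MULTIKEY = ("catalog", "a_1", "multikey")  # hypothetical catalog entry for index a_1

def insert_doc(store: Store, doc_id: Any, doc: dict, ts: int, side_txn: bool) -> None:
    wuow = WriteUnitOfWork(store, ts)
    wuow.write(("doc", doc_id), doc)
    if isinstance(doc.get("a"), list):  # array value in the indexed field "a"
        if side_txn:
            # Separate unit of work: commits on its own, without the insert's timestamp.
            side = WriteUnitOfWork(store, UNTIMESTAMPED)
            side.write(MULTIKEY, True)
            side.commit()
        else:
            # Same unit of work: the flag inherits the insert's commit timestamp.
            wuow.write(MULTIKEY, True)
    wuow.commit()

old = Store()
insert_doc(old, 1, {"a": [1, 2]}, ts=10, side_txn=True)
new = Store()
insert_doc(new, 1, {"a": [1, 2]}, ts=10, side_txn=False)
print(old.read(MULTIKEY, 5, default=False))   # True, although the document is not visible at 5
print(new.read(MULTIKEY, 5, default=False))   # False
print(new.read(MULTIKEY, 10, default=False))  # True
```

In the side-transaction variant, a read at timestamp 5 reports the index as multikey even though no array-valued document exists yet at that point; inheriting the timestamp keeps the flag and the document consistent at every read timestamp.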
| Comments |
| Comment by Githook User [ 02/Feb/18 ] |
Author: Judah Schvimer (username: judahschvimer, email: judah@mongodb.com)
Message:
| Comment by Judah Schvimer [ 11/Jan/18 ] |
Upon discussion with daniel.gottlieb, we will remove the side transaction. On a primary, we can simply assign this write the same timestamp as the index creation, insert, or update that caused the index to become multikey. This works because if two operations concurrently try to make the index multikey, they will conflict, and the loser will simply get a higher timestamp and go into the oplog second with a later optime. On a secondary, writes must get the timestamp of their oplog entry, and the multikey change must occur at the timestamp of the earliest write that makes the index multikey. Secondaries only serialize writes by document, not by collection. If two inserts/updates that both make the index multikey are applied out of order, setting multikey at the timestamp of whichever write happens to be applied first would mark the index multikey at the later of the two timestamps, which would be wrong. (Index creations are applied serially with CRUD ops, so multikey writes from index creation cannot conflict.) To prevent this we can do one of two things:
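The ordering problem on secondaries can be illustrated with a small sketch under stated assumptions: `FirstAppliedWins`, `EarliestWins`, and `apply_batch` are hypothetical names, a single index `a_1` is assumed, and threads are joined one at a time only to force a deterministic out-of-order application. It compares recording multikey at the timestamp of whichever write is applied first with keeping the earliest qualifying oplog timestamp, as the comment requires. It is one way to meet that requirement and is not meant to stand for either of the two options the comment alludes to.

```python
import threading
from typing import Dict, Iterable, Optional

class FirstAppliedWins:
    """Naive rule: the first applier to flip the flag chooses its timestamp."""
    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._ts: Dict[str, int] = {}

    def report(self, index: str, ts: int) -> None:
        with self._lock:
            self._ts.setdefault(index, ts)  # later reports are ignored

    def multikey_ts(self, index: str) -> Optional[int]:
        with self._lock:
            return self._ts.get(index)

class EarliestWins(FirstAppliedWins):
    """Keep the earliest oplog timestamp of any write that makes the index multikey."""
    def report(self, index: str, ts: int) -> None:
        with self._lock:
            current = self._ts.get(index)
            if current is None or ts < current:
                self._ts[index] = ts

def apply_batch(tracker: FirstAppliedWins, apply_order: Iterable[int]) -> Optional[int]:
    """Each write runs on its own thread; joining one at a time fixes the apply order."""
    for ts in apply_order:
        worker = threading.Thread(target=tracker.report, args=("a_1", ts))
        worker.start()
        worker.join()
    return tracker.multikey_ts("a_1")

# Writes at 1000 and 2000 both make a_1 multikey; 2000 happens to be applied first.
print(apply_batch(FirstAppliedWins(), [2000, 1000]))  # 2000 (too late)
print(apply_batch(EarliestWins(), [2000, 1000]))      # 1000
```

Taking the minimum under a lock makes the result independent of the order in which writer threads happen to apply the two operations.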
| Comment by Daniel Gottlieb (Inactive) [ 08/Dec/17 ] |
For completeness, one edge case is out of scope here, because recover to a timestamp is designed to solve this problem. I believe secondaries applying operations in parallel can result in the following sequence. Consider two operations, one at Timestamp 1000 and the other at Timestamp 2000, that both require setting multikey to true:
Then suppose the secondary recovers to timestamp 1500 and then rolls forward the oplog[1]. Assuming the multikey write is only causally related to timestamp 2000, the catalog may incorrectly believe multikey to be false despite Insert A still existing. Preserving this information is a goal of the recover-to-a-stable-timestamp project. Specifically, it will save all multikey true values (and their MultikeyPaths) and restore them to indexes that still exist at recovery time. To avoid losing this information in the face of a crash, that data can be made durable before recovering to the stable timestamp.

[1] On second thought, this wouldn't be a problem today. Storage can only recover to a "stable timestamp", and secondaries only submit oplog batch boundaries as candidates for becoming stable. I believe this prevents the scenario from incorrectly losing this information. However, a read at a timestamp inside the batch could return false for multikey while still observing write A. I'm not sure whether this poses a problem for point-in-time reads on secondaries, milkie.
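The footnote's argument can be sketched as follows, assuming each batch's boundary is the timestamp of its last operation and that candidate stable timestamps are kept in ascending order. `stable_candidates`, `recover_point`, and `visible` are hypothetical helpers, not storage-engine APIs, and the earlier batch at Timestamp 500 is invented for illustration. With Insert A at Timestamp 1000 and the multikey write recorded only at Timestamp 2000 in the same batch, recovery can only land on a boundary (500 or 2000 here), never on 1500, while a point-in-time read at 1500 still sees Insert A with multikey reading false.

```python
import bisect
from typing import Any, List, Optional, Sequence, Tuple

def stable_candidates(batches: Sequence[Sequence[int]]) -> List[int]:
    """Hypothetical: a secondary offers only each batch's last timestamp as stable."""
    return [max(batch) for batch in batches]

def recover_point(candidates: List[int], target: int) -> Optional[int]:
    """Greatest candidate <= target; candidates must be sorted ascending."""
    i = bisect.bisect_right(candidates, target)
    return candidates[i - 1] if i else None

def visible(versions: List[Tuple[int, Any]], read_ts: int, default: Any = None) -> Any:
    """Value of the newest version committed at or before read_ts (toy MVCC read)."""
    seen = [value for ts, value in sorted(versions, key=lambda v: v[0]) if ts <= read_ts]
    return seen[-1] if seen else default

# One batch holds Insert A (ts 1000) and the other multikey write (ts 2000).
batches = [[500], [1000, 2000]]
insert_a = [(1000, {"_id": "A", "a": [1, 2]})]
multikey = [(2000, True)]  # assumed: flag recorded only at the later timestamp

candidates = stable_candidates(batches)
print(candidates)                              # [500, 2000]
print(recover_point(candidates, 1500))         # 500: recovery never lands inside the batch
print(visible(insert_a, 1500))                 # Insert A is visible at 1500
print(visible(multikey, 1500, default=False))  # yet multikey reads False
```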
| Comment by James Wahlin [ 07/Dec/17 ] |
As discussed, we write multikey path information to the database as part of the MetaData, which is a timestamped write.
| Comment by Eric Milkie [ 07/Dec/17 ] |
I don't think MultikeyPaths is written to the database; it's only populated in memory.
| Comment by James Wahlin [ 07/Dec/17 ] |
Do we need to timestamp updates to MultikeyPaths as well, or are they updated in a different manner?