- Type: Bug
- Resolution: Fixed
- Priority: Major - P3
- Affects Version/s: None
- Component/s: Storage
- Fully Compatible
- ALL
- Repl 2018-02-26
- 0
An index build on an existing collection runs primarily in two stages (corresponding to stages 2 and 3 in the logs). The first stage scans the collection, computes each document's index keys, and inserts them into the bulk builder, which sorts all the keys and records multikey information. The second stage scans the bulk builder's output, which is in sorted order, and inserts it into the storage engine's index builder. The build is split this way because storage engines may not be optimized for bulk loading randomly ordered data, but are optimized for bulk loads where elements are inserted in increasing order.
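The two stages can be sketched as a toy model in Python. This is not the server's implementation: `BulkBuilder`, `SortedIndexBuilder`, `extract_keys` and `build_index` are hypothetical names, the collection is a plain dict of record id to document, and treating any array value as multikey is a simplifying assumption.

```python
from typing import Any, Dict, List, Tuple

Entry = Tuple[Any, int]  # (index key, record id)


def extract_keys(doc: Dict[str, Any], field: str) -> Tuple[List[Any], bool]:
    """Return the index keys for one document and whether it is multikey.

    Simplification: any array value counts as multikey, one key per element.
    Keys must be mutually comparable (e.g. all ints) for the sort to work.
    """
    value = doc[field]
    if isinstance(value, list):
        return list(value), True
    return [value], False


class BulkBuilder:
    """Hypothetical stage-one sink: collects keys and remembers multikey info."""

    def __init__(self) -> None:
        self._entries: List[Entry] = []
        self.is_multikey = False

    def add(self, key: Any, record_id: int, multikey: bool) -> None:
        self._entries.append((key, record_id))
        self.is_multikey = self.is_multikey or multikey

    def sorted_entries(self) -> List[Entry]:
        return sorted(self._entries)


class SortedIndexBuilder:
    """Hypothetical storage-engine bulk loader that requires increasing order."""

    def __init__(self) -> None:
        self.entries: List[Entry] = []

    def add_key(self, key: Any, record_id: int) -> None:
        entry = (key, record_id)
        if self.entries and entry < self.entries[-1]:
            raise ValueError("keys must be inserted in increasing order")
        self.entries.append(entry)


def build_index(collection: Dict[int, Dict[str, Any]], field: str) -> Tuple[List[Entry], bool]:
    # Stage one: scan the collection and feed every key to the bulk builder.
    bulk = BulkBuilder()
    for record_id, doc in collection.items():
        keys, multikey = extract_keys(doc, field)
        for key in keys:
            bulk.add(key, record_id, multikey)

    # Stage two: drain the sorted output into the storage engine's builder.
    engine = SortedIndexBuilder()
    for key, record_id in bulk.sorted_entries():
        engine.add_key(key, record_id)
    return engine.entries, bulk.is_multikey
```

In this model the multikey flag is known as soon as stage one finishes, which is the point where the catalog write described next originally took place.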
When building an index on an existing collection, the multikey information is set between stages one and two. This write updates the catalog document for the collection but is not replicated, so it must be explicitly assigned a timestamp. However, reading the logical clock to choose that timestamp is error-prone when the system is processing many updates and the stable timestamp is moving quickly.
Happily, the multikey update does not need to happen at that point and can be delayed until the index build completes, which is a replicated write on a primary. Assigning the timestamp as part of a replicated operation does not carry the risk of assigning a stale value.
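The difference between the two approaches can be illustrated with a small Python model. Everything here is an assumption made for illustration: `ToyPrimary`, `replicated_write`, `unreplicated_write`, `advance_stable` and `commit_index_build` are hypothetical names, timestamps are plain integers, and the rule that a write at or behind the stable timestamp is rejected is a simplified stand-in for the real storage-engine constraints.

```python
from typing import Any, List, Tuple


class ToyPrimary:
    """Toy model of a primary with a logical clock and a stable timestamp."""

    def __init__(self) -> None:
        self.clock = 0
        self.stable_timestamp = 0
        self.catalog_writes: List[Tuple[int, Any]] = []

    def _apply(self, ts: int, description: Any) -> None:
        # Assumed rule: a write may not land at or behind the stable timestamp.
        if ts <= self.stable_timestamp:
            raise ValueError(f"timestamp {ts} is stale (stable is {self.stable_timestamp})")
        self.catalog_writes.append((ts, description))

    def replicated_write(self, description: Any) -> int:
        """The timestamp is reserved and used in one step, so it is never stale."""
        self.clock += 1
        self._apply(self.clock, description)
        return self.clock

    def unreplicated_write(self, ts: int, description: Any) -> None:
        """The caller picked `ts` earlier, for example by looking at the clock."""
        self._apply(ts, description)

    def advance_stable(self) -> None:
        self.stable_timestamp = self.clock


def commit_index_build(primary: ToyPrimary, index_name: str, is_multikey: bool) -> int:
    """Record the multikey flag together with the replicated index commit."""
    description = {"index": index_name, "ready": True, "multikey": is_multikey}
    return primary.replicated_write(description)
```

In this sketch, a timestamp guessed from the clock before a burst of updates can fall behind the stable timestamp by the time the multikey write runs, whereas `commit_index_build` always receives a fresh timestamp because the flag travels with the replicated commit.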
- is related to
  - SERVER-33503 Timestamp non-bulk multikey index commits (Closed)