Priority: Major - P3
Affects Version/s: None
Sprint: Storage Engines 2018-12-17, Storage Engines 2018-12-31, Storage Engines 2019-01-14, Storage Engines 2019-01-28, Storage Engines 2019-02-11, Storage Engines 2019-02-25, Storage Engines 2019-04-22
See the background section below for some elaboration on what MongoDB does.
Until the projects that coordinate background index builds between primaries and secondaries are complete, MongoDB can apply an update to a document with a timestamp that is potentially greater than the timestamp of the following update to the same document. This is an unexpected use of WT timestamps and warrants testing to ensure that WT's handling of this situation meets MongoDB's expectations.
Specifically, this test should have concurrent writers modifying documents mostly in timestamp order, with an occasional update that is guaranteed to choose a timestamp greater than that of the previous update, but possibly smaller than that of the following update.
Consider an expected update chain for a document:
And in our chaos monkey scenario:
The test should assert that all readers coming in after the delete can never see the update at 200. They should only be able to see:
- Reading >= 100 -> the deleted version
- 50 <= Reading < 100 -> the update at 50
- 25 <= Reading < 50 -> the insert at 25
- Reading < 25 -> nothing
Additionally, this test should freely advance the oldest and stable timestamps while the writers and readers run.
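The visibility ranges listed above can be written down as a small oracle that a test could compare reader results against. This is only a sketch in plain Python: the function name `expected_visible` and the string labels are hypothetical, and it does not call the WiredTiger API.

```python
def expected_visible(read_ts):
    """Return the version a reader at read_ts is expected to see.

    Hypothetical oracle built from the ranges in this ticket:
    insert at 25, update at 50, delete at 100. The out-of-order
    update at 200 must never be returned for any read timestamp.
    """
    if read_ts >= 100:
        return "deleted"
    if read_ts >= 50:
        return "update@50"
    if read_ts >= 25:
        return "insert@25"
    return None  # nothing is visible before the insert


# Readers at or past the out-of-order timestamp still see the delete.
for ts in (100, 150, 200, 5000):
    assert expected_visible(ts) == "deleted"
```

A real test would replace the string labels with the actual values written at each timestamp and compare them with what a reader at `read_ts` returns.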
Background
MongoDB at times has to fabricate a timestamp for an update when it has no knowledge of the timestamp of the previous update to that document. Because it cannot always choose a "correct" time, it can opt for a time that is always "too early" or one that is potentially "too late". Note that a "correct" time is at least as large as the previous update's timestamp and never larger than the next update's timestamp. Because these updates happen on secondaries, MongoDB does not have the privilege of controlling the timestamps of future updates to the document.
Consider the update chain when choosing a time too early:
Head -> Update @ 100 -> Update @ 0 (always early) -> Update @ 50 -> Insert @ 25
And the update chain for choosing a time too late:
Head -> Update @ 100 -> Update @ 5000 (too late) -> Update @ 50 -> Insert @ 25
Now consider performing a read at time 10. In the first (too early) case, the update with the fabricated time is (incorrectly) returned, whereas in the second (too late) case no document is returned. In other words, with the too-late approach the update with a fabricated time never becomes visible unless one of its neighbors is also visible, which is an acceptable outcome.
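The two chains above can be compared with a toy model of the read. The sketch below assumes a reader walks the chain from the head and returns the first update whose timestamp is less than or equal to the read timestamp. That assumption is a simplification inferred from the behavior described in this ticket, not WiredTiger's actual code, and the names `read_chain`, `too_early`, and `too_late` are hypothetical.

```python
def read_chain(chain, read_ts):
    """Return the label of the first update visible at read_ts.

    chain is ordered head-first (newest position first), as in the
    diagrams above. Assumed rule: the first entry with ts <= read_ts
    wins. Returns None if no entry qualifies.
    """
    for ts, label in chain:
        if ts <= read_ts:
            return label
    return None


# Head -> Update @ 100 -> Update @ 0 (always early) -> Update @ 50 -> Insert @ 25
too_early = [(100, "update@100"), (0, "fabricated@0"),
             (50, "update@50"), (25, "insert@25")]

# Head -> Update @ 100 -> Update @ 5000 (too late) -> Update @ 50 -> Insert @ 25
too_late = [(100, "update@100"), (5000, "fabricated@5000"),
            (50, "update@50"), (25, "insert@25")]

print(read_chain(too_early, 10))  # fabricated@0 (incorrect: nothing existed yet)
print(read_chain(too_late, 10))   # None (acceptable)
```

Under this model, a read of the too-late chain at 5000 or later stops at the update at 100, because it sits closer to the head, so the fabricated update is never returned on its own.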