[SERVER-55397] Index build restart ident drops are not timestamped during startup recovery Created: 22/Mar/21 Updated: 29/Oct/23 Resolved: 05/May/21 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | 4.9.0 |
| Fix Version/s: | 5.0.0-rc0 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Louis Williams | Assignee: | Gregory Noma |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Sprint: | Execution Team 2021-04-19, Execution Team 2021-05-03, Execution Team 2021-05-17 |
| Participants: |
| Linked BF Score: | 137 |
| Description |
|
This assertion added in I think we need to either:
|
| Comments |
| Comment by Githook User [ 05/May/21 ] |
|
Author: Gregory Noma (username: gregorynoma, email: gregory.noma@gmail.com) Message: |
| Comment by Louis Williams [ 06/Apr/21 ] |
|
Update: It turns out that the reason the index is dropped and restarted at all is that the rollback fuzzer disables commitQuorum for index builds. As a result, the index build does not qualify as "resumable", requiring the ident to be dropped and recreated at startup. Since index builds are resumable from every phase in 4.4, both of these bug conditions only exist when the server is configured with enableIndexBuildCommitQuorum=false. As far as I know, this is not a parameter that we document. |
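The dependency described in this comment can be written down as a small decision function. This is only a sketch: the function name `startup_action` and its return strings are hypothetical, and it assumes that commit quorum being enabled is the only condition that matters for whether a build is resumable, which is a simplification of whatever the server actually checks.

```python
def startup_action(commit_quorum_enabled: bool) -> str:
    """Toy model of what happens to an unfinished index build at startup.

    Assumption (from the comment above): with enableIndexBuildCommitQuorum=false
    the build does not qualify as resumable, so its ident is dropped and recreated.
    """
    resumable = commit_quorum_enabled  # simplification: other conditions ignored
    if resumable:
        # The existing ident is kept, so no untimestamped drop occurs.
        return "resume"
    # This is the path on which the untimestamped ident drop can happen.
    return "drop_and_recreate_ident"
```

Under this model, only `startup_action(False)` reaches the path on which the bug conditions discussed in this ticket can occur.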
| Comment by Louis Williams [ 05/Apr/21 ] |
|
Considering this, I think there is probably a different bug, as follows:
If this were just a single startup (i.e. just step 1), this would not be possible, because we set the collection's minimumVisibleSnapshot to the stable timestamp when the catalog entry is not visible at the oldest timestamp. But in this scenario, the catalog entry would be visible at both the stable timestamp and the oldest timestamp. As a result, we don't set minimumVisibleSnapshot, allowing reads to observe a potentially incorrect historical state of the catalog entry. |
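The rule described in this comment can be modeled in a few lines. This is a minimal sketch, not MongoDB's implementation: the function name `minimum_visible_after_startup` and the plain-integer timestamps are hypothetical, chosen only to illustrate why the second case leaves minimumVisibleSnapshot unset.

```python
from typing import Optional


def minimum_visible_after_startup(visible_at_oldest: bool,
                                  stable_ts: int) -> Optional[int]:
    """Return the minimumVisibleSnapshot to set at startup, or None if unset.

    Toy model of the rule in the comment above; timestamps are plain integers.
    """
    if not visible_at_oldest:
        # The catalog entry is not visible at the oldest timestamp, so reads
        # earlier than the stable timestamp are blocked for this collection.
        return stable_ts
    # The entry is visible at both the oldest and the stable timestamp, so
    # nothing is set and reads between the two timestamps remain allowed.
    return None
```

In the scenario from the comment, the entry is visible at both timestamps, so the model returns `None`. That corresponds to reads being allowed to observe the catalog entry at a historical point that may no longer be correct.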
| Comment by Daniel Gottlieb (Inactive) [ 02/Apr/21 ] |
To clarify, I was claiming that performing a write to an _mdb_catalog entry (for a replicated collection) without a timestamp can corrupt the update chain. Prior to durable history, a case like this would be "okay" so long as we haven't performed any replication recovery (meaning the 0-timestamped write is only covering up the version as of the stable checkpoint MDB is starting up against). As you demonstrated, that single WUOW just changes one index ident value for another. With durable history, we might be okay (modulo WT bugs coping with the complexity of this MDB requirement), but that would be true only if we never read the _mdb_catalog at an arbitrarily early value (presumably between the oldest and stable timestamps). |
| Comment by Louis Williams [ 02/Apr/21 ] |
|
daniel.gottlieb, when we do the drop of the index entry, this happens inside of one WUOW, so there is never a state without an index entry. I think we could try to pass a larger timestamp to the timestamp monitor so that the old ident is not dropped until the next checkpoint. |
| Comment by Daniel Gottlieb (Inactive) [ 02/Apr/21 ] |
|
louis.williams, does "dropping the ident without a timestamp" mean removing it from the _mdb_catalog entry for that collection? For a replicated collection I would assume that to be a data corrupting write. Am I missing something? edit As per what gregory.wlodarek found in |