[SERVER-50697] Do not enforce fast count on 'config.system.indexBuilds' Created: 02/Sep/20 Updated: 29/Oct/23 Resolved: 11/Nov/20 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Index Maintenance, Storage |
| Affects Version/s: | None |
| Fix Version/s: | 4.9.0 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Gregory Wlodarek | Assignee: | Gregory Wlodarek |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||
| Backwards Compatibility: | Fully Compatible | ||||
| Operating System: | ALL | ||||
| Sprint: | Execution Team 2020-11-16 | ||||
| Participants: | |||||
| Linked BF Score: | 70 | ||||
| Description |
|
This is related to a build failure on a two-node replica set where we build an index, restart each node individually in standalone mode, and then encounter an incorrect fast count.
More precisely, this is the timeline of events leading up to the incorrect fast count:
1. The primary node starts an index build {x: 1} with UUID f50b6510-8a48-4143-b8f4-fc318d8cbd2a.
2. The secondary starts building the index too, and eventually the commit quorum is satisfied.
3. Both nodes finish the index build; the commit timestamp on both the primary and the secondary is Timestamp(1596309184, 9).
4. Both nodes are shut down, as they will be started up in standalone mode.
5. The primary node starts up in standalone mode with recoveryTimestamp Timestamp(1596309184, 8) and sees a single unfinished index build in the 'config.system.indexBuilds' collection: "Found index from unfinished build".
However, during shutdown, the collections are validated and the 'config.system.indexBuilds' fast count is found to be incorrect, with the contents of the collection being:
From what I can tell, the fsync performed when shutting down the nodes in step 4 persisted the fast count data to disk "too early": the persisted count already reflects the removal of the index build entry from the 'config.system.indexBuilds' collection in step 3, a write that committed after the recoveryTimestamp. Restarting replica set members in standalone mode makes the database contents prone to appearing inconsistent with the writes that were visible while the node was running as part of a replica set, so perhaps we can just turn off fast count validation for the 'config.system.indexBuilds' collection in 'absent_ns_field_in_index_specs.js'. |
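The suspected mechanism can be illustrated with a toy model. This is only a sketch under stated assumptions, not the server's storage code: `ToyCollection`, `docs_as_of`, and `simulate` are hypothetical names, and the start timestamp `(1596309184, 5)` is invented for illustration. It assumes document writes are rolled back to the recovery timestamp while the cached fast count is persisted without a timestamp.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCollection:
    """Toy model: timestamped document writes plus an untimestamped fast count."""

    # Each entry is (timestamp, op, doc_id); op is "insert" or "delete".
    history: list = field(default_factory=list)
    # Cached count, updated on every write and persisted without a timestamp.
    fast_count: int = 0

    def insert(self, ts, doc_id):
        self.history.append((ts, "insert", doc_id))
        self.fast_count += 1

    def delete(self, ts, doc_id):
        self.history.append((ts, "delete", doc_id))
        self.fast_count -= 1

    def docs_as_of(self, recovery_ts):
        """Return the documents visible when only writes <= recovery_ts survive."""
        docs = set()
        for ts, op, doc_id in self.history:
            if ts > recovery_ts:
                continue  # write is newer than the recovery point, so it is dropped
            if op == "insert":
                docs.add(doc_id)
            else:
                docs.discard(doc_id)
        return docs


def simulate():
    build_uuid = "f50b6510-8a48-4143-b8f4-fc318d8cbd2a"
    coll = ToyCollection()
    # Index build starts (timestamp assumed): an entry is written.
    coll.insert((1596309184, 5), build_uuid)
    # Index build commits at Timestamp(1596309184, 9): the entry is removed.
    coll.delete((1596309184, 9), build_uuid)
    # Standalone startup recovers to Timestamp(1596309184, 8), before the commit.
    visible = coll.docs_as_of((1596309184, 8))
    return len(visible), coll.fast_count
```

In this model `simulate()` returns one visible document alongside a fast count of zero, which is the kind of mismatch that validation reports.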
| Comments |
| Comment by Githook User [ 11/Nov/20 ] |
|
Author: {'name': 'Gregory Wlodarek', 'email': 'gregory.wlodarek@mongodb.com', 'username': 'GWlodarek'} Message: |
| Comment by Gregory Wlodarek [ 11/Nov/20 ] |
|
After discussing with benety.goh, we've agreed to exclude the 'config.system.indexBuilds' collection from validation's fast count enforcement. This collection is internal to index builds and should not be queried for any other purpose. |
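The agreed exclusion could look roughly like the following. This is a hedged sketch, not the actual server change: `FAST_COUNT_EXEMPT` and `validate_fast_count` are hypothetical names, and the real validation code may structure the check differently.

```python
# Hypothetical exemption list; the real server's mechanism may differ.
FAST_COUNT_EXEMPT = {"config.system.indexBuilds"}


def validate_fast_count(namespace, fast_count, actual_count):
    """Return a list of validation errors for one collection's fast count."""
    if fast_count == actual_count:
        return []
    if namespace in FAST_COUNT_EXEMPT:
        # A mismatch is tolerated for internal collections on the exempt list.
        return []
    return [f"{namespace}: fast count {fast_count} != actual count {actual_count}"]
```

With this shape, a mismatch on 'config.system.indexBuilds' yields no error, while the same mismatch on any other namespace is still reported.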