[SERVER-41965] Change repair to only rebuild indexes on repaired collections Created: 27/Jun/19  Updated: 29/Oct/23  Resolved: 30/Jan/20

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 4.3.4

Type: Improvement
Priority: Major - P3
Reporter: Louis Williams
Assignee: Daniel Ernst
Resolution: Fixed
Votes: 0
Labels: execution_intern_2019
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Documented
is documented by DOCS-13378 Investigate changes in SERVER-41965: ... Closed
Related
Backwards Compatibility: Fully Compatible
Sprint: Execution Team 2019-12-16, Execution Team 2019-12-30, Execution Team 2020-01-13, Execution Team 2020-01-27, Execution Team 2020-02-10
Participants:
Case:

 Description   

At present, --repair unconditionally rebuilds all indexes. There should be a parameter that disables this behavior and rebuilds indexes only on collections that were salvaged (and so may have modified data) or whose indexes fail validation.

Since index builds are very slow, this would be helpful for very large installations that only need to repair a single collection out of many.
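
As an illustration only, the selection rule requested above could be sketched as follows. This is not the server's implementation: the CollectionState type, its fields, and the rebuild_all flag are hypothetical names made up for this sketch, and it assumes the salvage and validation results are already known for each collection.

```python
from dataclasses import dataclass, field


@dataclass
class CollectionState:
    """Hypothetical summary of one collection after the salvage phase."""
    name: str
    salvaged: bool = False  # True if salvage ran and data may have changed
    # Maps index name -> True if that index passed validation.
    index_valid: dict = field(default_factory=dict)


def collections_needing_rebuild(collections, rebuild_all=False):
    """Return the names of collections whose indexes should be rebuilt.

    rebuild_all=True models the existing behavior (rebuild everything).
    Otherwise a collection is selected only if it was salvaged or if at
    least one of its indexes failed validation.
    """
    selected = []
    for coll in collections:
        any_invalid = not all(coll.index_valid.values())
        if rebuild_all or coll.salvaged or any_invalid:
            selected.append(coll.name)
    return selected


collections = [
    CollectionState("orders", salvaged=True, index_valid={"_id_": True}),
    CollectionState("users", index_valid={"_id_": True, "email_1": False}),
    CollectionState("logs", index_valid={"_id_": True}),
]
print(collections_needing_rebuild(collections))  # ['orders', 'users']
```

In this sketch the untouched "logs" collection is skipped, which is the saving the ticket is after for installations with many large collections.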



 Comments   
Comment by Githook User [ 30/Jan/20 ]

Author: Daniel Ernst <daniel.ernst@mongodb.com>

Message: SERVER-41965 Change repair to only rebuild indexes on necessary collections
Branch: master
https://github.com/mongodb/mongo/commit/a6d3529b264b8b2331faea6a0e645fcf9def8f7f

Comment by Eric Milkie [ 28/Oct/19 ]

Need to confirm how repair works today and whether we can make this more automatic, while still avoiding rebuilding indexes unnecessarily.

Comment by Dianna Hohensee (Inactive) [ 13/Aug/19 ]

To contribute another factor in this discussion in case it becomes relevant: simultaneous index builds will need behavior such that we default to not rebuilding in-progress index builds, unless corruption is found, when we know we're a replica set member.

Comment by Louis Williams [ 13/Aug/19 ]

milkie, I agree. I think we should have --repair perform "validate" for each collection, and then only rebuild corrupt indexes or indexes on salvaged collections. I think it would still be valuable to provide a parameter to only run validate+index rebuilding on salvaged collections (i.e. don't unconditionally run validate). Since validate can be expensive (though not so much as index building), it may be helpful to have it skip verified collections when all you want to do is recreate a deleted .wt collection file.

Comment by Eric Milkie [ 09/Aug/19 ]

I guess I don't understand the use case for this feature then. Can you direct repair to only repair one collection, or does it always scan the data records for all collections? If the latter, I think we would have to change it so that instead of rebuilding all indexes, it would instead validate all indexes (and then only rebuild indexes that were salvaged or were for a collection that was salvaged).
Ideally, one could tell repair exactly what to scan for problems, but that's a different ticket.

Comment by Louis Williams [ 08/Aug/19 ]

milkie, if we choose not to rebuild indexes by default, we would need to consider running the "validate" command instead, because --repair would no longer be able to guarantee index consistency. I think an option to disable this behavior by default would be the safest approach.

Comment by Eric Milkie [ 05/Aug/19 ]

Is there a reason why we wouldn't make this new behavior the default? With the elimination of mmap I'm not familiar with reasons why we need the ability to rebuild all indexes at repair time.

Generated at Thu Feb 08 04:59:11 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.