[SERVER-39562] Repair should handle duplicate unique index keys
Created: 13/Feb/19
Updated: 29/Oct/23
Resolved: 16/Jul/20

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 4.7.0, 4.4.2

Type: Improvement
Priority: Major - P3
Reporter: Louis Williams
Assignee: Fausto Leyva (Inactive)
Resolution: Fixed
Votes: 0
Labels: execution_intern, intern_validate_improvements, neweng
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Related
is related to SERVER-40175 Rebuild any missing _id indexes at st... Closed
is related to SERVER-49507 Reduce memory consumption in startup ... Closed
Backwards Compatibility: Fully Compatible
Backport Requested:
v4.4, v4.2
Sprint: Execution Team 2020-07-13, Execution Team 2020-07-27
Participants:

 Description   

Starting mongod with --repair first salvages collections in WiredTiger and then rebuilds all indexes. The salvage operation can recover multiple versions of the same document, so a collection can end up with duplicate _id keys. Rebuilding the unique _id index then fails on those duplicates, which fails the whole repair operation and makes it impossible to start MongoDB to recover any data.

If an index rebuild fails, we should drop the index and continue. This can be paired with a warning message on startup if an _id index is missing on a collection.
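
The drop-and-continue idea can be illustrated with a small sketch. This is a toy in-memory model in Python, not the server's implementation: `build_unique_index`, `rebuild_indexes`, and `DuplicateKeyError` are hypothetical names introduced here, and a plain list of dicts stands in for a salvaged collection.

```python
import logging

logger = logging.getLogger("repair_sketch")


class DuplicateKeyError(Exception):
    """Raised when a unique index build sees the same key twice."""


def build_unique_index(docs, field):
    """Build a toy unique index mapping key -> position in docs."""
    index = {}
    for pos, doc in enumerate(docs):
        key = doc[field]
        if key in index:
            raise DuplicateKeyError(f"duplicate key {key!r} on field {field!r}")
        index[key] = pos
    return index


def rebuild_indexes(docs, fields):
    """Rebuild each unique index; drop the ones that fail and keep going."""
    built, dropped = {}, []
    for field in fields:
        try:
            built[field] = build_unique_index(docs, field)
        except DuplicateKeyError as exc:
            # Instead of aborting the whole repair, record the drop and warn.
            dropped.append(field)
            logger.warning("dropping index on %r: %s", field, exc)
    return built, dropped


# Assumed scenario: salvage recovered two versions of the document with _id 1.
salvaged = [
    {"_id": 1, "sku": "a", "v": 1},
    {"_id": 1, "sku": "b", "v": 2},
    {"_id": 2, "sku": "c", "v": 1},
]
built, dropped = rebuild_indexes(salvaged, ["_id", "sku"])
```

In this sketch the `_id` index ends up in `dropped` while the `sku` index is still built, so the repair finishes and the caller can warn about the missing `_id` index on startup.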

An alternative would be to provide a parameter "repairSkipIndexRebuild" that skips rebuilding all indexes. This could easily leave indexes corrupt even if repair succeeds, whereas dropping the failed index and forcing the user to rebuild it avoids that problem entirely.



 Comments   
Comment by Githook User [ 08/Oct/20 ]

Author: Faustoleyva54 <fausto.leyva@mongodb.com> (username: Faustoleyva54)

Message: SERVER-39562 Repair should handle duplicate unique index keys

(cherry picked from commit 80f11e6ae0708e8c8da49208ef2cf71cdd06877c)

SERVER-49507 Reduce memory consumption in startup repair when rebuilding unique indexes with a large number of duplicate records

(cherry picked from commit e25d43ca2b5e99e6484cb0e13ca5f9e2d014ac30)
Branch: v4.4
https://github.com/mongodb/mongo/commit/69b7023414b5a1160de0f0e4a068e4bd8ff288e9

Comment by Githook User [ 16/Jul/20 ]

Author: Faustoleyva54 <fausto.leyva@mongodb.com> (username: Faustoleyva54)

Message: SERVER-39562 Repair should handle duplicate unique index keys

Removed MongoDFixture from disk_wiredtiger suite.
Branch: master
https://github.com/mongodb/mongo/commit/80f11e6ae0708e8c8da49208ef2cf71cdd06877c

Comment by Fausto Leyva (Inactive) [ 16/Jul/20 ]

https://mongodbcr.appspot.com/613350013/#ps627180006

Comment by Louis Williams [ 18/Feb/19 ]

Yet another idea that accomplishes the same goal is to delete documents with duplicate keys from the collection and write them to a lost+found file on disk. That way there is no possibility of starting up without an _id index.
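
A minimal sketch of the lost+found idea, again as a toy Python model rather than the server's code. The function name `quarantine_duplicates`, the JSON-lines file format, and the choice to keep the first version seen are all assumptions made for illustration; the comment above does not say which version should be kept or how the file is laid out.

```python
import json
import os
import tempfile


def quarantine_duplicates(docs, field, lost_found_path):
    """Keep the first document per key; append later duplicates to a file.

    Returns the surviving documents and the number of documents moved.
    Keeping the first version seen is an assumption of this sketch.
    """
    kept, seen, moved = [], set(), 0
    with open(lost_found_path, "a", encoding="utf-8") as lost_found:
        for doc in docs:
            key = doc[field]
            if key in seen:
                # One JSON document per line so nothing is silently lost.
                lost_found.write(json.dumps(doc) + "\n")
                moved += 1
            else:
                seen.add(key)
                kept.append(doc)
    return kept, moved


# Assumed scenario: two recovered versions of the document with _id 1.
salvaged = [
    {"_id": 1, "v": 1},
    {"_id": 1, "v": 2},
    {"_id": 2, "v": 1},
]
lost_found_path = os.path.join(tempfile.mkdtemp(), "lost_found.jsonl")
kept, moved = quarantine_duplicates(salvaged, "_id", lost_found_path)
```

After this pass every remaining `_id` is unique, so a unique index build over `kept` cannot fail on duplicates, and the removed versions remain available in the file for manual inspection.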
