[SERVER-12481] attempting to create a 10th index, with unique constraint violations corrupts db Created: 25/Jan/14  Updated: 11/Jul/16  Resolved: 04/Feb/14

Status: Closed
Project: Core Server
Component/s: Index Maintenance
Affects Version/s: 2.4.8
Fix Version/s: 2.4.10

Type: Bug
Priority: Critical - P2
Reporter: Bruce Lucas (Inactive)
Assignee: Eric Milkie
Resolution: Done
Votes: 1
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
is duplicated by SERVER-12484 IndexRebuilder assertion at startup Closed
is duplicated by SERVER-13299 Can't create 2dsphere index Closed
Operating System: ALL
Participants:

 Description   
Issue Status as of March 27, 2014

ISSUE SUMMARY
If an attempt to build the tenth index on a collection (counting the default _id index) fails, for example because of a uniqueness constraint violation, the index catalog can become corrupted. All subsequent inserts into that collection then fail.

USER IMPACT
If an index build in the tenth index slot fails or is interrupted, it will render the index catalog for the collection corrupt. To fix the corruption, repair the database or resync the replica set node from a healthy node.

SOLUTION
The corruption was caused by an off-by-one error in the code that removes an in-progress index build (IndexBuildsInProgress::remove). Correcting that error resolves the issue.

WORKAROUNDS
The most common scenario that triggers the bug is an index build failure due to a uniqueness constraint violation. In this situation, the corruption is avoided if any of the following holds:

  • the 10th index is not a unique index, or
  • there are no duplicate keys, or
  • the dropDups option is specified
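The three conditions above can be combined into a single pre-flight check. The following is a minimal sketch, not part of the mongo shell API: the helper names findDuplicateKeys and tenthUniqueBuildIsRisky are hypothetical, documents are assumed to be plain objects, and only a single top-level field is handled. It covers only the uniqueness-violation scenario, not other failures or interruptions of a build in the tenth slot.

```javascript
// Hypothetical helpers for illustration only; not part of the mongo shell API.

// Returns the key values that occur more than once for `field`.
// Simplification: only a single top-level field is supported, and values
// are compared by their JSON text rather than by BSON comparison rules.
function findDuplicateKeys(docs, field) {
    var seen = Object.create(null);
    var dups = [];
    docs.forEach(function (doc) {
        // A document without the field is indexed under null, so two such
        // documents also collide in a non-sparse unique index.
        var value = doc[field] === undefined ? null : doc[field];
        var key = JSON.stringify(value);
        if (seen[key] === 1) {
            dups.push(value); // report each duplicated value once
        }
        seen[key] = (seen[key] || 0) + 1;
    });
    return dups;
}

// existingIndexCount includes the default _id index. In the shell it could
// be taken from db.c.getIndexes().length.
function tenthUniqueBuildIsRisky(existingIndexCount, docs, field, options) {
    options = options || {};
    var isTenth = existingIndexCount === 9; // the new index would fill slot 10
    var isUnique = options.unique === true;
    var dropDups = options.dropDups === true;
    return isTenth && isUnique && !dropDups &&
        findDuplicateKeys(docs, field).length > 0;
}
```

For example, tenthUniqueBuildIsRisky(9, [{i:0}, {i:0}], "i", {unique:true}) returns true, while the same call with 8 existing indexes, with dropDups:true, or without unique:true returns false.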

PATCHES
The fix is included in the production release 2.4.10. Version 2.6 is unaffected by this issue.

Original Description

If the 10th index created on a collection has a unique constraint and there are duplicate key violations, the index build fails (as expected) but then hits an assertion and leaves the ns file corrupted so that all subsequent inserts fail. To reproduce:

function repro() {
 
    db.dropDatabase()
 
    db.c.ensureIndex({a:1})
    db.c.ensureIndex({b:1})
    db.c.ensureIndex({c:1})
    db.c.ensureIndex({d:1})
    db.c.ensureIndex({e:1})
    db.c.ensureIndex({f:1})
    db.c.ensureIndex({g:1})
    db.c.ensureIndex({h:1})
    // now there are 9 indexes, including _id
 
    // create duplicate records with key i
    db.c.insert({i:0})
    db.c.insert({i:0})
 
    // create 10th index, unique constraint, with duplicate keys:
    printjson(db.c.ensureIndex({i:1}, {unique:true}))
    // fails with "Assertion: 14045:missing Extra"
    
    // and leaves ns file corrupted:
    db.c.insert({})
    // fails with "Assertion: 10295:getFile(): bad file number value (corrupt db?)"
}

The corruption does not occur if:

  • the unique index is not the 10th index, or
  • there are no duplicate keys, or
  • dropDups is specified

When the conditions that trigger this issue are met, the catch block at pdfile.cpp:1552 calls IndexBuildsInProgress::remove, which has special logic for the rollover from the 10 base indexes to the extra indexes, so the error may lie in this area. For reference, here is the stack trace for the initial assertion raised when creating the index:

    mongo::printStackTrace(std::ostream&)+0x21) [0xde46e1]
    mongo::msgasserted(int, char const*)+0x9b) [0xda5e1b]
    mongo::NamespaceDetails::idx(int, bool)+0x231) [0x8617a1]
    mongo::IndexBuildsInProgress::remove(char const*, int)+0x81) [0xab8181]
    mongo::insert_makeIndex(mongo::NamespaceDetails*, std::string const&, mongo::DiskLoc const&, bool)+0x96f) [0xac46ef]
    mongo::DataFileMgr::insert(char const*, void const*, int, bool, bool, bool, bool*)+0x7d2) [0xac8842]
    mongo::DataFileMgr::insertWithObjMod(char const*, mongo::BSONObj&, bool, bool)+0x4f) [0xaca5af]
    mongo::checkAndInsert(char const*, mongo::BSONObj&)+0x119) [0x9f8a69]
    mongo::receivedInsert(mongo::Message&, mongo::CurOp&)+0x929) [0x9f94d9]
    mongo::assembleResponse(mongo::Message&, mongo::DbResponse&, mongo::HostAndPort const&)+0xab8) [0x9ffd68]
    mongo::MyMessageHandler::process(mongo::Message&, mongo::AbstractMessagingPort*, mongo::LastError*)+0x98) [0x6e8518]
    mongo::PortMessageServer::handleIncomingMsg(void*)+0x42e) [0xdd0cae]
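
To illustrate how an off-by-one at the base/extra boundary could produce the "missing Extra" assertion seen in this trace, here is a toy model in JavaScript. It is not the MongoDB source and does not reproduce the actual fix (see the commit linked in the comments). The catalog layout, the names makeCatalog, slot, removeBuggy and removeFixed, and the shape of both loops are assumptions made for illustration. The only details taken from this report are the 10 base slots and the assertion message.

```javascript
// Toy model only. All names and the loop shapes are assumptions.
var N_BASE = 10; // the "10 base indexes" mentioned in the report

function makeCatalog(names) {
    // `extra` stays null: no overflow block has been allocated yet.
    return { base: names.slice(0, N_BASE), extra: null, count: names.length };
}

// Reads slot i. Slots at or beyond N_BASE live in the extra block.
function slot(catalog, i) {
    if (i < N_BASE) {
        return catalog.base[i];
    }
    if (catalog.extra === null) {
        throw new Error("missing Extra");
    }
    return catalog.extra[i - N_BASE];
}

// Assumed buggy shape: shifts entries down but reads one slot past the last
// entry. With count === 10 and offset === 9 it reads slot 10, which needs
// the extra block that was never allocated.
function removeBuggy(catalog, offset) {
    for (var i = offset; i < catalog.count; i++) {
        catalog.base[i] = slot(catalog, i + 1);
    }
    catalog.count--;
}

// Assumed corrected shape: stop one entry earlier, then clear the last slot.
function removeFixed(catalog, offset) {
    for (var i = offset; i < catalog.count - 1; i++) {
        catalog.base[i] = slot(catalog, i + 1);
    }
    catalog.base[catalog.count - 1] = undefined;
    catalog.count--;
}
```

In this model, removing the entry at offset 9 from a catalog of 10 entries throws "missing Extra" with removeBuggy and succeeds with removeFixed. Removing the entry at offset 8 from a catalog of 9 entries succeeds with either variant, which matches the observation that the corruption does not occur when the failing index is not the 10th.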



 Comments   
Comment by Githook User [ 04/Feb/14 ]

Author: Eric Milkie (milkie) <milkie@10gen.com>

Message: SERVER-12481 fix off-by-one error with IndexBuildsInProgress::remove
Branch: v2.4
https://github.com/mongodb/mongo/commit/419bc91c2cdf2e10f3e7a1754ac92a1e6c69d964

Comment by Daniel Pasette (Inactive) [ 26/Jan/14 ]

This issue affects 2.4 only.
