  1. Core Server
  2. SERVER-12481

Attempting to create a 10th index with unique constraint violations corrupts the database

    • Type: Bug
    • Resolution: Done
    • Priority: Critical - P2
    • Fix Version/s: 2.4.10
    • Affects Version/s: 2.4.8
    • Component/s: Index Maintenance
    • None
    • ALL

      Issue Status as of March 27, 2014

      ISSUE SUMMARY
      If an index build would create the tenth index on a collection (counting the default _id index) and that build fails (due to a uniqueness constraint violation, for example), the index catalog can become corrupted. All subsequent inserts to the collection then fail.

      USER IMPACT
      If an index build in the tenth index slot fails or is interrupted, it will render the index catalog for the collection corrupt. To fix the corruption, repair the database or resync the replica set node from a healthy node.

      SOLUTION
      The issue was caused by an off-by-one error in the code that removes an index build in progress. Correcting that error resolved the issue.

      WORKAROUNDS
      The most common scenario that triggers the bug is an index build failure due to a uniqueness constraint violation. In this situation, the corruption can be avoided if

      • the 10th index is not a unique index, or
      • there are no duplicate keys, or
      • the dropDups option is specified
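
      The conditions above can be expressed as a small predicate. This is only a sketch: uniqueBuildWouldTriggerBug is a hypothetical helper, not part of the mongo shell, and it covers only the common uniqueness-violation scenario, not other failures or interruptions of a build in the tenth slot.

```javascript
// Hypothetical helper (not part of the mongo shell): returns true when a
// unique index build would hit the common failure scenario described above.
function uniqueBuildWouldTriggerBug(existingIndexCount, indexOptions, hasDuplicateKeys) {
    var options = indexOptions || {};
    // The new index lands in the tenth slot when nine indexes already exist
    // (the default _id index counts as one of the nine).
    var isTenthIndex = existingIndexCount === 9;
    // The build fails only if it is unique, duplicate keys exist,
    // and dropDups is not specified.
    var buildWouldFail = Boolean(options.unique) && hasDuplicateKeys && !options.dropDups;
    return isTenthIndex && buildWouldFail;
}
```

      In the mongo shell, existingIndexCount could be taken from db.c.getIndexes().length. Whether duplicate keys exist has to be determined separately for the field being indexed.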

      PATCHES
      The fix is included in the production release 2.4.10. Version 2.6 is unaffected by this issue.

      Original Description

      If the 10th index created on a collection has a unique constraint and there are duplicate key violations, the index build fails (as expected) but then hits an assertion and leaves the ns file corrupted so that all subsequent inserts fail. To reproduce:

      function repro() {
      
          db.dropDatabase()
      
          db.c.ensureIndex({a:1})
          db.c.ensureIndex({b:1})
          db.c.ensureIndex({c:1})
          db.c.ensureIndex({d:1})
          db.c.ensureIndex({e:1})
          db.c.ensureIndex({f:1})
          db.c.ensureIndex({g:1})
          db.c.ensureIndex({h:1})
          // now there are 9 indexes, including _id
      
          // create duplicate records with key i
          db.c.insert({i:0})
          db.c.insert({i:0})
      
          // create 10th index, unique constraint, with duplicate keys:
          printjson(db.c.ensureIndex({i:1}, {unique:true}))
          // fails with "Assertion: 14045:missing Extra"
          
          // and leaves ns file corrupted:
          db.c.insert({})
          // fails with "Assertion: 10295:getFile(): bad file number value (corrupt db?)"
      }
      

      The corruption does not occur if:

      • the unique index is not the 10th index, or
      • there are no duplicate keys, or
      • dropDups is specified

      When the conditions that trigger this issue are met, the catch block at pdfile.cpp:1552 calls IndexBuildsInProgress::remove, which has special logic to handle the rollover from the 10 base indexes to the extra indexes, so the error may lie in this vicinity. For reference, here is the stack trace for the initial assertion raised when creating the index:

          mongo::printStackTrace(std::ostream&)+0x21) [0xde46e1]
          mongo::msgasserted(int, char const*)+0x9b) [0xda5e1b]
          mongo::NamespaceDetails::idx(int, bool)+0x231) [0x8617a1]
          mongo::IndexBuildsInProgress::remove(char const*, int)+0x81) [0xab8181]
          mongo::insert_makeIndex(mongo::NamespaceDetails*, std::string const&, mongo::DiskLoc const&, bool)+0x96f) [0xac46ef]
          mongo::DataFileMgr::insert(char const*, void const*, int, bool, bool, bool, bool*)+0x7d2) [0xac8842]
          mongo::DataFileMgr::insertWithObjMod(char const*, mongo::BSONObj&, bool, bool)+0x4f) [0xaca5af]
          mongo::checkAndInsert(char const*, mongo::BSONObj&)+0x119) [0x9f8a69]
          mongo::receivedInsert(mongo::Message&, mongo::CurOp&)+0x929) [0x9f94d9]
          mongo::assembleResponse(mongo::Message&, mongo::DbResponse&, mongo::HostAndPort const&)+0xab8) [0x9ffd68]
          mongo::MyMessageHandler::process(mongo::Message&, mongo::AbstractMessagingPort*, mongo::LastError*)+0x98) [0x6e8518]
          mongo::PortMessageServer::handleIncomingMsg(void*)+0x42e) [0xdd0cae]
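
      The trace shows IndexBuildsInProgress::remove calling NamespaceDetails::idx, which raised the assertion. The server source is not reproduced in this ticket, so the following is only a toy JavaScript model of how an off-by-one at the boundary between the 10 base slots and the extra slots could produce a "missing Extra" style failure. All names (makeCatalog, idx, removeBuggy, removeFixed) are hypothetical and do not correspond to the actual C++ code.

```javascript
// Toy model with hypothetical names; NOT the server's actual code.
var BASE_SLOTS = 10; // number of base index slots, per the description above

// entries: array of index names, with any in-progress build in the last slot.
function makeCatalog(entries) {
    return { entries: entries.slice(), hasExtra: entries.length > BASE_SLOTS };
}

// Slot accessor: slots 0..9 are base slots; higher slots need the extra block.
function idx(catalog, i) {
    if (i >= BASE_SLOTS && !catalog.hasExtra) {
        throw new Error("missing Extra");
    }
    return catalog.entries[i];
}

// Buggy removal: the loop bound is one too large, so it reads slot i + 1
// even when i is already the last occupied slot.
function removeBuggy(catalog, pos) {
    for (var i = pos; i < catalog.entries.length; i++) {
        catalog.entries[i] = idx(catalog, i + 1);
    }
    catalog.entries.pop();
}

// Fixed removal: stop one slot earlier, so nothing past the last entry is read.
function removeFixed(catalog, pos) {
    for (var i = pos; i < catalog.entries.length - 1; i++) {
        catalog.entries[i] = idx(catalog, i + 1);
    }
    catalog.entries.pop();
}
```

      In this model the extra read is harmless while fewer than ten slots are occupied, because slot i + 1 is still a base slot. It fails only when the removed entry sits in the tenth slot, which matches the observation that the problem is specific to the 10th index.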
      

            Assignee: Eric Milkie (milkie@mongodb.com)
            Reporter: Bruce Lucas (bruce.lucas@mongodb.com) (Inactive)