[SERVER-65024] Multiple documents with the same _id value make reIndex invariant Created: 29/Mar/22  Updated: 29/Oct/23  Resolved: 01/Apr/22

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 5.3.2, 5.0.8, 6.0.0-rc0, 4.4.15

Type: Bug Priority: Major - P3
Reporter: Daniel Gomez Ferro Assignee: Yuhong Zhang
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Related
is related to SERVER-65086 Deprecate the `collection.reIndex()` ... Closed
is related to SERVER-65087 Audit the usage of foreground index b... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v5.3, v5.0, v4.4, v4.2
Sprint: Execution Team 2022-04-04, Execution Team 2022-04-18
Participants:

 Description   

In a cluster affected by WT-7995 and WT-8395, there were two documents sharing the same _id value.

Our remediation process guide requires running reIndex to fix inconsistencies detected by validate. On master, when reIndex encounters duplicate _ids it trips an invariant (a fatal assertion). The server then ends up in a bad state: it tries to rebuild the _id index during recovery but fails every time because of the duplicate values.

In 4.4 the behaviour is slightly different: reIndex adds the duplicate entries to the index, and validate trips an invariant when it encounters two duplicate values in a unique index.
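
For context, a minimal mongo shell sketch of that remediation flow (the collection name is hypothetical); on an affected node, validate flags the inconsistency and the subsequent reIndex is what hits the invariant:

(function() {
"use strict";

// Hypothetical collection hit by the WT-7995/WT-8395 inconsistency.
const coll = db.getSiblingDB("test").getCollection("affectedColl");

// Step 1 of the remediation guide: validate detects the duplicate _id values.
const res = coll.validate({full: true});
printjson(res);  // expected: "valid: false" with errors on the _id index

// Step 2: rebuild all indexes. On master this trips the invariant (fatal
// assertion) when the _id bulk build sees the duplicate key; on 4.4 it
// "succeeds" but leaves duplicate entries in the unique index.
coll.reIndex();
})();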



 Comments   
Comment by Githook User [ 28/Apr/22 ]

Author:

{'name': 'Yuhong Zhang', 'email': 'yuhong.zhang@mongodb.com', 'username': 'YuhongZhang98'}

Message: SERVER-65024 Fix the uniqueness constraint handling for foreground index builds

(cherry picked from commit 5985d757ddc2645fb1b5df88f78abf6b9a833452)
Branch: v4.4
https://github.com/mongodb/mongo/commit/2205c918e0dc4c5bd9b3e7151c42316792293aae

Comment by Githook User [ 19/Apr/22 ]

Author:

{'name': 'Yuhong Zhang', 'email': 'yuhong.zhang@mongodb.com', 'username': 'YuhongZhang98'}

Message: SERVER-65024 Fix the uniqueness constraint handling for foreground index builds

(cherry picked from commit 5985d757ddc2645fb1b5df88f78abf6b9a833452)
Branch: v5.0
https://github.com/mongodb/mongo/commit/221a44efc37f2f88812275b153e6c8b823310880

Comment by Githook User [ 19/Apr/22 ]

Author:

{'name': 'Yuhong Zhang', 'email': 'yuhong.zhang@mongodb.com', 'username': 'YuhongZhang98'}

Message: SERVER-65024 Fix the uniqueness constraint handling for foreground index builds

(cherry picked from commit 5985d757ddc2645fb1b5df88f78abf6b9a833452)
Branch: v5.3
https://github.com/mongodb/mongo/commit/9a75d576cffbc84581b2476127bf89fd214b7f0f

Comment by Githook User [ 01/Apr/22 ]

Author:

{'name': 'Yuhong Zhang', 'email': 'yuhong.zhang@mongodb.com', 'username': 'YuhongZhang98'}

Message: SERVER-65024 Fix the uniqueness constraint handling for foreground index builds
Branch: master
https://github.com/mongodb/mongo/commit/5985d757ddc2645fb1b5df88f78abf6b9a833452

Comment by Yuhong Zhang [ 30/Mar/22 ]

It turns out that our foreground index builds through IndexBuildsCoordinator::createIndex() are not checking any index constraints. I managed to reproduce the duplicate key case in a unit test:

diff --git a/src/mongo/db/catalog/validate_state_test.cpp b/src/mongo/db/catalog/validate_state_test.cpp
index 45868b0ad87..8de8adfdab0 100644
--- a/src/mongo/db/catalog/validate_state_test.cpp
+++ b/src/mongo/db/catalog/validate_state_test.cpp
@@ -42,12 +42,81 @@
 #include "mongo/db/repl/replication_coordinator.h"
 #include "mongo/db/storage/snapshot_manager.h"
 #include "mongo/db/storage/wiredtiger/wiredtiger_global_options.h"
+#include "mongo/logv2/log.h"
 #include "mongo/util/fail_point.h"
 
 namespace mongo {
 
 namespace {
 
+void printCollectionAndIndexTableEntries(OperationContext* opCtx, const NamespaceString& nss) {
+    invariant(!opCtx->lockState()->isLocked());
+    AutoGetCollection coll(opCtx, nss, MODE_IS);
+
+    LOGV2(51807, "Dumping collection table and index tables' entries for debugging...");
+
+    // Iterate and print the collection table (record store) documents.
+    RecordStore* rs = coll->getRecordStore();
+    auto rsCursor = rs->getCursor(opCtx);
+    boost::optional<Record> rec = rsCursor->next();
+    LOGV2(51808, "[Debugging] Collection table entries:");
+    while (rec) {
+        LOGV2(51809,
+              "[Debugging](record) {record_id}, Value: {record_data}",
+              "record_id"_attr = rec->id,
+              "record_data"_attr = rec->data.toBson());
+        rec = rsCursor->next();
+    }
+
+    // Iterate and print each index's table of documents.
+    const auto indexCatalog = coll->getIndexCatalog();
+    const auto it = indexCatalog->getIndexIterator(opCtx, /*includeUnfinished*/ false);
+    while (it->more()) {
+        const auto indexCatalogEntry = it->next();
+        const auto indexDescriptor = indexCatalogEntry->descriptor();
+        const auto iam = indexCatalogEntry->accessMethod()->asSortedData();
+        if (!iam) {
+            LOGV2(6325100,
+                  "[Debugging] skipping index {index_name} because it isn't SortedData",
+                  "index_name"_attr = indexDescriptor->indexName());
+            continue;
+        }
+        auto indexCursor = iam->newCursor(opCtx, /*forward*/ true);
+
+        const BSONObj& keyPattern = indexDescriptor->keyPattern();
+        const KeyString::Version version = iam->getSortedDataInterface()->getKeyStringVersion();
+        const auto ordering = Ordering::make(keyPattern);
+        KeyString::Builder firstKeyString(
+            version, BSONObj(), ordering, KeyString::Discriminator::kExclusiveBefore);
+
+        LOGV2(51810,
+              "[Debugging] {keyPattern_str} index table entries:",
+              "keyPattern_str"_attr = keyPattern);
+
+        for (auto keyStringEntry = indexCursor->seekForKeyString(firstKeyString.getValueCopy());
+             keyStringEntry;
+             keyStringEntry = indexCursor->nextKeyString()) {
+            auto keyString = KeyString::toBsonSafe(keyStringEntry->keyString.getBuffer(),
+                                                   keyStringEntry->keyString.getSize(),
+                                                   ordering,
+                                                   keyStringEntry->keyString.getTypeBits());
+            KeyString::logKeyString(keyStringEntry->loc,
+                                    keyStringEntry->keyString,
+                                    keyPattern,
+                                    keyString,
+                                    "[Debugging](index)");
+        }
+    }
+}
+
+void printValidateResults(const ValidateResults& results) {
+    BSONObjBuilder resultObj;
+
+    results.appendToResultObj(&resultObj, /*debugging=*/true);
+
+    LOGV2(51812, "Results", "results"_attr = resultObj.done());
+}
+
 const NamespaceString kNss = NamespaceString("fooDB.fooColl");
 
 class ValidateStateTest : public CatalogTestFixture {
@@ -86,8 +155,8 @@ void ValidateStateTest::createCollectionAndPopulateIt(OperationContext* opCtx,
     OpDebug* const nullOpDebug = nullptr;
     for (int i = 0; i < 10; i++) {
         WriteUnitOfWork wuow(opCtx);
-        ASSERT_OK(
-            collection->insertDocument(opCtx, InsertStatement(BSON("_id" << i)), nullOpDebug));
+        ASSERT_OK(collection->insertDocument(
+            opCtx, InsertStatement(BSON("_id" << i << "a" << 1)), nullOpDebug));
         wuow.commit();
     }
 }
@@ -123,8 +192,9 @@ void createIndex(OperationContext* opCtx, const NamespaceString& nss, const BSON
     ASSERT(collection);
 
     ASSERT_EQ(1, indexKey.nFields()) << nss << "/" << indexKey;
-    auto spec = BSON("v" << int(IndexDescriptor::kLatestIndexVersion) << "key" << indexKey << "name"
-                         << (indexKey.firstElementFieldNameStringData() + "_1"));
+    auto spec =
+        BSON("v" << int(IndexDescriptor::kLatestIndexVersion) << "key" << indexKey << "name"
+                 << (indexKey.firstElementFieldNameStringData() + "_1") << "unique" << true);
 
     auto indexBuildsCoord = IndexBuildsCoordinator::get(opCtx);
     auto indexConstraints = IndexBuildsManager::IndexConstraints::kEnforce;
@@ -203,6 +273,17 @@ TEST_F(ValidateStateTest, OpenCursorsOnAllIndexes) {
 
     // Create several indexes.
     createIndex(opCtx, kNss, BSON("a" << 1));
+    ValidateResults results;
+    BSONObjBuilder output;
+    ASSERT_OK(CollectionValidation::validate(opCtx,
+                                             kNss,
+                                             CollectionValidation::ValidateMode::kForegroundFull,
+                                             CollectionValidation::RepairMode::kNone,
+                                             &results,
+                                             &output,
+                                             true));
+    printValidateResults(results);
+    printCollectionAndIndexTableEntries(opCtx, kNss);
     createIndex(opCtx, kNss, BSON("b" << 1));
     createIndex(opCtx, kNss, BSON("c" << 1));
     createIndex(opCtx, kNss, BSON("d" << 1));

Comment by Louis Williams [ 30/Mar/22 ]

Thanks for investigating, yuhong.zhang. I believe we could have problems in other places where we use foreground index builds, like initial sync.

Comment by Yuhong Zhang [ 30/Mar/22 ]

The history and code paths of the various index builds are admittedly quite hard to follow, so please correct me if I missed anything, louis.williams daniel.gomezferro

I believe there are two related issues in the current code:

  1. We assumed that during a bulk load we would never see a duplicate key in the _id index, which holds almost all of the time because the _id index is built together with the collection from the very beginning. However, when the reIndex command runs against a collection that contains duplicate _id values due to data inconsistencies, the assumption no longer holds. One could still argue the assumption is fine because duplicate keys are supposed to be handled before the call to WiredTigerIndex::IdBulkBuilder::addKey(); in practice, though, that check is skipped because dupsAllowed was set to true when the index build was initialized.
  2. For hybrid index builds, we defer the constraint checks until commit time and consult the side-write interceptor. The reIndex command, however, uses a foreground index build, which does not install an interceptor, so the check becomes a no-op. This is why reIndex does not fail on a unique index with duplicate values (see the shell sketch below).
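
A mongo shell sketch of that difference, reusing the skipIndexNewRecords failpoint from the reproduction in the comment below (the collection name is hypothetical, and the exact error code on the hybrid build is an assumption):

(function() {
"use strict";

const coll = db.getCollection("dup_check_demo");
coll.drop();
assert.commandWorked(coll.createIndex({a: 1}, {unique: true}));

// Bypass index maintenance so the unique index never sees the duplicates.
assert.commandWorked(
    db.adminCommand({configureFailPoint: "skipIndexNewRecords", mode: "alwaysOn"}));
assert.commandWorked(coll.insert({a: 1}));
assert.commandWorked(coll.insert({a: 1}));
assert.commandWorked(db.adminCommand({configureFailPoint: "skipIndexNewRecords", mode: "off"}));

// Foreground rebuild (reIndex): no side-write interceptor is installed, so the
// deferred uniqueness check is a no-op and the command succeeds (the bug).
assert.commandWorked(coll.reIndex());

// Hybrid build (regular createIndex): duplicates are recorded during the build
// and checked before commit, so rebuilding the index from scratch fails.
assert.commandWorked(coll.dropIndex({a: 1}));
assert.commandFailedWithCode(coll.createIndex({a: 1}, {unique: true}),
                             ErrorCodes.DuplicateKey);
})();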


Comment by Yuhong Zhang [ 29/Mar/22 ]

On the master branch, reindexing a unique index will ignore duplicate keys and succeed instead of failing.

(function() {
"use strict";
 
const collName = "reindex_duplicate_keys";
const coll = db.getCollection(collName);
coll.drop();
 
// Bypasses DuplicateKey insertion error for testing via failpoint.
let addDuplicateDocumentsToCol = function(db, coll, doc) {
    jsTestLog("Inserts documents without index entries.");
    assert.commandWorked(
        db.adminCommand({configureFailPoint: "skipIndexNewRecords", mode: "alwaysOn"}));
 
    assert.commandWorked(coll.insert(doc));
    assert.commandWorked(coll.insert(doc));
 
    assert.commandWorked(db.adminCommand({configureFailPoint: "skipIndexNewRecords", mode: "off"}));
};
assert.commandWorked(coll.createIndex({a: 1}, {unique: true}));
addDuplicateDocumentsToCol(db, coll, {a: 1});
 
assert.commandWorked(coll.reIndex());
printjson(coll.getIndexes());
})();
