[SERVER-51593] Index build bulk builder does not handle WriteConflictExceptions Created: 14/Oct/20  Updated: 08/May/23

Status: Backlog
Project: Core Server
Component/s: None
Affects Version/s: 4.4.0
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Louis Williams Assignee: Backlog - Storage Execution Team
Resolution: Unresolved Votes: 2
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
is depended on by SERVER-40820 Jstestfuzz suite to introduce spuriou... Closed
Related
is related to SERVER-49507 Reduce memory consumption in startup ... Closed
Assigned Teams:
Storage Execution
Operating System: ALL
Participants:
Case:

 Description   

There is a region of the index build code with no WriteConflictException handling. An exception thrown there can escape, fail the index build, and crash the server with the following invariant failure:

Invariant failure {"expr":"status.isA<ErrorCategory::Interruption>() || status.isA<ErrorCategory::ShutdownError>()","msg":"Unexpected error code during index build cleanup: WriteConflict: WriteConflict error: this operation conflicted with another operation. Please retry your operation or multi-document transaction.","file":"src/mongo/db/index_builds_coordinator.cpp","line":2475}

This exception can be propagated from here while inserting keys from the external sorter into the bulk loader.

The WriteConflict at this point cannot be a logical write-write conflict, because the index builder has exclusive access to the table. It would have to be caused by WiredTiger forcing old transactions to roll back under cache pressure.
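While a build is running, the cache-pressure theory can be sanity-checked by polling WiredTiger's cache statistics from the shell. A minimal sketch, assuming the usual wiredTiger.cache serverStatus field names (treat exact names as version-dependent); sustained usage near 100% of the configured maximum makes eviction-forced rollbacks (WT_ROLLBACK, surfaced as WriteConflict) plausible:

// Poll WiredTiger cache usage from the mongo shell.
// Field names are the standard wiredTiger.cache serverStatus keys,
// but may vary by version.
const cache = db.serverStatus().wiredTiger.cache;
const used = cache["bytes currently in the cache"];
const max = cache["maximum bytes configured"];
print(`WiredTiger cache usage: ${(100 * used / max).toFixed(1)}% of ${max} bytes`);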



 Comments   
Comment by Edwin Zhou [ 28/Oct/20 ]

I spent a bit more time on this problem. It appears the WCE is thrown when attempting to create a unique index on a field with many duplicate values (in this test case, half of the docs). Here is the repro script that crashes on r4.4.1:

Workload

  • 3-node replica set
  • 256MB WiredTiger cache on each node
  • 500,000 1KB documents (roughly half share a single duplicated value in the counter field)
  • Create a unique index on the field with duplicates.

/**
 * Attempts to trigger a WriteConflict error by building a unique index on a field with many duplicate values.
 * @tags: [requires_replication, requires_wiredtiger]
 */
(function () {
  "use strict";
 
  const rst = new ReplSetTest({
    nodes: [
      {
        wiredTigerCacheSizeGB: 0.25,
      },
      {
        wiredTigerCacheSizeGB: 0.25,
      },
      {
        wiredTigerCacheSizeGB: 0.25,
      },
    ],
  });
  rst.startSet();
  rst.initiate();
 
  const primary = rst.getPrimary();
  const mydb = primary.getDB("test");
  const coll = mydb.getCollection("t");
 
  const numDocs = 500000;
  const numDocsPercent = numDocs / 10; // log progress every numDocs/10 insertions
 
  let bulk = coll.initializeUnorderedBulkOp();
  for (let i = 0; i < numDocs; ++i) {
    if (i > numDocs / 2) {
      if (i % numDocsPercent === 0) {
        jsTestLog(`Queued ${i} documents (duplicate counter values)`);
      }
      bulk.insert({
        _id: i,
        counter: "counter",
        name: `${i}_item`,
        filler: "x".repeat(1024),
      });
    } else {
      if (i % numDocsPercent === 0) {
        jsTestLog(`Queued ${i} documents`);
      }
      bulk.insert({
        _id: i,
        counter: i,
        name: `${i}_item`,
        filler: "x".repeat(1024),
      });
    }
  }
  assert.commandWorked(bulk.execute());
  jsTestLog(`inserted ${numDocs} documents`);
  assert.commandWorked(coll.insert({ _id: numDocs, counter: numDocs - 1, name: `${numDocs}_item` }));
  jsTestLog(`inserted duplicate field document`);
 
  assert.eq(numDocs + 1, coll.find().itcount());
  rst.awaitReplication();
 
  const awaitIndexBuild_1 = startParallelShell(() => {
    const mydb = db.getSiblingDB("test");
    const coll = mydb.getCollection("t");
    assert.commandFailedWithCode(
      mydb.runCommand({
        createIndexes: coll.getName(),
        indexes: [
          {
            key: {
              counter: 1,
            },
            name: "counter_index",
            unique: true, // attempt to enforce uniqueness on the counter field.
          },
        ],
      }),
      ErrorCodes.DuplicateKey
    );
  }, primary.port);
 
  awaitIndexBuild_1();
 
  rst.stopSet();
})();

Comment by Louis Williams [ 22/Oct/20 ]

Thanks edwin.zhou.

After some discussion with WiredTiger folks, I also don't think it's possible for an insert into the bulk loader to return WT_ROLLBACK. The problem that I reproduced is only theoretical.

That said, I think there is still a non-theoretical case where a WCE can be thrown and go uncaught: when the index build is unable to use the bulk loader and falls back to a normal cursor. If that happens, inserts into the table could start throwing. It doesn't seem like this has led to any bugs in the wild, but out of an abundance of caution we should make this path exception-safe as well, to prevent future problems.
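One way to make this path testable would be to inject spurious write conflicts at the storage layer while an index build runs, in the spirit of SERVER-40820. A minimal sketch, assuming the WTWriteConflictException failpoint is available in the build under test:

// Sketch: randomly throw a WCE on ~1% of WiredTiger writes while an index
// build runs; the build should retry internally and still complete cleanly.
assert.commandWorked(db.adminCommand({
  configureFailPoint: "WTWriteConflictException",
  mode: {activationProbability: 0.01},
}));
assert.commandWorked(db.getSiblingDB("test").runCommand({
  createIndexes: "t",
  indexes: [{key: {name: 1}, name: "name_index"}],
}));
assert.commandWorked(
  db.adminCommand({configureFailPoint: "WTWriteConflictException", mode: "off"}));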

Comment by Edwin Zhou [ 22/Oct/20 ]

I've failed to reproduce the issue in a jsTest. I've tried a number of things to mimic the conditions under which I originally reproduced the invariant:

  1. Insert >19 million 56-byte docs on a 3-node replica set, with a duplicated field value.
  2. Insert 300k 1KB docs on a 3-node replica set, with a duplicated field value.

I then ran index builds in the following ways:

  1. Start a parallel index build that attempts to enforce uniqueness.
  2. Start 2 parallel index builds (one attempting to enforce uniqueness) while inserting documents.
  3. Start an index build, drop the indices, then start another build.

None of these produced the invariant we were looking for.

 

I was able to confirm that the invariant occurs on r4.4.1 after building the binary. I tested against the data set on which I originally repro'd the invariant, swapping in only the path of the mongod binary. The invariant no longer occurs on the latest v4.4, built yesterday.

Comment by Louis Williams [ 19/Oct/20 ]

This is the demangled stack trace of the WCE location from edwin.zhou's repro:
 

2020-10-19T15:16:20.941Z I  CONTROL  [IndexBuildsCoordinatorMongod-0]   Frame: 0x105A4E6E0 mongo::printStackTrace(mongo::StackTraceSink&)+0x160
2020-10-19T15:16:20.941Z I  CONTROL  [IndexBuildsCoordinatorMongod-0]   Frame: 0x105A4FE68 mongo::printStackTrace()+0x28
2020-10-19T15:16:20.941Z I  CONTROL  [IndexBuildsCoordinatorMongod-0]   Frame: 0x1058F5866 mongo::WriteConflictException::WriteConflictException()+0xB6
2020-10-19T15:16:20.941Z I  CONTROL  [IndexBuildsCoordinatorMongod-0]   Frame: 0x1039D787C mongo::wtRCToStatus_slow(int, char const*)+0x26C
2020-10-19T15:16:20.941Z I  CONTROL  [IndexBuildsCoordinatorMongod-0]   Frame: 0x1039B2458 mongo::WiredTigerRecordStore::_insertRecords(mongo::OperationContext*, mongo::Record*, mongo::Timestamp const*, unsigned long)+0x658
2020-10-19T15:16:20.941Z I  CONTROL  [IndexBuildsCoordinatorMongod-0]   Frame: 0x1039B1DF2 mongo::WiredTigerRecordStore::insertRecords(mongo::OperationContext*, std::__1::vector<mongo::Record, std::__1::allocator<mongo::Record> >*, std::__1::vector<mongo::Timestamp, std::__1::allocator<mongo::Timestamp> > const&)+0x22
2020-10-19T15:16:20.941Z I  CONTROL  [IndexBuildsCoordinatorMongod-0]   Frame: 0x1045339C3 mongo::DuplicateKeyTracker::recordKeys(mongo::OperationContext*, std::__1::vector<mongo::BSONObj, std::__1::allocator<mongo::BSONObj> > const&)+0x5B3
2020-10-19T15:16:20.941Z I  CONTROL  [IndexBuildsCoordinatorMongod-0]   Frame: 0x104524A48 mongo::IndexBuildInterceptor::recordDuplicateKeys(mongo::OperationContext*, std::__1::vector<mongo::BSONObj, std::__1::allocator<mongo::BSONObj> > const&)+0x38
2020-10-19T15:16:20.941Z I  CONTROL  [IndexBuildsCoordinatorMongod-0]   Frame: 0x10451CD00 mongo::MultiIndexBlock::dumpInsertsFromBulk(mongo::OperationContext*, std::__1::set<mongo::RecordId, std::__1::less<mongo::RecordId>, std::__1::allocator<mongo::RecordId> >*)+0x3D0
2020-10-19T15:16:20.941Z I  CONTROL  [IndexBuildsCoordinatorMongod-0]   Frame: 0x10451B88B mongo::MultiIndexBlock::insertAllDocumentsInCollection(mongo::OperationContext*, mongo::Collection*)+0x99B
2020-10-19T15:16:20.941Z I  CONTROL  [IndexBuildsCoordinatorMongod-0]   Frame: 0x104511159 mongo::IndexBuildsManager::startBuildingIndex(mongo::OperationContext*, mongo::Collection*, mongo::UUID const&)+0x49
2020-10-19T15:16:20.941Z I  CONTROL  [IndexBuildsCoordinatorMongod-0]   Frame: 0x1044EE749 mongo::IndexBuildsCoordinator::_scanCollectionAndInsertKeysIntoSorter(mongo::OperationContext*, std::__1::shared_ptr<mongo::ReplIndexBuildState>)+0x1B9
2020-10-19T15:16:20.941Z I  CONTROL  [IndexBuildsCoordinatorMongod-0]   Frame: 0x1044EE37B mongo::IndexBuildsCoordinator::_buildIndex(mongo::OperationContext*, std::__1::shared_ptr<mongo::ReplIndexBuildState>, mongo::IndexBuildsCoordinator::IndexBuildOptions const&)+0x7B
2020-10-19T15:16:20.941Z I  CONTROL  [IndexBuildsCoordinatorMongod-0]   Frame: 0x1044EDB68 mongo::IndexBuildsCoordinator::_runIndexBuildInner(mongo::OperationContext*, std::__1::shared_ptr<mongo::ReplIndexBuildState>, mongo::IndexBuildsCoordinator::IndexBuildOptions const&)+0x88
2020-10-19T15:16:20.941Z I  CONTROL  [IndexBuildsCoordinatorMongod-0]   Frame: 0x1044EBD59 mongo::IndexBuildsCoordinator::_runIndexBuild(mongo::OperationContext*, mongo::UUID const&, mongo::IndexBuildsCoordinator::IndexBuildOptions const&)+0x479

In 4.4.1, the call to recordDuplicateKeys in the trace was not protected by a writeConflictRetry handler; with the backport of SERVER-49507 (which will land in 4.4.2), I believe this has since been fixed. edwin.zhou to confirm with his repro.
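A minimal confirmation sketch, reusing the collection from the repro above: on a binary with the SERVER-49507 backport, the unique index build should fail cleanly with DuplicateKey rather than tripping the cleanup invariant.

// Expect a clean DuplicateKey failure (no invariant/crash) on 4.4.2+.
const testDB = db.getSiblingDB("test");
assert.commandFailedWithCode(testDB.runCommand({
  createIndexes: "t",
  indexes: [{key: {counter: 1}, name: "counter_index", unique: true}],
}), ErrorCodes.DuplicateKey);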

Comment by Edwin Zhou [ 16/Oct/20 ]

Steps to Reproduce

  1. Begin by inserting about 19 million documents, where the last inserted document duplicates an existing value of the counter field

    db = db.getSiblingDB("test");
    db.test_col.drop();
    const n = 19000000;
    for (let i = 0; i < n; i++) {
      db.test_col.insert({ counter: i, name: `${i}_item` });
    }
    db.test_col.insert({counter: n - 1, name: `${n}_item`}); // makes counter a non-unique field
    


  2. Add --wiredTigerCacheSizeGB 0.25 in .mlaunch_startup.startup_info to provision only 256MB of WiredTiger cache per node.
  3. Build indexes in parallel, one of which attempts to enforce uniqueness on the non-unique field.

    db.test_col.dropIndexes()
    db.runCommand({
      createIndexes: "test_col",
      indexes: [
        {
          key: {
            name: 1,
          },
          name: "name_index"
        },
        {
          key: {
            counter: 1,
          },
          name: "counter_index",
          unique: true, // attempt to enforce uniqueness on the counter field.
        }
      ],
    });
    


  4. After some time, the node should crash with the WriteConflict error and the following invariant failure:

    {"t":{"$date":"2020-10-16T17:28:28.553Z"},"s":"I","c":"STORAGE","id":20649,"ctx":"IndexBuildsCoordinatorMongod-0","msg":"Index build failed","attr":{"buildUUID":{"uuid":{"_uuid":"6926f854-ac65-4f30-b2aa-41350316aa68"}},"namespace":"test.test_col","uuid":{"uuid":{"_uuid":"5623f9cc-8cd9-4cf9-8311-840dfda10660"}},"error":{"code":112,"codeName":"WriteConflict","errmsg":"WriteConflict error: this operation conflicted with another operation. Please retry your operation or multi-document transaction."}}}
    {"t":{"$date":"2020-10-16T17:28:28.553Z"},"s":"F","c":"-","id":23081,"ctx":"IndexBuildsCoordinatorMongod-0","msg":"Invariant failure","attr":{"expr":"status.isA<ErrorCategory::Interruption>() || status.isA<ErrorCategory::ShutdownError>()","msg":"Unnexpected error code during index build cleanup: WriteConflict: WriteConflict error: this operation conflicted with another operation. Please retry your operation or multi-document transaction.","file":"src/mongo/db/index_builds_coordinator.cpp","line":2090}}
    

Some things to note:

  1. The invariant seems to occur only when the index build is bound to fail, i.e. when enforcing uniqueness on a non-unique field.
  2. We can avoid the invariant by provisioning a bigger cache on the machine. In my case, I removed --wiredTigerCacheSizeGB entirely to provision 8GB. Read more about how the size is calculated here. After restarting the node, it resumes the index build, which then fails cleanly as expected.
  3. Reaching the invariant with traceWriteConflictExceptions: true provided this stack trace.
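For reference, traceWriteConflictExceptions is a server parameter that logs a backtrace at each WriteConflictException throw site. A minimal sketch for toggling it at runtime (it can also be set at startup via --setParameter):

// Turn on WCE backtraces, reproduce the index build failure, then turn them off.
assert.commandWorked(db.adminCommand({setParameter: 1, traceWriteConflictExceptions: true}));
// ... re-run the createIndexes repro from step 3 ...
assert.commandWorked(db.adminCommand({setParameter: 1, traceWriteConflictExceptions: false}));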