[SERVER-58311] MongoDB calls WT RTS API while holding an open cursor on _mdb_catalog Created: 07/Jul/21  Updated: 29/Oct/23  Resolved: 01/Oct/21

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 5.1.0-rc0

Type: Improvement Priority: Major - P3
Reporter: Haribabu Kommi Assignee: Josef Ahmad
Resolution: Fixed Votes: 0
Labels: techdebt
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to WT-8429 Improve error handling when rollback_... Blocked
related to SERVER-60335 Wait for all user operations to be ki... Backlog
related to SERVER-60334 Avoid caching the cursor and session ... Closed
related to WT-7507 Update salvage for a history store an... Closed
Backwards Compatibility: Fully Compatible
Sprint: Execution Team 2021-10-04
Participants:

 Description   

WT wants to enforce that RTS takes exclusive access to a table while it removes unstable updates, to avoid producing inconsistent results when a cursor operates on the table in parallel. With that enforcement, some of the MDB tests fail as follows:

[j0:s0:n0] | 2021-07-06T02:17:35.057+00:00 E  STORAGE  22435   [BackgroundSync] "WiredTiger error","attr":{"error":16,"message":"[1625537855:57609][7280:0x7f6701201700], file:_mdb_catalog.wt, txn rollback_to_stable: __rollback_to_stable_btree_apply, 1518: file:_mdb_catalog.wt: unable to open handle, error indicates handle is unavailable due to concurrent use: Device or resource busy"}
[j0:s0:n0] | 2021-07-06T02:17:35.057+00:00 F  -        23093   [BackgroundSync] "Fatal assertion","attr":{"msgid":31049,"error":"UnrecoverableRollbackError: Error rolling back to stable. Err: Device or resource busy","file":"src/mongo/db/repl/storage_interface_impl.cpp","line":1346}
[j0:s0:n0] | 2021-07-06T02:17:35.057+00:00 F  -        23094   [BackgroundSync] "\n\n***aborting after fassert() failure\n\n"
[j0:s0:n0] | 2021-07-06T02:17:35.057+00:00 F  CONTROL  4757800 [BackgroundSync] "Writing fatal message","attr":{"message":"Got signal: 6 (Aborted).\n"}

This ticket is to find a way to close all open cursors before calling the WT RTS API.



 Comments   
Comment by Vivian Ge (Inactive) [ 06/Oct/21 ]

Updating the fix version since branching activities occurred yesterday. This ticket's fix will be included in rc0 when it is triggered. For more active release information, please keep an eye on #server-release. Thank you!

Comment by Githook User [ 01/Oct/21 ]

Author:

{'name': 'Josef Ahmad', 'email': 'josef.ahmad@mongodb.com', 'username': 'josefahmad'}

Message: SERVER-58311 Close BackgroundSync cursors before Rollback To Stable

WiredTiger Rollback To Stable (RTS) requires exclusive access to all
WT data handles. BackgroundSync holds an open cursor on
_mdb_catalog.wt just before calling RTS. This patch releases all
BackgroundSync cursors in order to free up its data handles.
Branch: master
https://github.com/mongodb/mongo/commit/0950600446ee8030132bacc9c00eb4ebca411500

Comment by Josef Ahmad [ 30/Sep/21 ]

There turned out to be three areas of contention.

  • As daniel.gottlieb discovered earlier, the BackgroundSync thread calls RTS while holding a dhandle to the catalog metadata (_mdb_catalog). This ticket will address this item.
  • The mongod holds the sizeStorer.wt cursor for pretty much the whole server uptime. I filed SERVER-60334 to release the cursor before calling into RTS and reacquire it afterwards.
  • Although we kill all operations before calling into RTS, there's a race between these operations actually shutting down and control being handed to RTS. I filed SERVER-60335 to address it.

Note that these three bullet points are not necessarily an exhaustive list. Once we resolve the three bullet points we'll get a better signal to spot any possible additional contention points down the line.

Comment by Haribabu Kommi [ 08/Jul/21 ]

daniel.gottlieb Thanks for the quick patch. That solved the _mdb_catalog file problem, but MDB still holds a couple of cursors when RTS is called. The following is the new error message I got with the patched build.

[j0:s0:n1] | 2021-07-08T11:23:12.554+00:00 I  ROLLBACK 23991   [BackgroundSync] "Rolling back to the stable timestamp","attr":{"stableTimestamp":{"$timestamp":{"t":1625743391,"i":24}},"initialDataTimestamp":{"$timestamp":{"t":1625743324,"i":6}}}
[j0:s0:n1] | 2021-07-08T11:23:12.555+00:00 E  STORAGE  22435   [BackgroundSync] "WiredTiger error","attr":{"error":16,"message":"[1625743392:555354][28032:0x7f644d99d700], file:collection-18-7668954742886040175.wt, txn rollback_to_stable: __rollback_to_stable_btree_apply, 1522: file:collection-18-7668954742886040175.wt: unable to open handle, error indicates handle is unavailable due to concurrent use: Device or resource busy"}
[j0:s0:n1] | 2021-07-08T11:23:12.555+00:00 F  -        23093   [BackgroundSync] "Fatal assertion","attr":{"msgid":31049,"error":"UnrecoverableRollbackError: Error rolling back to stable. Err: Device or resource busy","file":"src/mongo/db/repl/storage_interface_impl.cpp","line":1346}
[j0:s0:n1] | 2021-07-08T11:23:12.555+00:00 F  -        23094   [BackgroundSync] "\n\n***aborting after fassert() failure\n\n"
[j0:s0:n1] | 2021-07-08T11:23:12.555+00:00 F  CONTROL  4757800 [BackgroundSync] "Writing fatal message","attr":{"message":"Got signal: 6 (Aborted).\n"}

Comment by Daniel Gottlieb (Inactive) [ 07/Jul/21 ]

haribabu.kommi, I haven't tested this, but I'm curious if it fixes your problem:

diff --git a/src/mongo/db/storage/storage_engine_impl.cpp b/src/mongo/db/storage/storage_engine_impl.cpp
index 4fcbebe6ba..acb2af447c 100644
--- a/src/mongo/db/storage/storage_engine_impl.cpp
+++ b/src/mongo/db/storage/storage_engine_impl.cpp
@@ -993,6 +993,10 @@ StatusWith<Timestamp> StorageEngineImpl::recoverToStableTimestamp(OperationConte
     invariant(opCtx->lockState()->isW());
 
     auto state = catalog::closeCatalog(opCtx);
+    auto recovUnit = opCtx->releaseRecoveryUnit();
+    recovUnit.reset();
+    opCtx->setRecoveryUnit(std::unique_ptr<RecoveryUnit>(newRecoveryUnit()),
+                           WriteUnitOfWork::RecoveryUnitState::kNotInUnitOfWork);
 
     StatusWith<Timestamp> swTimestamp = _engine->recoverToStableTimestamp(opCtx);
     if (!swTimestamp.isOK()) {

Comment by Daniel Gottlieb (Inactive) [ 07/Jul/21 ]

The WT_SESSION calling RTS (thankfully) seems to be the culprit:

(gdb) print *(*(WT_SESSION_IMPL*)(*((WiredTigerSession*)0x55bcbbf91450))._session).cursors.tqh_first
$13 = {
  session = 0x55bcbc02aaf8,
  uri = 0x55bcbbfabbe0 "table:_mdb_catalog",
  key_format = 0x55bcbbc4ddd8 "q",
  value_format = 0x55bcbbc4dde0 "u",
  get_key = 0x7f23cd5afe30 <__wt_cursor_get_key>,
  get_value = 0x7f23cd5b1350 <__wt_cursor_get_value>,
  set_key = 0x7f23cd5b09f0 <__wt_cursor_set_key>,
  set_value = 0x7f23cd5b1ee0 <__wt_cursor_set_value>,
  compare = 0x7f23cd585760 <__curfile_compare>,
  equals = 0x7f23cd585e70 <__curfile_equals>,
  next = 0x7f23cd580610 <__curfile_next>,
  prev = 0x7f23cd57ffa0 <__curfile_prev>,
  reset = 0x7f23cd57e520 <__curfile_reset>,
  search = 0x7f23cd584410 <__curfile_search>,
  search_near = 0x7f23cd584d90 <__curfile_search_near>,
  insert = 0x7f23cd583620 <__curfile_insert>,
  modify = 0x7f23cd581380 <__curfile_modify>,
  update = 0x7f23cd582860 <__curfile_update>,
  remove = 0x7f23cd57eb60 <__curfile_remove>,
  reserve = 0x7f23cd581dc0 <__curfile_reserve>,
  close = 0x7f23cd580c80 <__curfile_close>,
  reconfigure = 0x7f23cd5b31e0 <__wt_cursor_reconfigure>,
  cache = 0x7f23cd57e4c0 <__curfile_cache>,
  reopen = 0x7f23cd57e2b0 <__curfile_reopen>,
  uri_hash = 9936510507291508000,
  q = {
    tqe_next = 0x55bcc1471000, 
    tqe_prev = 0x55bcbc02ac78
  },
  recno = 0,
  raw_recno_buf = "\000\000\000\000\000\000\000\000",
  json_private = 0x0,
  lang_private = 0x0,
  key = {
    data = 0x55bcc101579d, 
    size = 2, 
    mem = 0x0, 
    memsize = 0, 
    flags = 0
  },
  value = {
    data = 0x55bcc10157ad, 
    size = 619, 
    mem = 0x0, 
    memsize = 0, 
    flags = 0
  },
  saved_err = 0,
  internal_uri = 0x55bcbbfc2040 "file:_mdb_catalog.wt",
  flags = 1572868 (['OVERWRITE', 'OPEN', 'CACHEABLE'])
}

I suspect there's no actual bug as this session/cursor can't be doing anything other than calling RTS. haribabu.kommi, I'll see if I can provide a quick patch that you can run to at least verify there's nothing more nefarious going on.

Generated at Thu Feb 08 05:44:10 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.