[DOCS-15751] [Server] Hang when inserting or deleting document with many index entries Created: 22/Nov/22  Updated: 13/Nov/23  Resolved: 03/Feb/23

Status: Closed
Project: Documentation
Component/s: manual, Server
Affects Version/s: None
Fix Version/s: 6.3.0-rc0, 6.2.0-rc3, 6.0.5, 5.0.16, Server_Docs_20231030, Server_Docs_20231106, Server_Docs_20231105, Server_Docs_20231113

Type: Task Priority: Major - P3
Reporter: Backlog - Core Eng Program Management Team Assignee: Dave Cuthbert (Inactive)
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Documented
documents SERVER-61909 Hang inserting or deleting document w... Closed
Duplicate
is duplicated by DOCS-15966 [BACKPORT] [v5.0] Hang inserting or d... Closed
Related
related to DOCS-15874 [BACKPORT] [v6.0] Hang inserting or d... Closed
Participants:
Days since reply: 1 year, 9 weeks, 5 days ago
Epic Link: DOCSP-22091

 Description   

ORIGINAL TITLE: [Server] Investigate changes in SERVER-61909: Hang inserting or deleting document with large number of index entries


Original Downstream Change Summary

A new TransactionTooLargeForCache error has been introduced. This indicates that the transaction was rolled back due to cache pressure and is unlikely to complete even if retried, because the cache is too small for it.

The threshold at which this error is triggered can be modified with the transactionTooLargeForCacheThreshold parameter; setting it to 1.0 disables this behaviour. If the transaction accounts for more than 75% (the default) of total dirty cache use and is rolled back, it is assumed that it is unlikely to complete. Since the dirty cache limit is 20% of the total cache, the largest transactions may occupy only 15% of the total size of the storage engine cache (0.75 × 0.20 = 0.15).
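The arithmetic above can be sketched in a few lines of JavaScript. This is a simplified illustration, not the server's implementation: the helper names are hypothetical, and the check is modeled against the dirty cache limit rather than the live dirty cache usage the server actually inspects.

```javascript
// Assumption from the summary above: the dirty cache limit is 20% of the total cache.
const DIRTY_CACHE_LIMIT_FRACTION = 0.20;

// Hypothetical helper (not a server API): the largest amount of dirty data a
// single transaction can hold before it exceeds the threshold share of the
// dirty cache limit.
function maxTransactionDirtyBytes(cacheBytes, threshold = 0.75) {
    return cacheBytes * DIRTY_CACHE_LIMIT_FRACTION * threshold;
}

// Hypothetical helper: simplified form of the check described above.
// A threshold of 1.0 (or more) disables the check.
function exceedsThreshold(txnDirtyBytes, cacheBytes, threshold = 0.75) {
    if (threshold >= 1.0) {
        return false;
    }
    return txnDirtyBytes > maxTransactionDirtyBytes(cacheBytes, threshold);
}

// Example: the 256 MB cache used in the repro from the linked ticket.
const cacheBytes = 256 * 1024 * 1024;
const limitMiB = maxTransactionDirtyBytes(cacheBytes) / (1024 * 1024);
console.log(limitMiB.toFixed(1) + " MiB");
```

With the default threshold, a 256 MB cache leaves roughly 38.4 MB of dirty data for a single transaction, which is 15% of the cache.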

If these conditions are met, a TransactionTooLargeForCache error may now be thrown instead of a TemporarilyUnavailable or WriteConflict error.
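One consequence for client code is that TransactionTooLargeForCache should not be treated like the other two errors, since retrying is unlikely to help. The sketch below shows one way a caller might separate them. The `codeName` property and the `runWithRetry` wrapper are assumptions for illustration, not part of any driver API described in this ticket.

```javascript
// Error names taken from the summary above. Matching on a `codeName`
// property is an assumption about how the client surfaces server errors.
const RETRYABLE_CODE_NAMES = new Set(["WriteConflict", "TemporarilyUnavailable"]);

// Returns true only for errors where a retry has a reasonable chance of
// succeeding. TransactionTooLargeForCache is deliberately excluded.
function isWorthRetrying(err) {
    return RETRYABLE_CODE_NAMES.has(err.codeName);
}

// Hypothetical wrapper: runs `operation`, retrying up to `maxAttempts` times
// on retryable errors and rethrowing everything else immediately.
function runWithRetry(operation, maxAttempts = 3) {
    for (let attempt = 1; ; attempt++) {
        try {
            return operation();
        } catch (err) {
            if (!isWorthRetrying(err) || attempt >= maxAttempts) {
                throw err;
            }
        }
    }
}
```

A caller that receives TransactionTooLargeForCache would instead need to shrink the write or raise the cache size or threshold, since the same operation is expected to fail again.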

https://github.com/10gen/mongo/blob/master/src/mongo/db/catalog/README.md#transactiontoolargeforcacheexception

Description of Linked Ticket

Inserting a document that creates a large number of index entries can create a large amount of dirty data in a single transaction, causing it to be canceled and retried indefinitely, resulting in a hang.

For example, on a node with a 256 MB cache, create a text index, then insert a document with a large string to be indexed or, equivalently, many terms to be indexed:

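function repro() {

    // Start from an empty collection with a text index on x.
    db.c.drop()
    printjson(db.c.createIndex({x: "text"}))

    // Build one document holding 50,000 random terms to be indexed.
    var doc = {x: []}
    for (var j = 0; j < 50000; j++)
        doc.x.push("" + Math.random() + Math.random())

    // Insert the document repeatedly and print how long each insert takes.
    for (var i = 0; i < 20; i++) {
        var start = new Date()
        db.c.insert(doc)
        print(new Date() - start, "ms")
    }
}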

This hangs after a few documents under high cache pressure, with the following message emitted repeatedly in the log:

{"t":{"$date":"2021-12-03T11:43:20.820-05:00"},"s":"I", "c":"STORAGE", "id":22430, "ctx":"conn21","msg":"WiredTiger message","attr":{"message":"oldest pinned transaction ID rolled back for eviction"}}

This will effectively make the server inoperable due to cache pressure. If it occurs on the secondaries, they will stall because it will prevent completion of the current batch.
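To check whether a node is in this state, one option is to count how often the eviction rollback message appears in the structured log. The following is a minimal sketch, assuming the log is available as newline-delimited JSON text; the function name is hypothetical, and the log id and message text are taken from the log line quoted above.

```javascript
// Message text and log id as they appear in the log line quoted above.
const EVICTION_MESSAGE = "oldest pinned transaction ID rolled back for eviction";
const WIREDTIGER_MESSAGE_ID = 22430;

// Hypothetical helper: counts log entries that report the eviction rollback.
// Lines that are blank or not valid JSON are skipped.
function countEvictionRollbacks(logText) {
    let count = 0;
    for (const line of logText.split("\n")) {
        if (line.trim() === "") {
            continue;
        }
        let entry;
        try {
            entry = JSON.parse(line);
        } catch (e) {
            continue;
        }
        if (entry && entry.id === WIREDTIGER_MESSAGE_ID &&
            entry.attr && entry.attr.message === EVICTION_MESSAGE) {
            count++;
        }
    }
    return count;
}
```

A count that keeps growing while a single insert is in progress is consistent with the retry loop described here, though the message alone does not prove it.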

This is a regression as these inserts complete successfully (even if somewhat slowly) in 4.2.

I think this is related to SERVER-61454, but I'm opening this as a distinct ticket because:

  • This is a somewhat different use case as the issue can be reliably created with single inserts.
  • I don't think the change described in SERVER-61454 would apply here, as the insert is the only transaction running so delaying retries would have no effect, and the issue is not related to CPU resource starvation as far as I can tell.
  • It's not clear to me where the appropriate fix would lie - query layer, retry behavior, storage engine behavior.


 Comments   
Comment by Education Bot [ 02/Dec/22 ]

Fix Version updated for upstream SERVER-61909:
6.3.0-rc0, 6.2.0-rc3

Comment by Education Bot [ 22/Nov/22 ]

Fix Version updated for upstream SERVER-61909:
6.3.0-rc0

Generated at Thu Feb 08 08:13:47 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.