Type: Investigation
Resolution: Done
Priority: Major - P3
Fix Version/s: None
Affects Version/s: None
Component/s: None
Labels: None

Story Points: 3
Sprint: TAR 2023-05-15
Days Spent: 0.5

Original Downstream Change Summary

A new TransactionTooLargeForCache error has been introduced. It indicates that the transaction was rolled back due to cache pressure and is unlikely to complete even if retried, because the cache is too small for it.

The threshold at which this error is triggered is controlled by the transactionTooLargeForCacheThreshold parameter; setting it to 1.0 disables this behaviour. If a transaction accounts for more than 75% (the default) of total dirty cache use and is rolled back, it is assumed to be unlikely to complete. Since the dirty cache limit is 20% of the total cache, the largest transactions may occupy at most 15% (0.75 × 0.20) of the total size of the storage engine cache.
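The arithmetic above can be sketched as follows. This is only an illustration of the numbers quoted in the summary, not the server's implementation: the helper names are hypothetical, and comparing against the dirty cache limit (rather than the dirty bytes in use at that moment) is an assumption made to match the 15% figure.

```javascript
// Illustrative arithmetic only; not the server's actual check.
const DIRTY_CACHE_FRACTION = 0.20; // dirty limit as a share of total cache (from the summary)
const DEFAULT_THRESHOLD = 0.75;    // default transactionTooLargeForCacheThreshold

// Largest share of the total cache that a single transaction may dirty
// before a rollback would be classified as "too large".
function maxTransactionShare(threshold = DEFAULT_THRESHOLD) {
    return threshold * DIRTY_CACHE_FRACTION;
}

// Hypothetical helper: would a rolled-back transaction that dirtied
// txnDirtyBytes be treated as too large for a cache of cacheBytes?
function isTooLargeForCache(txnDirtyBytes, cacheBytes, threshold = DEFAULT_THRESHOLD) {
    if (threshold >= 1.0) {
        return false; // a threshold of 1.0 disables the behaviour
    }
    const dirtyLimitBytes = cacheBytes * DIRTY_CACHE_FRACTION;
    return txnDirtyBytes > threshold * dirtyLimitBytes;
}
```

Under these assumptions, a 256 MB cache has a dirty limit of about 51 MB, so a rolled-back transaction holding more than roughly 38 MB of dirty data would qualify.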

When these conditions are met, the server may now throw a TransactionTooLargeForCache error instead of a TemporarilyUnavailable or WriteConflict error.
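For code that retries on these errors, the practical consequence is that TransactionTooLargeForCache should be treated as non-retryable. A minimal sketch, assuming the caller sees the error name in a `codeName` property; `shouldRetry` and `runWithRetry` are hypothetical helpers, not part of any driver or of the server:

```javascript
// Error names taken from the summary above; the retry policy itself is an assumption.
const RETRYABLE_CODE_NAMES = new Set(["WriteConflict", "TemporarilyUnavailable"]);

// Transient errors may succeed on retry; TransactionTooLargeForCache
// (and anything else) is surfaced to the caller immediately.
function shouldRetry(err) {
    return RETRYABLE_CODE_NAMES.has(err.codeName);
}

// Run operation() up to maxAttempts times, retrying only transient errors.
function runWithRetry(operation, maxAttempts = 5) {
    for (let attempt = 1; ; attempt++) {
        try {
            return operation();
        } catch (err) {
            if (!shouldRetry(err) || attempt >= maxAttempts) {
                throw err;
            }
        }
    }
}
```

Bounding the number of attempts also avoids the indefinite retry loop described in the linked ticket, even for errors that are nominally transient.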

https://github.com/10gen/mongo/blob/master/src/mongo/db/catalog/README.md#transactiontoolargeforcacheexception

Description of Linked Ticket

Inserting a document that creates a large number of index entries can create a large amount of dirty data in a single transaction, causing it to be canceled and retried indefinitely, resulting in a hang.

For example, on a node with a 256 MB cache, create a text index, then insert a document with a large string to be indexed or, equivalently, a lot of terms to be indexed:

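function repro() {
    // Start from an empty collection with a text index on x.
    db.c.drop();
    printjson(db.c.createIndex({x: "text"}));

    // Build one document with 50,000 distinct terms to index.
    var doc = {x: []};
    for (var j = 0; j < 50000; j++) {
        doc.x.push("" + Math.random() + Math.random());
    }

    // Insert it repeatedly, timing each insert.
    for (var i = 0; i < 20; i++) {
        var start = new Date();
        db.c.insert(doc);
        print(new Date() - start, "ms");
    }
}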

This will hang after a few documents, with high cache pressure, and the following emitted repeatedly in the log:

{"t":{"$date":"2021-12-03T11:43:20.820-05:00"},"s":"I", "c":"STORAGE", "id":22430, "ctx":"conn21","msg":"WiredTiger message","attr":{"message":"oldest pinned transaction ID rolled back for eviction"}}

This effectively makes the server inoperable due to cache pressure. If it occurs on the secondaries, they will stall because it prevents completion of the current batch.
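One way to spot this condition is to count how often the message quoted above appears in the structured log. A minimal sketch, assuming one JSON log entry per line in the format shown; `countEvictionRollbacks` is a hypothetical helper, and how the lines are read from the log file is left to the caller:

```javascript
// Message text and log id taken from the log line quoted above.
const EVICTION_ROLLBACK_MSG = "oldest pinned transaction ID rolled back for eviction";
const WIREDTIGER_MESSAGE_ID = 22430;

// Count log entries that report the eviction rollback; lines that are
// not valid JSON are skipped rather than treated as errors.
function countEvictionRollbacks(logLines) {
    let count = 0;
    for (const line of logLines) {
        let entry;
        try {
            entry = JSON.parse(line);
        } catch (e) {
            continue;
        }
        if (entry && entry.id === WIREDTIGER_MESSAGE_ID &&
            entry.attr && entry.attr.message === EVICTION_ROLLBACK_MSG) {
            count++;
        }
    }
    return count;
}
```

A count that keeps growing while a single insert is outstanding would be consistent with the retry loop described here.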

This is a regression as these inserts complete successfully (even if somewhat slowly) in 4.2.

I think this is related to SERVER-61454, but I'm opening this as a distinct ticket because:

- This is a somewhat different use case, as the issue can be reliably reproduced with single inserts.
- I don't think the change described in SERVER-61454 would apply here, as the insert is the only transaction running, so delaying retries would have no effect, and the issue is not related to CPU resource starvation as far as I can tell.
- It's not clear to me where the appropriate fix would lie: query layer, retry behavior, or storage engine behavior.

depends on

SERVER-61909 Hang inserting or deleting document with large number of index entries

Closed

Assignee: Michael McClimon
Reporter: Backlog - Core Eng Program Management Team
Votes: 0
Watchers: 3

Created: Nov 22 2022 10:08:27 AM UTC
Updated: May 22 2023 01:28:05 PM UTC
Resolved: May 22 2023 01:28:05 PM UTC
