  Core Server / SERVER-61909

Hang inserting or deleting document with large number of index entries

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • Fix Version/s: 6.2.0-rc3, 6.3.0-rc0, 6.0.5, 5.0.16
    • Affects Version/s: None
    • Component/s: None
    • None
    • Minor Change
    • ALL
    • v6.2, v6.0, v5.0
    • Sprint: Execution Team 2022-10-17, Execution Team 2022-10-31, Execution Team 2022-11-14, Execution Team 2022-11-28
    • 164

      Inserting a document that creates a large number of index entries can generate a large amount of dirty data in a single transaction, causing the transaction to be canceled and retried indefinitely, resulting in a hang.

      For example, on a node with a 256 MB cache, create a text index, then insert a document with a large string to be indexed or, equivalently, a large number of terms to be indexed:

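      // Reproduces the hang: insert documents whose text-indexed field holds many terms.
      function repro() {

          db.c.drop()
          printjson(db.c.createIndex({x: "text"}))

          // Build one document with 50,000 random strings to be text-indexed.
          var doc = {x: []}
          for (var j = 0; j < 50000; j++)
              doc.x.push("" + Math.random() + Math.random())

          // Insert the same document repeatedly, timing each insert.
          for (var i = 0; i < 20; i++) {
              var start = new Date()
              db.c.insert(doc)
              print(new Date() - start, "ms")
          }
      }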
      
      

      This hangs after a few documents, with high cache pressure and the following message emitted repeatedly in the log:

      {"t":{"$date":"2021-12-03T11:43:20.820-05:00"},"s":"I", "c":"STORAGE", "id":22430, "ctx":"conn21","msg":"WiredTiger message","attr":{"message":"oldest pinned transaction ID rolled back for eviction"}}
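
      To confirm that a node is stuck in this retry loop, it can help to count how often the message above appears in the log. The following is a minimal sketch, assuming the log is in the structured one-JSON-object-per-line format shown above. The helper names isEvictionRollback and countEvictionRollbacks are hypothetical and not part of any MongoDB tooling.

```javascript
// Sketch: count eviction-rollback messages in a structured mongod log.
// Function and constant names are illustrative, not from MongoDB itself.
const ROLLBACK_MESSAGE = "oldest pinned transaction ID rolled back for eviction";

function isEvictionRollback(line) {
    let entry;
    try {
        entry = JSON.parse(line);
    } catch (e) {
        return false;  // skip blank or non-JSON lines
    }
    // Match both the log id and the message text seen in the ticket's log line.
    return entry !== null && typeof entry === "object" &&
        entry.id === 22430 &&
        !!entry.attr && entry.attr.message === ROLLBACK_MESSAGE;
}

function countEvictionRollbacks(logText) {
    return logText.split("\n").filter(isEvictionRollback).length;
}
```

      Passing the contents of the mongod log file to countEvictionRollbacks at two points in time and comparing the counts gives a rough indication of whether the insert is still being rolled back and retried.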
      

      This effectively makes the server inoperable due to cache pressure. If it occurs on secondaries, they will stall because the stuck insert prevents the current batch from completing.
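
      The cache pressure can be observed from a second shell connection while the repro runs. The sketch below summarizes the WiredTiger cache section of a serverStatus() result. The function name cachePressure and the shape of its return value are illustrative assumptions; the statistic names are the ones reported under wiredTiger.cache.

```javascript
// Sketch: summarize WiredTiger cache usage from a serverStatus() document.
// cachePressure is a hypothetical helper, not a built-in shell function.
function cachePressure(serverStatus) {
    const cache = serverStatus.wiredTiger.cache;
    const maxBytes = cache["maximum bytes configured"];
    const usedBytes = cache["bytes currently in the cache"];
    const dirtyBytes = cache["tracked dirty bytes in the cache"];
    return {
        maxMB: maxBytes / (1024 * 1024),    // configured cache size in MB
        usedRatio: usedBytes / maxBytes,    // fraction of the cache in use
        dirtyRatio: dirtyBytes / maxBytes,  // fraction of the cache that is dirty
    };
}

// Only runs inside the mongo shell, where `db` is defined.
if (typeof db !== "undefined") {
    printjson(cachePressure(db.serverStatus()));
}
```

      On the node from the example, maxMB should report 256, and a dirtyRatio that stays high while the insert never completes is consistent with the behavior described here.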

      This is a regression as these inserts complete successfully (even if somewhat slowly) in 4.2.

      I think this is related to SERVER-61454, but I'm opening this as a distinct ticket because:

      • This is a somewhat different use case as the issue can be reliably created with single inserts.
      • I don't think the change described in SERVER-61454 would apply here, as the insert is the only transaction running so delaying retries would have no effect, and the issue is not related to CPU resource starvation as far as I can tell.
      • It's not clear to me where the appropriate fix would lie - query layer, retry behavior, storage engine behavior.

            Assignee:
            yujin.kang@mongodb.com Yujin Kang Park
            Reporter:
            bruce.lucas@mongodb.com Bruce Lucas (Inactive)
            Votes:
            0
            Watchers:
            36

              Created:
              Updated:
              Resolved: