  Core Server / SERVER-67867

Recover and proceed with TTL pass if document removal fails

    • Type: Improvement
    • Resolution: Won't Do
    • Priority: Major - P3
    • Affects Version/s: 5.0.9, 6.0.0-rc13
    • Component/s: None
    • Labels: None
    • Storage Execution
    • Execution Team 2022-10-17

      A failure to delete a single document interrupts the TTL pass, and every subsequent TTL pass hits the same failure. As a result, one failed document removal halts TTL deletion on the whole collection. This can occur in the wake of bugs like WT-7995.

      {"t":{"$date":"2022-01-01T00:00:00.000Z"},"s":"E","c":"QUERY","id":4615603,"ctx":"TTLMonitor","msg":"Erroneous index key found with reference to non-existent record id. Consider dropping and then re-creating the index and then running the validate command on the collection.","attr":{"namespace":"db.coll","recordId":"22922469419","indexKeyData":[{"key":{"started":{"$date":"2021-01-01T00:00:00.000Z"}},"pattern":{"started":1}}]}}
      {"t":{"$date":"2022-01-01T00:00:01.000Z"},"s":"E","c":"INDEX","id":5400703,"ctx":"TTLMonitor","msg":"Error running TTL job on collection","attr":{"namespace":"db.coll","error":{"code":301,"codeName":"DataCorruptionDetected","errmsg":"Erroneous index key found with reference to non-existent record id. Consider dropping and then re-creating the index and then running the validate command on the collection."}}}
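      Because each pass logs the same pair of errors, a halted TTL index can be spotted by scanning the structured log for entry id 5400703 from the TTLMonitor context. The following is a minimal sketch, assuming one JSON document per log line as shown above; the function name `ttl_failures` and the abbreviated `sample` line are illustrative and not part of the server or its tooling.

```python
import json

# Log id of "Error running TTL job on collection", taken from the log lines above.
TTL_JOB_ERROR_ID = 5400703


def ttl_failures(lines):
    """Yield (namespace, codeName) for each TTL job error found in the log lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip anything that is not a structured log line
        if not isinstance(entry, dict):
            continue
        if entry.get("id") == TTL_JOB_ERROR_ID and entry.get("ctx") == "TTLMonitor":
            attr = entry.get("attr", {})
            yield attr.get("namespace"), attr.get("error", {}).get("codeName")


# Abbreviated copy of the second log line above (errmsg shortened for the example).
sample = (
    '{"t":{"$date":"2022-01-01T00:00:01.000Z"},"s":"E","c":"INDEX","id":5400703,'
    '"ctx":"TTLMonitor","msg":"Error running TTL job on collection",'
    '"attr":{"namespace":"db.coll","error":{"code":301,'
    '"codeName":"DataCorruptionDetected","errmsg":"shortened"}}}'
)

failures = list(ttl_failures([sample, "not a json line", ""]))
print(failures)
```

      A namespace that shows up here on every TTL pass is a sign that deletion on that collection has stopped making progress.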
      

      It's good that this information is logged, but the TTL pass should follow up a failure like this by doing additional work:

      • identify a subsequent range of documents to delete
      • delete that range of documents

      I'd suggest not preserving any state about what's been skipped, and I'd recommend against trying to fix the inconsistency by removing the erroneous index key. That is: the TTLMonitor should continue to try to behave "normally" every time it runs. This ensures that an error like this continues to be logged instead of being accounted for and forgotten.
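      The requested behavior can be illustrated with a small in-memory model. This is only a sketch of the idea, not the server's implementation: the index is a sorted list of (timestamp, record id) pairs, the collection is a dict, and `DataCorruptionDetected`, `delete_by_index_entry`, and `ttl_pass` are hypothetical names chosen for the example. A failed removal is logged and skipped, the pass moves on to the remaining expired entries, and nothing is remembered between passes.

```python
import logging
from datetime import datetime, timedelta, timezone

log = logging.getLogger("ttl_sketch")


class DataCorruptionDetected(Exception):
    """Hypothetical stand-in for the error with codeName DataCorruptionDetected."""


def delete_by_index_entry(records, record_id):
    """Delete one record; raise if the index entry references a missing record."""
    if record_id not in records:
        raise DataCorruptionDetected(
            f"index key references non-existent record id {record_id}"
        )
    del records[record_id]


def ttl_pass(index_entries, records, now, expire_after):
    """Run one TTL pass and return (deleted, failed) counts.

    index_entries is a list of (timestamp, record_id) sorted by timestamp.
    The erroneous index key is left in place and no skip state is kept,
    so the same error is logged again on the next pass.
    """
    cutoff = now - expire_after
    deleted = failed = 0
    for entry in list(index_entries):  # iterate over a copy while mutating
        ts, record_id = entry
        if ts > cutoff:
            break  # sorted index: no later entry has expired
        try:
            delete_by_index_entry(records, record_id)
            index_entries.remove(entry)
            deleted += 1
        except DataCorruptionDetected as exc:
            # Log the failure and continue with the next expired entry.
            log.error("Error running TTL delete: %s", exc)
            failed += 1
    return deleted, failed


now = datetime(2022, 1, 1, tzinfo=timezone.utc)
old = now - timedelta(days=365)
records = {2: "b", 3: "c", 4: "fresh"}  # record 1 is missing
index = [(old, 1), (old, 2), (old, 3), (now, 4)]

first = ttl_pass(index, records, now, timedelta(days=30))
second = ttl_pass(index, records, now, timedelta(days=30))
print(first, second, records)
```

      In this model the first pass still removes the two valid expired records despite the dangling entry, and the second pass reports the same failure again, which matches the goal of keeping the error visible rather than accounted for and forgotten.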

            Assignee: backlog-server-execution [DO NOT USE] Backlog - Storage Execution Team
            Reporter: eric.sedor@mongodb.com Eric Sedor
            Votes: 0
            Watchers: 5
