Core Server / SERVER-66250

WT_PANIC when non-fsynced writes are lost

    • Type: Bug
    • Resolution: Works as Designed
    • Priority: Major - P3
    • Affects Version/s: 4.4.9
    • Component/s: None

      I'm afraid a straightforward repro case will have to wait for the release of LazyFS (hopefully in a few weeks), but if you have the opportunity to stub out the loss of un-fsynced writes yourself, that might be one possible way to reproduce this!

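      Until then, here's a rough sketch of what "stubbing out the loss of un-fsynced writes" could look like for a single file handle. To be clear, LossyFile and crash() are hypothetical names for illustration only; they're not part of LazyFS or mongod. Writes are buffered in memory, fsync() copies them to the backing file and makes them durable, and a simulated crash simply discards whatever was never fsynced:

      import os

      class LossyFile:
          """Toy model of a file whose un-fsynced writes can be lost.

          Illustration only: LazyFS does the equivalent at the filesystem
          layer, for every file under its mount point.
          """

          def __init__(self, path):
              self.fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
              self.pending = []   # (offset, data) pairs not yet durable
              self.offset = os.lseek(self.fd, 0, os.SEEK_END)

          def write(self, data):
              # Like a page cache: for now the bytes exist only in memory.
              self.pending.append((self.offset, data))
              self.offset += len(data)

          def fsync(self):
              # Flush buffered writes to the backing file, then make them durable.
              for off, data in self.pending:
                  os.pwrite(self.fd, data, off)
              os.fsync(self.fd)
              self.pending.clear()

          def crash(self):
              # Simulated power loss: anything not fsynced never happened.
              self.pending.clear()
              os.close(self.fd)

      So a sequence like

      f = LossyFile("/tmp/journal.bin")
      f.write(b"record 1")
      f.fsync()               # durable; survives the crash
      f.write(b"record 2")    # never fsynced
      f.crash()               # the file ends after "record 1"

      leaves "record 2" nowhere on disk, which is roughly the state each file in mongod's data directory can be left in when LazyFS drops its cache.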

      When mongod 4.4.9's data directory loses writes that were never explicitly fsynced, restarting mongod can fail with a WT_PANIC during WiredTiger's log (WAL) recovery.

      LazyFS is a new FUSE filesystem (presently unreleased, but we hope to publish it shortly) which simulates the effects of power loss by maintaining an in-memory page cache for all writes, and only flushing pages to its backing store when explicitly asked to fsync. We've integrated LazyFS into Jepsen by mounting /var/lib/mongodb on LazyFS, and after killing mongod, we ask LazyFS to drop any unfsynced writes. Here's an example of this test in action. Nodes n1, n3, and n6 all panicked on restart. Here's n3 beginning recovery:

      {"t":{"$date":"2022-05-05T10:59:26.822-04:00"},"s":"W",  "c":"STORAGE",  "id":22302,   "ctx":"initandlisten","msg":"Recovering data f
      rom the last clean checkpoint."}
      ...
       {"t":{"$date":"2022-05-05T10:59:59.581-04:00"},"s":"I",  "c":"STORAGE",  "id":22430,   "ctx":"initandlisten","msg":"WiredTiger messag
      e","attr":{"message":"[1651762799:581115][1200823:0x7f257df12cc0], txn-recover: [WT_VERB_RECOVERY_PROGRESS] Recovering log 1 through 
      2"}}
      {"t":{"$date":"2022-05-05T11:00:00.043-04:00"},"s":"I",  "c":"STORAGE",  "id":22430,   "ctx":"initandlisten","msg":"WiredTiger messag
      e","attr":{"message":"[1651762800:43098][1200823:0x7f257df12cc0], txn-recover: [WT_VERB_RECOVERY_PROGRESS] Recovering log 2 through 2
      "}}
      {"t":{"$date":"2022-05-05T11:00:01.001-04:00"},"s":"I",  "c":"STORAGE",  "id":22430,   "ctx":"initandlisten","msg":"WiredTiger messag
      e","attr":{"message":"[1651762801:623][1200823:0x7f257df12cc0], txn-recover: [WT_VERB_RECOVERY | WT_VERB_RECOVERY_PROGRESS] Main reco
      very loop: starting at 1/92800 to 2/256"}}

      Roughly 35 seconds later, the process crashed, complaining of pread failing to read a byte at a specific offset:

      {"t":{"$date":"2022-05-05T11:00:01.475-04:00"},"s":"E",  "c":"STORAGE",  "id":22435,   "ctx":"initandlisten","msg":"WiredTiger error","attr":{"error":-31802,"message":"[1651762801:475745][1200823:0x7f257df12cc0], file:WiredTiger.wt, WT_CURSOR.next: __posix_file_read, 428: /var/lib/mongodb/WiredTiger.turtle: handle-read: pread: failed to read 1 bytes at offset 1456: WT_ERROR: non-specific WiredTiger error"}}
      {"t":{"$date":"2022-05-05T11:00:01.475-04:00"},"s":"E",  "c":"STORAGE",  "id":22435,   "ctx":"initandlisten","msg":"WiredTiger error","attr":{"error":-31809,"message":"[1651762801:475901][1200823:0x7f257df12cc0], file:WiredTiger.wt, WT_CURSOR.next: __wt_turtle_read, 387: WiredTiger.turtle: fatal turtle file read error: WT_TRY_SALVAGE: database corruption detected"}}
      {"t":{"$date":"2022-05-05T11:00:01.475-04:00"},"s":"E",  "c":"STORAGE",  "id":22435,   "ctx":"initandlisten","msg":"WiredTiger error","attr":{"error":-31804,"message":"[1651762801:475935][1200823:0x7f257df12cc0], file:WiredTiger.wt, WT_CURSOR.next: __wt_turtle_read, 387: the process must exit and restart: WT_PANIC: WiredTiger library panic"}} 

      Since LazyFS is a new filesystem and we haven't sanded off all of its bugs yet, I can't rule out that we've introduced some unexpected corruption beyond what the loss of unsynced writes alone would cause. Still, it seems odd that the whole process had to crash, given that at least some of the WAL was apparently readable. Does this sound like expected behavior?

        Attachments:
          1. mongod.tar.bz2 (3.27 MB), attached by aphyr@jepsen.io

            Assignee: Keith Smith (keith.smith@mongodb.com)
            Reporter: Kyle Kingsbury (aphyr@jepsen.io)
            Votes: 0
            Watchers: 17
