WiredTiger / WT-4459

Some recovery errors lead to memory leak

    • Type: Bug
    • Resolution: Gone away
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: None

      One of the MongoDB corruption tests, one of the new salvage/repair tests, runs under ASAN and reports a memory leak. We see messages in the log like:

      [js_test:wt_repair_corrupt_metadata] 2018-11-06T14:35:00.699+0000 d20021| 2018-11-06T14:35:00.699+0000 I STORAGE  [initandlisten]
      wiredtiger_open config: create,cache_size=1024M,session_max=20000,eviction=(threads_min=4,threads_max=4),config_base=false,statistics=(fast),log=(enabled=true,archive=true,path=journal,compressor=snappy),file_manager=(close_idle_time=100000),statistics_log=(wait=0),verbose=(recovery_progress),
      [1541514901:406868][58538:0x7f6559b76a80], txn-recover: __wt_txn_recover, 740: Recovery failed: WT_NOTFOUND: item not found
      [1541514901:407502][58538:0x7f6559b76a80], connection: __wt_cache_destroy, 384: cache server: exiting with 1 pages in memory and 0 pages evicted
      [1541514901:407575][58538:0x7f6559b76a80], connection: __wt_cache_destroy, 389: cache server: exiting with 51 image bytes in memory
      [1541514901:407621][58538:0x7f6559b76a80], connection: __wt_cache_destroy, 393: cache server: exiting with 315 bytes in memory
      [1541514901:423924][58538:0x7f6559b76a80], txn-recover: __wt_txn_recover, 740: Recovery failed: WT_NOTFOUND: item not found
      [1541514901:424643][58538:0x7f6559b76a80], connection: __wt_cache_destroy, 384: cache server: exiting with 1 pages in memory and 0 pages evicted
      [1541514901:424699][58538:0x7f6559b76a80], connection: __wt_cache_destroy, 389: cache server: exiting with 51 image bytes in memory
      [1541514901:424735][58538:0x7f6559b76a80], connection: __wt_cache_destroy, 393: cache server: exiting with 315 bytes in memory
      [1541514901:441336][58538:0x7f6559b76a80], txn-recover: __wt_txn_recover, 740: Recovery failed: WT_NOTFOUND: item not found
      [1541514901:441995][58538:0x7f6559b76a80], connection: __wt_cache_destroy, 384: cache server: exiting with 1 pages in memory and 0 pages evicted
      [1541514901:442054][58538:0x7f6559b76a80], connection: __wt_cache_destroy, 389: cache server: exiting with 51 image bytes in memory
      [1541514901:442090][58538:0x7f6559b76a80], connection: __wt_cache_destroy, 393: cache server: exiting with 315 bytes in memory
      Failed to start up WiredTiger under any compatibility version.
      Reason: -31804: WT_PANIC: WiredTiger library panic
      Attempting to salvage WiredTiger metadata
      

      Although the open with salvage succeeds and the system is correctly repaired and able to run, the initial error and the cache destroy messages indicate a memory leak, which ASAN then reports.

      The underlying error is that the WiredTiger.turtle file contains an invalid checkpoint_lsn=(1,2). When __wt_log_scan is called to recover the metadata file on the first pass of recovery, it detects the bad LSN and returns WT_NOTFOUND.

      There are a number of things to do here:
      1. Add a bad-lsn case to the test/csuite/wt4156_metadata_salvage test.
      2. Find and fix the leak. This is not a panic error. We haven't actually recovered anything yet, so it isn't clear where this memory is being used. It may be the metadata cursor that is in use: the error paths do not appear to close it.
      3. Consider whether __wt_txn_recover:636, where we check for an error return of ENOENT to set WT_CONN_DATA_CORRUPTION, should also check for WT_NOTFOUND to cover this case.
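
As a rough illustration of items 2 and 3, the sketch below models a recovery routine that closes its metadata cursor on every exit path and treats both ENOENT and WT_NOTFOUND as signs of data corruption. It is not WiredTiger source: every `toy_` name, the struct layouts, and the rule that only offset 128 of log file 1 is a valid record start are assumptions made up for this example. Only the numeric value of WT_NOTFOUND is taken from WiredTiger.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* WiredTiger's value for WT_NOTFOUND; everything else here is a stand-in. */
#define WT_NOTFOUND (-31803)

/* Hypothetical stand-ins for the connection and the metadata cursor. */
typedef struct {
    bool data_corruption; /* models the WT_CONN_DATA_CORRUPTION flag */
    int open_cursors;     /* lets a caller observe a leaked cursor */
} toy_conn;

typedef struct {
    toy_conn *conn;
} toy_cursor;

static toy_cursor *toy_cursor_open(toy_conn *conn)
{
    toy_cursor *c = malloc(sizeof(*c));
    if (c == NULL)
        return NULL;
    c->conn = conn;
    conn->open_cursors++;
    return c;
}

static void toy_cursor_close(toy_cursor *c)
{
    if (c == NULL)
        return;
    c->conn->open_cursors--;
    free(c);
}

/* Stand-in for the log scan: assume only (1,128) starts a log record. */
static int toy_log_scan(unsigned file, unsigned offset)
{
    return (file == 1 && offset == 128) ? 0 : WT_NOTFOUND;
}

/* Stand-in for recovery: one cleanup label shared by success and failure. */
static int toy_txn_recover(toy_conn *conn, unsigned file, unsigned offset)
{
    toy_cursor *metac;
    int ret;

    assert(conn != NULL);
    if ((metac = toy_cursor_open(conn)) == NULL)
        return ENOMEM;

    ret = toy_log_scan(file, offset);
    /* Item 3: treat a bad checkpoint LSN like a missing log file. */
    if (ret == ENOENT || ret == WT_NOTFOUND)
        conn->data_corruption = true;
    if (ret != 0)
        goto err;

    /* Log replay would happen here in a real implementation. */

err:
    /* Item 2: the cursor is closed even when the scan fails. */
    toy_cursor_close(metac);
    return ret;
}
```

Calling `toy_txn_recover(&conn, 1, 2)` with the bad LSN from this ticket returns WT_NOTFOUND, sets the corruption flag, and leaves `open_cursors` at zero, which is the behavior ASAN would need to see in the real error path.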

        1. 4459.diff (2 kB, Susan LoVerso)

            Assignee:
            backlog-server-storage-engines [DO NOT USE] Backlog - Storage Engines Team
            Reporter:
            sue.loverso@mongodb.com Susan LoVerso
            Votes:
            0
            Watchers:
            3
