Core Server / SERVER-31097

Two shards in a cluster hit a WiredTiger library panic while creating a simple index, and every index build retry crashes again

    • Type: Bug
    • Resolution: Done
    • Priority: Critical - P2
    • None
    • Affects Version/s: 3.4.6
    • Component/s: Index Maintenance
    • Labels:
      None
    • ALL

      When creating a background index on a collection sharded by a hashed key in our 7-shard cluster (600 million documents), the server crashes repeatedly.

      We created this index:

      2017-09-14T20:42:29.543+0000 I INDEX    [initandlisten] found 1 interrupted index build(s) on shipyard.investigation_cards
      2017-09-14T20:42:29.543+0000 I INDEX    [initandlisten] note: restart the server with --noIndexBuildRetry to skip index rebuilds
      2017-09-14T20:42:29.545+0000 I INDEX    [initandlisten] build index on: shipyard.investigation_cards properties: { v: 2, key: { account_id: 1, universe_id: 1, stilingue_array.call_id: 1, stilingue_array.page_id: 1, normalized_posted_at: 1 }, name: "sac_call_id", ns: "shipyard.investigation_cards", background: true }
      2017-09-14T20:42:29.545+0000 I INDEX    [initandlisten] 	 building index using bulk method; build may temporarily use up to 500 megabytes of RAM
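
For anyone trying to reproduce this, the index from the "build index on" log line can be expressed with PyMongo roughly as follows. This is a sketch assuming PyMongo 3.x and a connection through mongos; the helper name create_sac_call_id_index and the URI you pass to it are hypothetical, and only the key pattern, index name, namespace and background flag come from the log above.

```python
# Key pattern copied from the "build index on" log line (1 = ascending).
INDEX_KEYS = [
    ("account_id", 1),
    ("universe_id", 1),
    ("stilingue_array.call_id", 1),
    ("stilingue_array.page_id", 1),
    ("normalized_posted_at", 1),
]
INDEX_NAME = "sac_call_id"


def create_sac_call_id_index(uri: str) -> str:
    """Hypothetical helper: build the index on shipyard.investigation_cards.

    Requires PyMongo; the import is done lazily so the spec above has no
    dependency. Returns the index name reported by create_index.
    """
    from pymongo import MongoClient

    client = MongoClient(uri)
    try:
        coll = client["shipyard"]["investigation_cards"]
        # background=True mirrors "background: true" in the logged properties.
        return coll.create_index(INDEX_KEYS, name=INDEX_NAME, background=True)
    finally:
        client.close()
```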
      

      After building for some time, MongoDB crashed with this error:

      2017-09-14T20:43:58.517+0000 E STORAGE  [initandlisten] WiredTiger error (0) [1505421838:517507][852475:0x7f9e3c4b2d40], file:collection-22-3497018620930100997.wt, WT_CURSOR.next: read checksum error for 8192B block at offset 72198791168: block header checksum of 0 doesn't match expected checksum of 707510254
      2017-09-14T20:43:58.517+0000 E STORAGE  [initandlisten] WiredTiger error (0) [1505421838:517551][852475:0x7f9e3c4b2d40], file:collection-22-3497018620930100997.wt, WT_CURSOR.next: collection-22-3497018620930100997.wt: encountered an illegal file format or internal value
      2017-09-14T20:43:58.517+0000 E STORAGE  [initandlisten] WiredTiger error (-31804) [1505421838:517558][852475:0x7f9e3c4b2d40], file:collection-22-3497018620930100997.wt, WT_CURSOR.next: the process must exit and restart: WT_PANIC: WiredTiger library panic
      2017-09-14T20:43:58.517+0000 I -        [initandlisten] Fatal Assertion 28558 at src/mongo/db/storage/wiredtiger/wiredtiger_util.cpp 361
      2017-09-14T20:43:58.517+0000 I -        [initandlisten] 
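
To compare crashes across the attached logs (for example, whether every retry fails on the same file and offset), the fields of the "read checksum error" message can be pulled out with a small parser. This is a sketch that assumes the message keeps exactly the wording shown above; the names ChecksumError and parse_checksum_error are hypothetical, and lines that do not match simply return None.

```python
import re
from typing import NamedTuple, Optional

# Assumes the exact wording of the WiredTiger message quoted in this report.
_CHECKSUM_RE = re.compile(
    r"file:(?P<file>[^,]+), [^:]+: read checksum error for "
    r"(?P<size>\d+)B block at offset (?P<offset>\d+): "
    r"block header checksum of (?P<found>\d+) doesn't match "
    r"expected checksum of (?P<expected>\d+)"
)


class ChecksumError(NamedTuple):
    file: str        # WiredTiger data file, e.g. collection-*.wt
    block_size: int  # block size in bytes
    offset: int      # byte offset of the block in the file
    found: int       # checksum read from the block header
    expected: int    # checksum WiredTiger expected


def parse_checksum_error(line: str) -> Optional[ChecksumError]:
    """Return the parsed fields, or None if the line is not this error."""
    m = _CHECKSUM_RE.search(line)
    if m is None:
        return None
    return ChecksumError(
        file=m["file"],
        block_size=int(m["size"]),
        offset=int(m["offset"]),
        found=int(m["found"]),
        expected=int(m["expected"]),
    )
```

Running it over every line of each log file and collecting the non-None results gives one record per checksum failure, which makes it easy to see whether the crashes point at the same block.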
      

      I will attach two log files: the first covers the first crash (right after the index build started) and the second covers a subsequent crash.
      If you need more data, I will need a secure portal to upload it, because the files are large. Unfortunately, I can't upload any data files from this collection for security reasons.

      Starting the server with --noIndexBuildRetry stops the crashes. I will run an initial sync on those two servers because I'm not confident that this did not corrupt any data or indexes in my database.

        1. mongodb.log.2017-09-14T19-26-15
          1002 kB
        2. mongodb.log.2017-09-14T19-26-35
          29 kB
        3. new_database_corruption.7z
          3.88 MB

            Assignee: Unassigned
            Reporter: Lucas (lucasoares)
            Votes: 0
            Watchers: 6
