Core Server / SERVER-17713

WiredTiger using zlib compression can create invalid compressed stream

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical - P2
    • Resolution: Fixed
    • Affects Version/s: 3.0.0, 3.0.1
    • Fix Version/s: 3.0.2, 3.1.1
    • Component/s: WiredTiger
    • Labels:
      None
    • Backwards Compatibility:
      Fully Compatible
    • Operating System:
      ALL

      Description

      Issue Status as of Apr 02, 2015

      ISSUE SUMMARY
      In some rare circumstances, WiredTiger configured with zlib compression can create a corrupted on-disk file. If you see the following messages in your log file, you may have encountered this error:

      2015-03-24T09:27:19.605-0400 E STORAGE  [initandlisten] WiredTiger (0) [1427203639:605943][21310:0x7fe063080b80], file:collection-2--6089165247661965497.wt, cursor.prev: zlib error: inflate: data error: -3
      2015-03-24T09:27:19.606-0400 E STORAGE  [initandlisten] WiredTiger (0) [1427203639:606093][21310:0x7fe063080b80], file:collection-2--6089165247661965497.wt, cursor.prev: file:collection-2--6089165247661965497.wt: encountered an illegal file format or internal value
      2015-03-24T09:27:19.606-0400 E STORAGE  [initandlisten] WiredTiger (-31804) [1427203639:606114][21310:0x7fe063080b80], file:collection-2--6089165247661965497.wt, cursor.prev: the process must exit and restart: WT_PANIC: WiredTiger library panic
      2015-03-24T09:27:19.606-0400 I -        [initandlisten] Fatal Assertion 28558
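
To check an existing log for this signature, something like the following could be used. It is a minimal sketch: the two patterns are copied from the messages above, while the function name `find_zlib_corruption` and its return shape are assumptions made for illustration, not part of any MongoDB tooling.

```python
import re

# Patterns taken from the log excerpt in this ticket. Everything else
# (names, return value) is hypothetical and only for illustration.
ZLIB_ERROR = re.compile(
    r"file:(?P<file>\S+\.wt), .*zlib error: inflate: data error: -3"
)
PANIC = "WT_PANIC: WiredTiger library panic"


def find_zlib_corruption(log_lines):
    """Return (sorted list of affected .wt files, True if a WT_PANIC was seen)."""
    files = set()
    panicked = False
    for line in log_lines:
        match = ZLIB_ERROR.search(line)
        if match:
            files.add(match.group("file"))
        if PANIC in line:
            panicked = True
    return sorted(files), panicked


sample = [
    "2015-03-24T09:27:19.605-0400 E STORAGE  [initandlisten] WiredTiger (0) "
    "[1427203639:605943][21310:0x7fe063080b80], "
    "file:collection-2--6089165247661965497.wt, cursor.prev: "
    "zlib error: inflate: data error: -3",
    "2015-03-24T09:27:19.606-0400 E STORAGE  [initandlisten] WiredTiger (-31804) "
    "[1427203639:606114][21310:0x7fe063080b80], "
    "file:collection-2--6089165247661965497.wt, cursor.prev: "
    "the process must exit and restart: WT_PANIC: WiredTiger library panic",
]
files, panicked = find_zlib_corruption(sample)
```

In practice the lines would come from iterating over the mongod log file. A match only indicates that a corrupted block was read; the absence of a match does not prove that no corrupted block exists on disk.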
      

      USER IMPACT
      mongod may terminate when it subsequently accesses the corrupted block, or may return corrupted data to a query.

      WORKAROUNDS
      Configure WiredTiger to use snappy compression instead of zlib, or upgrade to 3.0.2.

      AFFECTED VERSIONS
      3.0.0 and 3.0.1

      FIX VERSION
      The fix is included in the 3.0.2 production release.

      Original description

      Under certain conditions WiredTiger using zlib compression creates an invalid and unrecoverable compressed stream, resulting in the following fatal error on subsequent access:

      2015-03-24T09:27:19.103-0400 I STORAGE  [initandlisten] wiredtiger_open config: create,cache_size=5G,session_max=20000,eviction=(threads_max=4),statistics=(fast),log=(enabled=true,archive=true,path=journal,compressor=snappy),checkpoint=(wait=60,log_size=2GB),statistics_log=(wait=0),
      2015-03-24T09:27:19.605-0400 E STORAGE  [initandlisten] WiredTiger (0) [1427203639:605943][21310:0x7fe063080b80], file:collection-2--6089165247661965497.wt, cursor.prev: zlib error: inflate: data error: -3
      2015-03-24T09:27:19.606-0400 E STORAGE  [initandlisten] WiredTiger (0) [1427203639:606093][21310:0x7fe063080b80], file:collection-2--6089165247661965497.wt, cursor.prev: file:collection-2--6089165247661965497.wt: encountered an illegal file format or internal value
      2015-03-24T09:27:19.606-0400 E STORAGE  [initandlisten] WiredTiger (-31804) [1427203639:606114][21310:0x7fe063080b80], file:collection-2--6089165247661965497.wt, cursor.prev: the process must exit and restart: WT_PANIC: WiredTiger library panic
      2015-03-24T09:27:19.606-0400 I -        [initandlisten] Fatal Assertion 28558
      2015-03-24T09:27:19.616-0400 I CONTROL  [initandlisten] 
       0xf4fe49 0xefa091 0xeddc81 0xd790ea 0x1380900 0x1380bc5 0x1381064 0x12f0fa7 0x12f5485 0x12f2823 0x1306424 0x12e0e7f 0x1322c19 0xd6794c 0xd67a42 0xd6819a 0xd61e42 0xce22b6 0xce53ec 0xd60cb6 0xa6f9cd 0x7e20c0 0x7e7704 0x7fe061c7efe0 0x7e02c9
      ----- BEGIN BACKTRACE -----
      {"backtrace":[{"b":"400000","o":"B4FE49"},{"b":"400000","o":"AFA091"},{"b":"400000","o":"ADDC81"},{"b":"400000","o":"9790EA"},{"b":"400000","o":"F80900"},{"b":"400000","o":"F80BC5"},{"b":"400000","o":"F81064"},{"b":"400000","o":"EF0FA7"},{"b":"400000","o":"EF5485"},{"b":"400000","o":"EF2823"},{"b":"400000","o":"F06424"},{"b":"400000","o":"EE0E7F"},{"b":"400000","o":"F22C19"},{"b":"400000","o":"96794C"},{"b":"400000","o":"967A42"},{"b":"400000","o":"96819A"},{"b":"400000","o":"961E42"},{"b":"400000","o":"8E22B6"},{"b":"400000","o":"8E53EC"},{"b":"400000","o":"960CB6"},{"b":"400000","o":"66F9CD"},{"b":"400000","o":"3E20C0"},{"b":"400000","o":"3E7704"},{"b":"7FE061C5F000","o":"1FFE0"},{"b":"400000","o":"3E02C9"}],"processInfo":{ "mongodbVersion" : "3.0.1", "gitVersion" : "534b5a3f9d10f00cd27737fbcd951032248b5952", "uname" : { "sysname" : "Linux", "release" : "3.17.4-301.fc21.x86_64", "version" : "#1 SMP Thu Nov 27 19:09:10 UTC 2014", "machine" : "x86_64" }, "somap" : [ { "elfType" : 2, "b" : "400000" }, { "b" : "7FFF371FE000", "elfType" : 3 }, { "b" : "7FE062C56000", "path" : "/lib64/libpthread.so.0", "elfType" : 3 }, { "b" : "7FE062A4E000", "path" : "/lib64/librt.so.1", "elfType" : 3 }, { "b" : "7FE06284A000", "path" : "/lib64/libdl.so.2", "elfType" : 3 }, { "b" : "7FE06253B000", "path" : "/lib64/libstdc++.so.6", "elfType" : 3 }, { "b" : "7FE062233000", "path" : "/lib64/libm.so.6", "elfType" : 3 }, { "b" : "7FE06201C000", "path" : "/lib64/libgcc_s.so.1", "elfType" : 3 }, { "b" : "7FE061C5F000", "path" : "/lib64/libc.so.6", "elfType" : 3 }, { "b" : "7FE062E72000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3 } ] }}
       mongod(_ZN5mongo15printStackTraceERSo+0x29) [0xf4fe49]
       mongod(_ZN5mongo10logContextEPKc+0xE1) [0xefa091]
       mongod(_ZN5mongo13fassertFailedEi+0x61) [0xeddc81]
       mongod(+0x9790EA) [0xd790ea]
       mongod(+0xF80900) [0x1380900]
       mongod(__wt_err+0x95) [0x1380bc5]
       mongod(__wt_panic+0x24) [0x1381064]
       mongod(__wt_bt_read+0x437) [0x12f0fa7]
       mongod(__wt_cache_read+0x1C5) [0x12f5485]
       mongod(__wt_page_in_func+0x403) [0x12f2823]
       mongod(__wt_tree_walk+0x594) [0x1306424]
       mongod(__wt_btcur_prev+0xB4F) [0x12e0e7f]
       mongod(+0xF22C19) [0x1322c19]
       mongod(_ZN5mongo21WiredTigerRecordStore8Iterator8_getNextEv+0x2C) [0xd6794c]
       mongod(_ZN5mongo21WiredTigerRecordStore8Iterator7getNextEv+0x12) [0xd67a42]
       mongod(_ZN5mongo21WiredTigerRecordStoreC1EPNS_16OperationContextERKNS_10StringDataES5_bllPNS_28CappedDocumentDeleteCallbackEPNS_20WiredTigerSizeStorerE+0x46A) [0xd6819a]
       mongod(_ZN5mongo18WiredTigerKVEngine14getRecordStoreEPNS_16OperationContextERKNS_10StringDataES5_RKNS_17CollectionOptionsE+0x132) [0xd61e42]
       mongod(_ZN5mongo22KVDatabaseCatalogEntry14initCollectionEPNS_16OperationContextERKSsb+0x276) [0xce22b6]
       mongod(_ZN5mongo15KVStorageEngineC1EPNS_8KVEngineERKNS_22KVStorageEngineOptionsE+0x69C) [0xce53ec]
       mongod(+0x960CB6) [0xd60cb6]
       mongod(_ZN5mongo23GlobalEnvironmentMongoD22setGlobalStorageEngineERKSs+0x30D) [0xa6f9cd]
       mongod(_ZN5mongo13initAndListenEi+0x2F0) [0x7e20c0]
       mongod(main+0x134) [0x7e7704]
       libc.so.6(__libc_start_main+0xF0) [0x7fe061c7efe0]
       mongod(+0x3E02C9) [0x7e02c9]
      -----  END BACKTRACE  -----
      2015-03-24T09:27:19.617-0400 I -        [initandlisten] 
       
      ***aborting after fassert() failure
      

          Activity

          Michael Cahill (michael.cahill) added a comment:

          Resolved with latest drop from WT.

          Keith Bostic (keith.bostic) added a comment:

          Agreed, this could cause undetected corruption at pretty much any time the object is being written.

          It's hard to say why these users hit this issue, but my belief is it's data dependent, that is, a particular set of data will trigger the failure. My guess is it's large data items (large, that is, with respect to the configured block size).

          The analysis by [~bruce.lucas@10gen.com] indicates there are only a few corrupted bytes, and they are in the zlib header (not in the data itself), so we could probably figure out how to overwrite the particular corrupted bytes with correct ones, but nobody has investigated that as far as I know.
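
The idea in the last paragraph can be illustrated with Python's standard `zlib` module. This is a toy sketch under stated assumptions: the damage is simulated by flipping bits in the first byte of the 2-byte zlib stream header, which is not necessarily where or how the real corruption occurred, and the helper `try_inflate` is a name invented for this example. It shows that header damage alone produces the same `-3` data error seen in the log, and that overwriting the header with known-good bytes makes the stream readable again.

```python
import zlib

payload = b"example record " * 64
good = zlib.compress(payload)

# Assumption for illustration: only the first header byte is damaged.
bad = bytearray(good)
bad[0] ^= 0xFF
bad = bytes(bad)


def try_inflate(data):
    """Return (decompressed bytes, None) on success or (None, error text)."""
    try:
        return zlib.decompress(data), None
    except zlib.error as exc:
        return None, str(exc)


_, err = try_inflate(bad)

# The 2-byte header depends only on the compression settings, so a header
# taken from any stream produced with the same settings can be used.
known_good_header = zlib.compress(b"")[:2]
repaired = known_good_header + bad[2:]
recovered, repair_err = try_inflate(repaired)
```

Here `err` carries zlib's data error (code -3), and `recovered` equals the original payload once the header bytes are overwritten. Whether the on-disk blocks affected by this bug can be repaired the same way has not been investigated, as noted above.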

          Githook User (xgen-internal-githook) added a comment:

          Author: Michael Cahill (michaelcahill) <michael.cahill@wiredtiger.com>

          Message: Use deflateCopy to copy streams for rollback in case the compressed size is too large.

          refs SERVER-17713
          Branch: validate-configuration-string
          https://github.com/wiredtiger/wiredtiger/commit/4c0881afeb6713ef7ae9ea2b8f61811b0fecd192
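
The technique named in the commit message, copying the deflate stream so it can be rolled back when the compressed output would be too large, can be sketched with Python's `zlib` module, whose `Compress.copy()` is implemented in CPython on top of zlib's `deflateCopy`. This is not the WiredTiger code (which is C); the function `pack_block`, its parameters, and the sample data are assumptions made for this illustration.

```python
import hashlib
import zlib


def pack_block(records, max_size):
    """Compress as many leading records as fit into max_size bytes.

    Returns (compressed block, number of records included). Hypothetical
    helper: it assumes max_size is large enough for an empty zlib stream.
    """
    comp = zlib.compressobj()
    out = b""
    taken = 0
    for rec in records:
        # Work on a copy so the state in `comp` is left untouched and can
        # serve as the rollback point if this record does not fit.
        candidate = comp.copy()
        chunk = candidate.compress(rec)
        # Finish a second copy to learn the final size without ending
        # the candidate stream itself.
        tail = candidate.copy().flush()
        if len(out) + len(chunk) + len(tail) > max_size:
            break  # roll back: discard candidate, keep comp as it was
        comp = candidate
        out += chunk
        taken += 1
    return out + comp.flush(), taken


# Poorly compressible sample records so that not all of them fit.
records = [hashlib.sha256(b"%d" % i).digest() for i in range(100)]
block, n = pack_block(records, max_size=512)
```

The point of the copy is that the stream used to finish the block has never seen the rejected record, so the output decompresses to exactly the accepted records.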

