Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Incomplete
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: WiredTiger
    • Labels: None
    • Environment:
    • Operating System: ALL

      Description

      2015-05-26T18:36:12.637+0200 I STORAGE  [initandlisten] wiredtiger_open config: create,cache_size=7G,session_max=20000,eviction=(threads_max=4),statistics=(fast),checkpoint=(wait=60,log_size=2GB),statistics_log=(wait=0),
      2015-05-26T18:36:12.696+0200 E STORAGE  [initandlisten] WiredTiger (0) [1432658172:696344][8451:0x72d008d9db80], file:collection-231--1431395373171935217.wt, cursor.insert: zlib error: inflate: data error: -3
      2015-05-26T18:36:12.696+0200 E STORAGE  [initandlisten] WiredTiger (0) [1432658172:696418][8451:0x72d008d9db80], file:collection-231--1431395373171935217.wt, cursor.insert: file:collection-231--1431395373171935217.wt: encountered an illegal file format or internal value
      2015-05-26T18:36:12.696+0200 E STORAGE  [initandlisten] WiredTiger (-31804) [1432658172:696447][8451:0x72d008d9db80], file:collection-231--1431395373171935217.wt, cursor.insert: the process must exit and restart: WT_PANIC: WiredTiger library panic
      2015-05-26T18:36:12.696+0200 I -        [initandlisten] Fatal Assertion 28558
      2015-05-26T18:36:12.702+0200 I CONTROL  [initandlisten] 
       0xfd03f9 0xf672ff 0xf54232 0xdc8a66 0x141d87b 0x141d9b1 0x141df0e 0x139ebd9 0x13a249f 0x13a0d57 0x13b77d3 0x13992e6 0x13cc36a 0x1427806 0x13e40d5 0x142707b 0x13c44f4 0x13bd912 0xdb2df1 0xdb22ee 0xaa679c 0x7f89f5 0x7f7f80 0x7fdcc9 0x72d00735fec5 0x7f7e97
      ----- BEGIN BACKTRACE -----
      {"backtrace":[{"b":"400000","o":"BD03F9"},{"b":"400000","o":"B672FF"},{"b":"400000","o":"B54232"},{"b":"400000","o":"9C8A66"},{"b":"400000","o":"101D87B"},{"b":"400000","o":"101D9B1"},{"b":"400000","o":"101DF0E"},{"b":"400000","o":"F9EBD9"},{"b":"400000","o":"FA249F"},{"b":"400000","o":"FA0D57"},{"b":"400000","o":"FB77D3"},{"b":"400000","o":"F992E6"},{"b":"400000","o":"FCC36A"},{"b":"400000","o":"1027806"},{"b":"400000","o":"FE40D5"},{"b":"400000","o":"102707B"},{"b":"400000","o":"FC44F4"},{"b":"400000","o":"FBD912"},{"b":"400000","o":"9B2DF1"},{"b":"400000","o":"9B22EE"},{"b":"400000","o":"6A679C"},{"b":"400000","o":"3F89F5"},{"b":"400000","o":"3F7F80"},{"b":"400000","o":"3FDCC9"},{"b":"72D00733E000","o":"21EC5"},{"b":"400000","o":"3F7E97"}],"processInfo":{ "mongodbVersion" : "3.0.3", "gitVersion" : "b40106b36eecd1b4407eb1ad1af6bc60593c6105", "uname" : { "sysname" : "Linux", "release" : "3.14.32-xxxx-grs-ipv6-64", "version" : "#1 SMP Sat Feb 7 11:35:27 CET 2015", "machine" : "x86_64" }, "somap" : [ { "elfType" : 2, "b" : "400000" }, { "b" : "72D0096AD000", "elfType" : 3, "buildId" : "FAF400EE48C6DC7D3D021FC95AA21E92ED9541BC" }, { "b" : "72D008886000", "path" : "/lib/x86_64-linux-gnu/libm.so.6", "elfType" : 3, "buildId" : "4E96203F4FE17D3446F48226AAEA8DA6DEA8FFD0" }, { "b" : "72D008668000", "path" : "/lib/x86_64-linux-gnu/libpthread.so.0", "elfType" : 3, "buildId" : "921196598AF41AFF8DE42EEFB8561243610F34C3" }, { "b" : "72D008409000", "path" : "/lib/x86_64-linux-gnu/libssl.so.1.0.0", "elfType" : 3, "buildId" : "B408BD42C304C9370D97ED641544082414C4D59A" }, { "b" : "72D008026000", "path" : "/lib/x86_64-linux-gnu/libcrypto.so.1.0.0", "elfType" : 3, "buildId" : "B0BB841B6CFD35E8D3D2AC285C220A4683A134EF" }, { "b" : "72D007E1E000", "path" : "/lib/x86_64-linux-gnu/librt.so.1", "elfType" : 3, "buildId" : "54EF3A97A3E71418DD088B40AF51A00457834A17" }, { "b" : "72D007C1A000", "path" : "/lib/x86_64-linux-gnu/libdl.so.2", "elfType" : 3, "buildId" : "794CD87201C9778112E22BF5E2C0FBFB3390D29F" }, { "b" : "72D007919000", "path" : "/usr/lib/x86_64-linux-gnu/libc++.so.1", "elfType" : 3 }, { "b" : "72D007702000", "path" : "/lib/x86_64-linux-gnu/libgcc_s.so.1", "elfType" : 3, "buildId" : "7C6E98219378EBD1AA0D4CD671E8FF1589C04C4A" }, { "b" : "72D00733E000", "path" : "/lib/x86_64-linux-gnu/libc.so.6", "elfType" : 3, "buildId" : "95287BE8ACCCC7B5723F4306E6A5ECA6DFE7BFFD" }, { "b" : "72D008B8C000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "9240DBBD1DB14E756141EEE1FDDB67D3B77864E7" } ] }}
       mongod(_ZN5mongo15printStackTraceERNSt3__113basic_ostreamIcNS0_11char_traitsIcEEEE+0x29) [0xfd03f9]
       mongod(_ZN5mongo10logContextEPKc+0x11F) [0xf672ff]
       mongod(_ZN5mongo13fassertFailedEi+0xF2) [0xf54232]
       mongod(+0x9C8A66) [0xdc8a66]
       mongod(__wt_eventv+0x41B) [0x141d87b]
       mongod(__wt_err+0xA1) [0x141d9b1]
       mongod(__wt_illegal_value+0x5E) [0x141df0e]
       mongod(__wt_bt_read+0x1D9) [0x139ebd9]
       mongod(__wt_cache_read+0xAF) [0x13a249f]
       mongod(__wt_page_in_func+0x637) [0x13a0d57]
       mongod(__wt_row_search+0x5C3) [0x13b77d3]
       mongod(__wt_btcur_insert+0x386) [0x13992e6]
       mongod(+0xFCC36A) [0x13cc36a]
       mongod(+0x1027806) [0x1427806]
       mongod(__wt_log_scan+0x8A5) [0x13e40d5]
       mongod(__wt_txn_recover+0x24B) [0x142707b]
       mongod(__wt_connection_workers+0x54) [0x13c44f4]
       mongod(wiredtiger_open+0x14A2) [0x13bd912]
       mongod(_ZN5mongo18WiredTigerKVEngineC1ERKNSt3__112basic_stringIcNS1_11char_traitsIcEENS1_9allocatorIcEEEES9_bb+0x701) [0xdb2df1]
       mongod(+0x9B22EE) [0xdb22ee]
       mongod(_ZN5mongo23GlobalEnvironmentMongoD22setGlobalStorageEngineERKNSt3__112basic_stringIcNS1_11char_traitsIcEENS1_9allocatorIcEEEE+0x2EC) [0xaa679c]
       mongod(+0x3F89F5) [0x7f89f5]
       mongod(_ZN5mongo13initAndListenEi+0x10) [0x7f7f80]
       mongod(main+0x6B9) [0x7fdcc9]
       libc.so.6(__libc_start_main+0xF5) [0x72d00735fec5]
       mongod(+0x3F7E97) [0x7f7e97]
      -----  END BACKTRACE  -----
      2015-05-26T18:36:12.702+0200 I -        [initandlisten] 
       
      ***aborting after fassert() failure
      

        Issue Links

          Activity

          Sam Kleinman added a comment -

          Hello,

          This looks like it might be a case of the zlib compression bug resolved in SERVER-17713. To help us figure out this issue, could you answer the following questions:

          • Which version of MongoDB were you running when you hit this error?
          • What storage engine configurations were you using, including which compression library?

          If you were running as a member of a replica set, you can resync this member from another member of the replica set to get a valid copy of the data set.
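
          For reference, a minimal resync sketch (hypothetical paths and service name; assumes another member of the set holds a valid copy of the data):

          sudo service mongod stop              # stop the affected member
          mv /home/mongodb /home/mongodb.bak    # set the corrupt files aside
          mkdir /home/mongodb                   # an empty dbPath triggers an initial sync on restart
          sudo service mongod start             # the member resyncs from a healthy member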

          Cheers,
          sam

          Marcos added a comment -

          I was using 3.0.0 stable.

          config:

          root@ns364134:~# cat /etc/mongod.conf
          storage:
            dbPath: "/home/mongodb"
            journal:
              enabled: true
            engine: "wiredTiger"
            wiredTiger:
              collectionConfig:
                blockCompressor: zlib
          systemLog:
            destination: file
            path: "/var/log/mongodb/mongodb.log"
          #processManagement:
          #  fork: true
          net:
            bindIp: 127.0.0.1
            port: 27017

          That's not the case here; I'm running a single-node configuration. MongoDB should be reliable enough on its own, imho.

          Marcos added a comment -

          Config (markdown broke it in my last comment):
          http://pastebin.com/eWNaifkq

          Hint:
          mv collection-231--1431395373171935217.wt _collection-231--1431395373171935217.wt
          clear && clear && mongod --repair --dbpath /home/mongodb --storageEngine wiredTiger
          (crashes again, with a different error)
          mv _collection-231--1431395373171935217.wt collection-231--1431395373171935217.wt
          clear && clear && mongod --repair --dbpath /home/mongodb --storageEngine wiredTiger
          (seems to be repairing; this time it almost got through without crashing)

          It's a shame mongodump stopped supporting --dbpath in 3.0, because if the server crashes on startup (as in this case) you have absolutely no way to dump the non-corrupted data.
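
          For reference, the 3.0 workflow is to dump over the wire from a running mongod; a minimal sketch (port and paths are illustrative), though it only helps once the server can start on the data files at all:

          mongod --dbpath /home/mongodb --port 27018 --storageEngine wiredTiger &   # temporary standalone instance (illustrative port)
          mongodump --port 27018 --out /home/backup                                 # dump whatever is readable over the wire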

          Sam Kleinman added a comment -

          It looks like you've run into an instance of SERVER-17713, which was fixed in 3.0.2. We recommend in this case that you upgrade to 3.0.3 as soon as possible, and restore from backups as needed. In the future, you may want to consider keeping an extra copy of your data in a replica set, which makes it easier to recover data in a variety of situations. We're sorry for the confusion and frustration.

          Regards,
          sam

          Marcos added a comment -

          I'm not sure this should be closed. In my report I set the MongoDB version to 3.0.3 because, although the failure happened on 3.0.0, the first thing I did after finding the crash was to update to the latest version. I then found that once this (maybe patched) bug had broken the database, even 3.0.3 with the --repair flag crashed the same way 3.0.0 did. This led me to the conclusion that if the database breaks the same way again, the repair tool will again be unable to even start.

          My fix to recover the database was to assume the crashing file was lost, so I renamed it and ran repair again. This time mongod crashed in a different way (a file-not-found error, and far from a clean exit), but before that crash it appears to have processed some of the files. Then, frustrated, I renamed the problematic file back and launched repair once more; this time the first crash disappeared and it was able to process all files, fortunately.

          From my point of view, two issues remain unresolved:

          • 3.0.3 crashes just like 3.0.0 if the database ends up in the state mine was in when I reported this issue.
          • 3.0.3 also crashes if a file is missing.

          For me a crash is unacceptable. If the program exited cleanly with a "sorry, I was unable to recover collection x, I will recover the rest", that would be a different story, but that was not the case.

          Regarding your recommendation to move to v3.0.3: that was the first thing I did, and it was not the solution. The database is now running on this version, which is not a stable release, and that scares me for a production server. I have considered getting a replica many times, but it costs money I do not currently have. I make backups from time to time, but I expect MongoDB to be at least stable enough not to produce corruption by itself; I have the "ensure write" flags activated all over the place. I am very happy with how MongoDB works, and it would be sad if it could not handle simple tasks in small environments without 'x' shards and 'y' replicas.

          This is my second (and different) crash report, and I am starting to think the bad name MongoDB has is a bit justified. I have a copy (15GB) of the broken, unrepaired database if you want to take a look.

          Thanks for your time.

          Sam Kleinman added a comment - edited

          Again, I want to reiterate how sorry we are that you've run into SERVER-17713.

          While sharded deployments are not required, we do recommend a replica set as the basis of all production deployments: having an additional copy of your data provides additional assurance against machine failure, data corruption, and storage system errors. While ideally no one would run into bugs like this in the compression system, there are some classes of issues that only replication can protect you from.

          Moving to 3.0.3 was definitely the correct move. The issue that you're seeing now with 3.0.3 is still a byproduct of SERVER-17713: one or more compressed pages is invalid. Even though 3.0.3 will not write any new pages with invalid compression, the pages that were invalidly compressed by 3.0.0 still exist in your data files. If there were a valid copy of the data on another member of a replica set, you could perform an initial sync from the known-good member and discard the invalid data. Without a clean replica set member, the best way to get a valid data set is to restore from a backup.
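
          For reference, a minimal restore-from-backup sketch (hypothetical paths; assumes a mongodump-style backup exists):

          mv /home/mongodb /home/mongodb.corrupt                        # keep the damaged files for analysis
          mkdir /home/mongodb                                           # fresh, empty dbPath
          mongod --dbpath /home/mongodb --storageEngine wiredTiger &    # start on clean data files
          mongorestore /home/backup                                     # reload the last good dump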

          We're going to convert this issue to a ticket in the SUPPORT project, which is not publicly accessible, so we can continue to work on this issue directly. If you upload the invalid collection file(s) to the following endpoint, we can attempt a manual salvage operation. Use the following method to upload your files:

          scp -P 722 [filename] SUPPORT-1332@www.mongodb.com:
          

          Thanks for your patience,
          Sam

          Ramon Fernandez added a comment -

          Marcos, we haven't heard back from you after Sam's last comment above, so I'm going to close this ticket.

          In case this is still an issue for you, I've created an upload portal so you can send us the data requested by Sam above. Files can't be larger than 5GB, so you'll need to split anything bigger than that. Alternatively, you could upload the WiredTiger.wt and WiredTiger.turtle files and we can attempt to repair those first.
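
          For reference, a minimal splitting sketch (file name taken from the log above; 4GB chunks stay under the 5GB limit):

          split -b 4G collection-231--1431395373171935217.wt collection-231.part.   # split into 4GB pieces
          # reassemble on the receiving side with:
          #   cat collection-231.part.* > collection-231--1431395373171935217.wt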

          Thanks,
          Ramón.


            People

            • Votes: 0
            • Watchers: 6

            Dates

            • Created:
            • Updated:
            • Resolved: