[SERVER-17086] Can't start MongoDB w/ WiredTiger Engine because of Checksum Error Created: 28/Jan/15 Updated: 04/Jun/15 Resolved: 15/May/15 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | WiredTiger |
| Affects Version/s: | 2.8.0-rc5, 3.0.0-rc6 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Soner K | Assignee: | Bruce Lucas (Inactive) |
| Resolution: | Cannot Reproduce | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Attachments: |
| Issue Links: |
| Operating System: | ALL |
| Steps To Reproduce: | 1. Do some concurrent inserts into the database |
| Participants: |
| Description |
|
Hi, I inserted a large amount of data via a concurrent program into a MongoDB database (2.8.0-rc5) using the WiredTiger storage engine. I'm guessing the 12 GB of RAM ran out and the MongoDB process was killed. Before I post the log file, I want to say that I searched Jira for this issue and found some bug reports on it:
So my issue seems to be the checksum error. Is there any way for me to change the checksum number? I tried to open the .wt files in the dbpath with a hex editor, but due to the huge size of the file (~228 GiB) I don't even know where to begin searching. The hex addresses in the error log below don't help either. I'd appreciate any help, comment or hint. Here's the log file from trying to start mongod with the WiredTiger storage engine. The exact command is:
I changed the hostname and pathname in the log below manually.
|
| Comments |
| Comment by Ramon Fernandez Marina [ 28/May/15 ] |
|
hannes_brt, to avoid confusion, can you please open a separate ticket for your issue? Please include the full log and the exact version you're running, as well as details about your setup and any other information you think may help us track down this problem. Thanks, |
| Comment by Hannes Bretschneider [ 28/May/15 ] |
|
The same issue seems to be happening to me. I was running a job that had 31 threads writing simultaneously to MongoDB. The job finished successfully, but after the following re-start, mongod now no longer starts up. Here is the log:
2015-05-28T13:37:20.738-0400 I CONTROL ***** SERVER RESTARTED *****
, {"b":"400000","o":"AF1671"}, {"b":"400000","o":"AD6261"}, {"b":"400000","o":"97B2BA"}, {"b":"400000","o":"F816C9"}, {"b":"400000","o":"F81885"}, {"b":"400000","o":"F81D24"}, {"b":"400000","o":"ED578E"}, {"b":"400000","o":"ED5C28"}, {"b":"400000","o":"ED2FE3"}, {"b":"400000","o":"ED6956"}, {"b":"400000","o":"EEF431"}, {"b":"400000","o":"F17A0B"}, {"b":"400000","o":"F80B43"}, {"b":"400000","o":"F4EAFB"}, {"b":"400000","o":"F150F7"}, {"b":"400000","o":"965C2B"}, {"b":"400000","o":"963C28"}, {"b":"400000","o":"681DBD"}, {"b":"400000","o":"408712"}, {"b":"400000","o":"3D5334"}, {"b":"7F25793F4000","o":"21EC5"}, {"b":"400000","o":"4064D7"}],"processInfo":{ "mongodbVersion" : "3.0.3", "gitVersion" : "b40106b36eecd1b4407eb1ad1af6bc60593c6105", "uname" : { "sysname" : "Linux", "release" : "3.13.0-53-generic", "version" : "#89-Ubuntu SMP Wed May 20 10:34:39 UTC 2015", "machine" : "x86_64" }, "somap" : [ { "elfType" : 2, "b" : "400000", "buildId" : "F56F80CB96B4DBFC070BEB0ADAC7D6B274BFC6B1" }, { "b" : "7FFF250AF000", "elfType" : 3, "buildId" : "E76682F06460BD7F01BD69179E0ADC594943C298" }, { "b" : "7F257AA1E000", "path" : "/lib/x86_64-linux-gnu/libpthread.so.0", "elfType" : 3, "buildId" : "9318E8AF0BFBE444731BB0461202EF57F7C39542" }, { "b" : "7F257A7C0000", "path" : "/lib/x86_64-linux-gnu/libssl.so.1.0.0", "elfType" : 3, "buildId" : "FF43D0947510134A8A494063A3C1CF3CEBB27791" }, { "b" : "7F257A3E5000", "path" : "/lib/x86_64-linux-gnu/libcrypto.so.1.0.0", "elfType" : 3, "buildId" : "B927879B878D90DD9FF4B15B00E7799AA8E0272F" }, { "b" : "7F257A1DD000", "path" : "/lib/x86_64-linux-gnu/librt.so.1", "elfType" : 3, "buildId" : "92FCF41EFE012D6186E31A59AD05BDBB487769AB" }, { "b" : "7F2579FD9000", "path" : "/lib/x86_64-linux-gnu/libdl.so.2", "elfType" : 3, "buildId" : "C1AE4CB7195D337A77A3C689051DABAA3980CA0C" }, { "b" : "7F2579CD5000", "path" : "/usr/lib/x86_64-linux-gnu/libstdc++.so.6", "elfType" : 3, "buildId" : "19EFDDAB11B3BF5C71570078C59F91CF6592CE9E" }, { "b" : "7F25799CF000", "path" : "/lib/x86_64-linux-gnu/libm.so.6", "elfType" : 3, "buildId" : "1D76B71E905CB867B27CEF230FCB20F01A3178F5" }, { "b" : "7F25797B9000", "path" : "/lib/x86_64-linux-gnu/libgcc_s.so.1", "elfType" : 3, "buildId" : "8D0AA71411580EE6C08809695C3984769F25725B" }, { "b" : "7F25793F4000", "path" : "/lib/x86_64-linux-gnu/libc.so.6", "elfType" : 3, "buildId" : "30C94DC66A1FE95180C3D68D2B89E576D5AE213C" }, { "b" : "7F257AC3C000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "9F00581AB3C73E3AEA35995A0C50D24D59A01D47" } ] }}
***aborting after fassert() failure |
| Comment by Ramon Fernandez Marina [ 15/May/15 ] |
|
Hi soner, unfortunately we have not been able to reproduce this issue. There have been a significant number of improvements in MongoDB and WiredTiger since it was first reported, and the 2.8/3.0 release candidates are no longer relevant or recommended, so I'd like to ask you to upgrade to version 3.0.3 and report back. I'm going to resolve this ticket, but if you continue to observe the same behavior after upgrading to 3.0.3, please feel free to reopen it. Regards, |
| Comment by Soner K [ 05/Mar/15 ] |
|
OK, here are more details on what I did and what exactly happened: I needed a large collection of generated documents with a specific structure and semi-random content (for each document, a random item is chosen from a set of items). I wrote a Java tool that generates these random documents. To make things faster I ran 1000 threads, each inserting 1,000,000 documents, so in the end I would have one billion documents. It didn't work very well: the MongoDB server process crashed pretty often and I had to restart the MongoDB server (and the Java tool for inserting the documents). Here's an example of the JSON structure:
I'd like to upload the JSON file but it might contain sensitive data. So if you need a specific JSON document to test with, I might give it to you in private. I had three indexes on keys containing integers (like { "aNumber" : 5}). Thank you for looking into this! |
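For anyone trying to approximate the workload described in this comment, the following is a minimal Python sketch, not the reporter's Java tool. Everything except the `aNumber` key is an assumption made for illustration: the `ITEMS` set, the `item` field, the batch size, and the `insert_batch` callback, which stands in for whatever bulk-insert call the driver in use provides.

```python
import random
from concurrent.futures import ThreadPoolExecutor

# Hypothetical item set; the real tool's items are not shown in the ticket.
ITEMS = ["alpha", "beta", "gamma"]


def make_document(rng):
    """Build one semi-random document. Only "aNumber" comes from the ticket."""
    return {"aNumber": rng.randint(0, 1000), "item": rng.choice(ITEMS)}


def insert_worker(worker_id, docs_per_worker, batch_size, insert_batch):
    """Generate docs_per_worker documents and pass them to insert_batch in batches."""
    rng = random.Random(worker_id)  # one generator per worker, seeded for repeatability
    sent = 0
    while sent < docs_per_worker:
        n = min(batch_size, docs_per_worker - sent)
        insert_batch([make_document(rng) for _ in range(n)])
        sent += n
    return sent


def run_load(num_workers, docs_per_worker, batch_size, insert_batch):
    """Run num_workers concurrent workers and return the total number of documents sent."""
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        futures = [
            pool.submit(insert_worker, i, docs_per_worker, batch_size, insert_batch)
            for i in range(num_workers)
        ]
        return sum(f.result() for f in futures)
```

The figures in the comment would correspond to `run_load(1000, 1_000_000, batch_size, insert_batch)`, with `insert_batch` wired to a real collection; for a first attempt, much smaller numbers are probably more practical.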
| Comment by Bruce Lucas (Inactive) [ 04/Mar/15 ] |
|
Thanks Soner. Those files appear to confirm that some data was not properly written to disk. For about 24 hours I have been running a repro attempt that writes some data and then kills mongod with SIGKILL; it has gone through 900 or so server crashes without reproducing the problem. I will try modifying the repro attempt to make it more write-intensive and try some more. Can you say a little bit about the kind of data you were inserting? How many threads, how many indexes, what size documents? |
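The kill-and-restart loop described in this comment can be sketched roughly as follows. This is not the script actually used for the repro attempt; it is a minimal Python illustration that assumes a POSIX system, and the function names, timings, and server command line are all hypothetical. Writing data while the server is up is left out.

```python
import signal
import subprocess
import time


def crash_cycle(server_cmd, run_seconds, startup_seconds=1.0):
    """Start server_cmd, let it run, then kill it with SIGKILL.

    Returns (started_ok, returncode). started_ok is False if the process
    exited on its own during the startup window, which is the failure
    this ticket is about.
    """
    proc = subprocess.Popen(server_cmd)
    time.sleep(startup_seconds)
    if proc.poll() is not None:
        return False, proc.returncode
    time.sleep(run_seconds)  # a write workload would run during this window
    proc.send_signal(signal.SIGKILL)  # unclean shutdown, no chance to flush
    proc.wait()
    return True, proc.returncode


def crash_loop(server_cmd, cycles, run_seconds, startup_seconds=1.0):
    """Repeat crash_cycle; return the index of the first failed start, or None."""
    for i in range(cycles):
        started, _ = crash_cycle(server_cmd, run_seconds, startup_seconds)
        if not started:
            return i
    return None
```

A call might look like `crash_loop(["mongod", "--dbpath", "/data/db", "--storageEngine", "wiredTiger"], cycles=900, run_seconds=60)`, where the dbpath is a placeholder and the startup window would need to be long enough for mongod to fail its startup checks.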
| Comment by Soner K [ 03/Mar/15 ] |
|
Hi Bruce, I've uploaded the two files you requested. Only the mongod process was killed; the system itself was OK. This had happened a few times before, but I could always start the mongod process afterwards. Only this one time it wouldn't start. Thank you, |
| Comment by Bruce Lucas (Inactive) [ 03/Mar/15 ] |
|
Thanks Soner. That file in itself appears to be ok, so I think the problem is a mismatch between the content of that file and one of the other WiredTiger metadata files that points to it. Would you be able to also attach the following two metadata files to the ticket: WiredTiger.turtle and WiredTiger.wt so I can check my theory? Also, can you clarify whether the node crashed prior to this problem occurring, or only the mongod process was killed? Thanks, |
| Comment by Soner K [ 19/Feb/15 ] |
|
Hi, I have attached the sizeStorer.wt file. |
| Comment by Daniel Pasette (Inactive) [ 31/Jan/15 ] |
|
Would you be able to upload and attach the sizeStorer.wt file from your dbpath to this ticket for examination? |