[SERVER-17654] Crash/Exception while performing initial sync of secondary, while building a 110 Mil. docs index Created: 19/Mar/15 Updated: 04/Jun/15 Resolved: 21/Apr/15
| Status: | Closed |
| Project: | Core Server |
| Component/s: | WiredTiger |
| Affects Version/s: | 3.0.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Critical - P2 |
| Reporter: | Borut Hadzialic | Assignee: | Michael Cahill (Inactive) |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Operating System: | Linux |
| Steps To Reproduce: | Not sure if it is reproducible; what we did was: 1. Convert a 1 TB MongoDB (3.0/WiredTiger/zlib) database into a replica set primary with rs.initiate() |
| Description |
The replica set secondary crashed during initial sync, in the index build step, while building the index on a collection of 110 million documents.
| Comments |
| Comment by Ramon Fernandez Marina [ 21/Apr/15 ] |
bhcoba, it looks like this ticket is a duplicate of …; I'm thus going to resolve this ticket, but if you encounter the issue again during your experiments in the coming months, please let us know. Regards,
| Comment by Borut Hadzialic [ 23/Mar/15 ] |
The fresh initial sync succeeded this time with 3.0.1.
@Dan: The server is a development server where we test MongoDB (and some of its competitors/forks, but at any given time only one database type/process runs on the server). Over the last 4 months another product (a MongoDB fork) was tested heavily on this server, and there was no indication that the server was faulty in any way; everything worked well. We will repeat the initial sync procedure many times in the coming months, and I will post again if we encounter the same issue.
| Comment by Michael Cahill (Inactive) [ 23/Mar/15 ] |
bhcoba, if you are able to try again, my recommendation would be to start with a fresh replica and try running:
By default, when compression is enabled, WiredTiger checksums each block header and relies on decompression to detect corruption in the block body. The above command line calculates checksums for all blocks, including compressed blocks, so if the failure is caused by corruption, this should catch it sooner.
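The trade-off described in this comment (checksumming only uncompressed blocks versus checksumming every block) can be illustrated with a small Python model. This is a hypothetical sketch, not WiredTiger code: `write_block`, `read_block` and `flip_byte` are illustrative names, zlib stands in for the block compressor, and the policy values `"on"`, `"off"` and `"uncompressed"` are assumed to mirror WiredTiger's `checksum` setting. Block headers are not modeled.

```python
import zlib


class ChecksumError(Exception):
    """Raised when a stored CRC32 does not match the block payload."""


def write_block(data: bytes, policy: str = "uncompressed") -> dict:
    # Hypothetical block writer; `policy` is assumed to mimic the
    # WiredTiger "checksum" values "on", "off" and "uncompressed".
    compressed_payload = zlib.compress(data)
    compressed = len(compressed_payload) < len(data)
    payload = compressed_payload if compressed else data  # store raw if incompressible
    if policy == "on":
        want_crc = True
    elif policy == "off":
        want_crc = False
    elif policy == "uncompressed":
        # Only raw blocks get a CRC; compressed blocks rely on
        # decompression failing if their bytes are corrupted.
        want_crc = not compressed
    else:
        raise ValueError(f"unknown checksum policy: {policy!r}")
    crc = zlib.crc32(payload) if want_crc else None
    return {"payload": payload, "compressed": compressed, "crc": crc}


def read_block(block: dict) -> bytes:
    # With a stored CRC, corruption is reported before any decompression.
    if block["crc"] is not None and zlib.crc32(block["payload"]) != block["crc"]:
        raise ChecksumError("checksum mismatch: block is corrupted")
    if block["compressed"]:
        # Without a CRC, detection depends on zlib rejecting the bad stream.
        return zlib.decompress(block["payload"])
    return block["payload"]


def flip_byte(payload: bytes, index: int) -> bytes:
    # Simulate on-disk corruption by inverting every bit of one byte.
    corrupted = bytearray(payload)
    corrupted[index] ^= 0xFF
    return bytes(corrupted)
```

With `policy="on"`, a flipped byte in a compressed block is reported as a `ChecksumError` as soon as the block is read, because CRC-32 detects any single-byte error. With the default `"uncompressed"` policy, the same corruption is only noticed if `zlib.decompress` happens to reject the stream, which is a less direct signal of where the damage came from.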
| Comment by Daniel Pasette (Inactive) [ 21/Mar/15 ] |
Hi Borut,
| Comment by Ramon Fernandez Marina [ 20/Mar/15 ] |
Thanks for your report, bhcoba; we're looking into it. Trying to reproduce on 3.0.1 is indeed the first step, so please let us know how that goes. When we know more about the issue, we'll let you know whether a separate ticket is needed; for now, let's see what 3.0.1 does.
| Comment by Borut Hadzialic [ 20/Mar/15 ] |
Problem #2:
As MongoDB 3.0.1 is now available, I will upgrade our replica set from 3.0.0 to 3.0.1 and try the initial sync to a fresh/empty 3.0.1 secondary again.
| Comment by Borut Hadzialic [ 19/Mar/15 ] |
Restarting the secondary made it try to build the index again, but it failed at the same point:
I will drop the secondary's data files and add it to the replica set again, to see whether a fresh initial sync works.