[SERVER-1724] Repair halts with segmentation fault Created: 03/Sep/10  Updated: 12/Jul/16  Resolved: 03/Sep/10

Status: Closed
Project: Core Server
Component/s: Stability
Affects Version/s: 1.5.6, 1.6.2
Fix Version/s: 1.7.0

Type: Bug Priority: Major - P3
Reporter: Leonardo Diez Assignee: Eliot Horowitz (Inactive)
Resolution: Done Votes: 1
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

CentOS 5.5
kernel 2.6.33.5-xxxx-grs-ipv4-64


Attachments: File mongo-repair-segfault.log    
Issue Links:
Related
is related to SERVER-2149 BSONObj::valid and BSONElement::valid... Closed
Operating System: Linux
Participants:

 Description   

While working with version 1.5.6, after a data import and some data manipulation on a collection of 40 million documents, the database became corrupt. The data appears to be accessible normally, but when a Java process runs some data processing against it, an error occurs and asks for a database repair.
When the repair is executed, it runs normally until about 8 million objects have been cloned, at which point a segmentation fault occurs. The backtrace suggests that the error occurs in the validate method of the BSONElement class.
I tried the same repair with version 1.6.2 and got similar results. The attached log was obtained during one of these attempts.



 Comments   
Comment by Dwight Merriman [ 25/Nov/10 ]

I have created a new, more general ticket for clarity: SERVER-2149.

Comment by Leonardo Diez [ 08/Sep/10 ]

I've worked around the problem temporarily by changing the limit to 4M to run the repair, because I know none of my string data reaches that size. It might be a good idea to use a SIGSEGV handler to avoid the segmentation fault during data validation.

Comment by Leonardo Diez [ 08/Sep/10 ]

I've applied your changes to the 1.6.2 code, compiled it, and tried to run a database repair again. The problem persists. The backtrace line changes to: /opt/mongo/bin/mongod(_ZNK5mongo11BSONElement8validateEv+0x6d).
I added some debug prints to find the typical values of x, since your modification verifies that it is lower than 32M = 33554432. The values of x are usually lower than 255, but just before the segmentation fault x is 3840 (which doesn't seem to be a problem), the next value is 7602689, and then the segmentation fault occurs. So x passes the check but is not correct, and the segmentation fault occurs when the code tests whether valuestr()[x-1] is 0. Is there any way to avoid the problem?
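
To illustrate the failure mode described above, here is a minimal sketch of a check that compares x against the bytes actually remaining in the enclosing object instead of a fixed global cap. This is not MongoDB's code: the function name stringValueLooksValid and the remaining parameter are hypothetical, and the layout assumed is a 4-byte length x followed by x bytes whose last byte must be 0, read on a little-endian host.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper (assumption, not the real BSONElement::validate):
// `value` points at the 4-byte length prefix of a string value, and
// `remaining` is the number of bytes left in the enclosing buffer,
// counted from `value`. Assumes a little-endian host.
bool stringValueLooksValid(const char* value, std::size_t remaining) {
    const std::size_t kPrefix = sizeof(std::int32_t);
    if (value == nullptr || remaining < kPrefix) {
        return false;  // not even room for the length prefix
    }

    std::int32_t x = 0;
    std::memcpy(&x, value, kPrefix);  // copy to avoid an unaligned read

    if (x <= 0) {
        return false;  // the length must at least cover the terminator
    }
    if (static_cast<std::size_t>(x) > remaining - kPrefix) {
        return false;  // x claims more bytes than the buffer holds
    }
    // Only now is it safe to look at the last byte of the string.
    return value[kPrefix + static_cast<std::size_t>(x) - 1] == '\0';
}
```

With a check of this shape, a length such as 7602689 would be rejected whenever the enclosing object is smaller than that, without ever dereferencing valuestr()[x-1]. It only helps if the caller knows a trustworthy bound for the enclosing object, which is itself an assumption when the data is corrupt.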

Comment by auto [ 03/Sep/10 ]

Author: Eliot Horowitz (erh) <eliot@10gen.com>

Message: try to prevent segfault on corrupt data SERVER-1724
http://github.com/mongodb/mongo/commit/1c0d685f9a22c8e4acba957dba99feda69bc12be
