[SERVER-5489] Mongod crash
Created: 03/Apr/12  Updated: 15/Aug/12  Resolved: 19/May/12

Status: Closed
Project: Core Server
Component/s: Index Maintenance, Internal Code, Stability
Affects Version/s: 2.0.4
Fix Version/s: None

Type: Bug
Priority: Major - P3
Reporter: Daniel Gottlieb (Inactive)
Assignee: Andy Schwerin
Resolution: Done
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: jeff_digg.tgz, mongodb-daemon.log.gz
Operating System: Linux
Participants:

 Description   

mongod crashes repeatably after inserting documents read from a find().snapshot() cursor and then calling ensureIndex on the _id field (an index that should already exist?).



 Comments   
Comment by Tad Marshall [ 25/Jun/12 ]

Setting Backport to No because there is no code change associated with this ticket.

Comment by Andy Schwerin [ 19/May/12 ]

Appears to be data corruption unrelated to mongo operation.

Comment by Daniel Gottlieb (Inactive) [ 19/May/12 ]

Nope. I was just creating a case in the event the error message/data was at all meaningful. Seems this failure is pretty generic and merely indicates that somewhere data got corrupted. Feel free to resolve/close as protocol dictates.

Comment by Andy Schwerin [ 18/May/12 ]

This still burning you?

Comment by Daniel Gottlieb (Inactive) [ 20/Apr/12 ]

From your comment, it seems pretty likely that the data files are corrupt. I don't think there are any real trails that might help explain how they got that way (whether or not it was human error). One further note, my version of mongod:

db version v2.1.0, pdfile version 4.5
Fri Apr 20 14:12:21 git version: d674c681170337e3dfc34ae796b06fdde5ac05dd

won't even start up, because it fails to read the .ns file (and, as far as I can tell, throws the same assertion as above). Whichever version of mongod Jeff is running seems less bothered by the broken .ns file (from what I remember, for him it only complains when doing a find on the system.indexes collection).

Comment by Daniel Gottlieb (Inactive) [ 20/Apr/12 ]

This is a tarball of mongod database files. Just untar it and start a mongod with --dbpath pointing to the extracted directory.

Comment by Andy Schwerin [ 18/Apr/12 ]

Yeah, this particular assertion indicates that some region of memory that we believe is a database file doesn't have the expected magic number at the front. That could mean a wild pointer, or it could mean a corrupt file.
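
As a rough illustration of the kind of check described above (not mongod's actual code), a magic-number test on the first bytes of a region might look like the following Python sketch. The function name header_is_ok, the 4-byte little-endian layout, and the constant EXPECTED_MAGIC are all assumptions made for the example; none of them is confirmed by this ticket.

```python
import struct

# Assumption: an illustrative 32-bit signature, chosen for the example only.
EXPECTED_MAGIC = 0x41424344


def header_is_ok(region: bytes, expected_magic: int = EXPECTED_MAGIC) -> bool:
    """Return True if the region starts with the expected magic number.

    The magic number is read as a little-endian unsigned 32-bit integer
    from offset 0. A region shorter than 4 bytes can never be valid.
    """
    if len(region) < 4:
        return False
    (magic,) = struct.unpack_from("<I", region, 0)
    return magic == expected_magic


# A well-formed region: signature followed by arbitrary payload bytes.
good_region = struct.pack("<I", EXPECTED_MAGIC) + b"\x00" * 12

# A corrupt (or wrongly addressed) region: the signature bytes are missing.
corrupt_region = b"\x00" * 16

print(header_is_ok(good_region))
print(header_is_ok(corrupt_region))
```

A check like this cannot tell a corrupt file apart from a wild pointer: in both cases the bytes at the expected offset simply fail to match the signature.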

Comment by Daniel Gottlieb (Inactive) [ 18/Apr/12 ]

Damn, last time the log/stack trace seemed to be enough! I'll see if I can find a dataset for reproduction.

Comment by Andy Schwerin [ 18/Apr/12 ]

Apparently relevant section of log file:

 
Tue Apr  3 17:35:29 [conn1] runQuery called digg.system.indexes { name: "_id_1", ns: "digg.fs.chunks", key: { _id: 1 } }
Tue Apr  3 17:35:29 [conn1]  digg.system.indexes Assertion failure isOk() db/pdfile.h 300
0x57a396 0x5851bb 0x8a780a 0x8a7bb6 0x8cefe1 0x94c0cd 0x8cb95a 0x8cbd73 0x8cc27c 0x8d5a0e 0x8dd49e 0x8deeb3 0x8e0187 0x943dd7 0x8866b7 0x88dc29 0xaa33f6 0x637407 0x7f17ab1847f1 0x7f17aa72192d 
 /data/dist/current/bin/mongod(_ZN5mongo12sayDbContextEPKc+0x96) [0x57a396]
 /data/dist/current/bin/mongod(_ZN5mongo8assertedEPKcS1_j+0xfb) [0x5851bb]
 /data/dist/current/bin/mongod(_ZN5mongo11DataFileMgr7findAllEPKcRKNS_7DiskLocE+0x5fa) [0x8a780a]
 /data/dist/current/bin/mongod(_ZN5mongo13findTableScanEPKcRKNS_7BSONObjERKNS_7DiskLocE+0xe6) [0x8a7bb6]
 /data/dist/current/bin/mongod(_ZNK5mongo9QueryPlan9newCursorERKNS_7DiskLocEi+0x3a1) [0x8cefe1]
 /data/dist/current/bin/mongod(_ZN5mongo11UserQueryOp5_initEv+0x71d) [0x94c0cd]
 /data/dist/current/bin/mongod(_ZN5mongo7QueryOp4initEv+0x11a) [0x8cb95a]
 /data/dist/current/bin/mongod(_ZN5mongo12QueryPlanSet6Runner6initOpERNS_7QueryOpE+0x23) [0x8cbd73]
 /data/dist/current/bin/mongod(_ZN5mongo12QueryPlanSet6Runner4initEv+0x2ec) [0x8cc27c]
 /data/dist/current/bin/mongod(_ZN5mongo12QueryPlanSet6Runner22runUntilFirstCompletesEv+0x1e) [0x8d5a0e]
 /data/dist/current/bin/mongod(_ZN5mongo12QueryPlanSet5runOpERNS_7QueryOpE+0x11e) [0x8dd49e]
 /data/dist/current/bin/mongod(_ZN5mongo16MultiPlanScanner9runOpOnceERNS_7QueryOpE+0x523) [0x8deeb3]
 /data/dist/current/bin/mongod(_ZN5mongo16MultiPlanScanner5runOpERNS_7QueryOpE+0x17) [0x8e0187]
 /data/dist/current/bin/mongod(_ZN5mongo8runQueryERNS_7MessageERNS_12QueryMessageERNS_5CurOpES1_+0xf87) [0x943dd7]
 /data/dist/current/bin/mongod() [0x8866b7]
 /data/dist/current/bin/mongod(_ZN5mongo16assembleResponseERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortE+0x559) [0x88dc29]
 /data/dist/current/bin/mongod(_ZN5mongo16MyMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortEPNS_9LastErrorE+0x76) [0xaa33f6]
 /data/dist/current/bin/mongod(_ZN5mongo3pms9threadRunEPNS_13MessagingPortE+0x287) [0x637407]
 /lib64/libpthread.so.0(+0x77f1) [0x7f17ab1847f1]
 /lib64/libc.so.6(clone+0x6d) [0x7f17aa72192d]
Tue Apr  3 17:35:29 [conn1] assertion 0 assertion db/pdfile.h:300 ns:digg.system.indexes query:{ name: "_id_1", ns: "digg.fs.chunks", key: { _id: 1 } }
Tue Apr  3 17:35:29 [conn1]  ntoskip:0 ntoreturn:-1
Tue Apr  3 17:35:29 [conn1] query digg.system.indexes query: { name: "_id_1", ns: "digg.fs.chunks", key: { _id: 1 } } ntoreturn:1 exception: assertion db/pdfile.h:300 reslen:61 4ms

Comment by Andy Schwerin [ 18/Apr/12 ]

Dan,

Do you have a set of (ideally simple) repro instructions? Preferably of the form "run attached script," but anything you can give us.

There's not much context in this bug report right now.
