[SERVER-2783] Traceback of crash in mongod
Created: 17/Mar/11  Updated: 30/Mar/12  Resolved: 14/Jun/11

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 1.8.0
Fix Version/s: None

Type: Bug
Priority: Major - P3
Reporter: Vince Busam
Assignee: Aaron Staple
Resolution: Cannot Reproduce
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: Ubuntu 9.10, 64-bit, 10gen packages


Attachments: fullmongo.log.gz, mongocrash, mongocrash2
Operating System: ALL
Participants:

 Description   

Two separate contiguous chunks of the log file, with tracebacks, from when mongod (running as a shard) crashed.



 Comments   
Comment by Aaron Staple [ 14/Jun/11 ]

Hi Vince,

Are there any pending issues for this ticket?

Comment by Aaron Staple [ 29/Mar/11 ]

Glad you were able to regenerate the collection. Just fyi, the mongodump issue could potentially be related to the bad index unless you forced deletion of your _id indexes.

Comment by Vince Busam [ 29/Mar/11 ]

I dumped and re-created the collection on the shard, which solved this. mongod/mongodump crashed while dumping the collection (after the indexes were dropped), so the cause may have been corrupt database files.

Comment by Aaron Staple [ 22/Mar/11 ]

I'm seeing a few different types of stack traces in the most recently supplied log, all of which may be caused by a bad index entry. The bad index could have resulted from a bad shutdown, or from a bug in 1.6 related to large keys that is sometimes, but not always, accompanied by 'key too large' messages in the log file. I would recommend rebuilding your indexes.
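
For anyone checking their own logs for the messages mentioned above, here is a minimal sketch in Python. It only assumes that the relevant lines contain the literal text "key too large"; the surrounding wording is not shown in this ticket, and the function name is made up for illustration.

```python
import re
from typing import Iterable, List, Tuple

# Assumption: affected log lines contain the literal text "key too large".
# The rest of the message is not shown in this ticket, so nothing else is matched.
KEY_TOO_LARGE = re.compile(r"key too large", re.IGNORECASE)


def find_key_too_large(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, line) pairs for log lines mentioning 'key too large'."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if KEY_TOO_LARGE.search(line):
            hits.append((number, line.rstrip("\n")))
    return hits
```

It accepts any iterable of lines, so an open log file can be passed directly, for example `find_key_too_large(open(path, errors="replace"))` with `path` pointing at the mongod log. An empty result is consistent with the "sometimes but not always" caveat and does not rule out a bad index.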

Comment by Aaron Staple [ 22/Mar/11 ]

Have you had any crashes on this system besides those in the log file? (You mentioned that you reimported your data once.)

Comment by Aaron Staple [ 22/Mar/11 ]

Sorry, from the log it looks like you are running without durability.

Comment by Aaron Staple [ 22/Mar/11 ]

Ok thanks. Are you running with durability?

Comment by Vince Busam [ 22/Mar/11 ]

Oops, bad grep. Here's the full log.

Comment by Vince Busam [ 22/Mar/11 ]

Fixed full log.

Comment by Aaron Staple [ 22/Mar/11 ]

Hi Vince - the log that was sent doesn't seem to contain actual backtraces in it. Where did the backtraces in the log excerpts you initially included in this bug report come from?

Comment by Vince Busam [ 21/Mar/11 ]

Full log from 2 days before crash.

Comment by Vince Busam [ 21/Mar/11 ]
  • Do your logs go back to the start of these data files? Any mention of "key too large" - not necessarily with unindex

They go back to installing mongo. No mention of key too large.

  • Would it be possible to send a few days worth of log from around when the error occurred?

I'll cut out a few days and upload the file in a bit.

  • Were all the indexes created before data was inserted in the collection, or were some indexes created after some data was present?

Originally, they were created with background=true after data was present, but then the whole collection was dumped and re-imported, and the indexes were created before the import.

  • Are you sure you aren't doing any count queries other than those mentioned?

I found another one that adds a geocoords box to the other count():
Mon Mar 21 13:47:26 [conn10] query users.$cmd ntoreturn:1 command: { count: "globalusers", query: { geocoords.lng: { $lte: -89.47265625, $gte: -113.115234375 }, instanceslist: 33839, geocoords.lat: { $lte: 41.44272637767212, $gte: 21.45306863308678 } } } reslen:64 283ms
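
To make the shape of that count command easier to read, here is a rough sketch that builds the same document as a Python dict. The helper name and its parameter names are invented for illustration; only the field names and operators come from the log line above.

```python
from typing import Any, Dict


def box_count_command(collection: str, instance: int,
                      west: float, east: float,
                      south: float, north: float) -> Dict[str, Any]:
    """Build a count command document shaped like the one in the log line.

    Hypothetical helper: the parameter names are illustrative, the field
    names (geocoords.lng, geocoords.lat, instanceslist) are from the log.
    """
    return {
        "count": collection,
        "query": {
            "geocoords.lng": {"$lte": east, "$gte": west},
            "instanceslist": instance,
            "geocoords.lat": {"$lte": north, "$gte": south},
        },
    }
```

Calling it with the values from the logged query, `box_count_command("globalusers", 33839, -113.115234375, -89.47265625, 21.45306863308678, 41.44272637767212)`, gives a document with the same bounds as the one mongod recorded.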

  • Is your shard key {lowerscreenname:1,service:1}?

Yes.

Comment by Aaron Staple [ 21/Mar/11 ]

Thanks for the replies. Just a few followup questions:

  • Do your logs go back to the start of these data files? Any mention of "key too large" - not necessarily with unindex
  • Would it be possible to send a few days worth of log from around when the error occurred?
  • Were all the indexes created before data was inserted in the collection, or were some indexes created after some data was present?
  • Are you sure you aren't doing any count queries other than those mentioned?
  • Is your shard key {lowerscreenname:1,service:1}?

Comment by Vince Busam [ 21/Mar/11 ]

The database and indexes were created in 1.6.5. Following are the stats (with indexes) and an example document.

Our count operations are:
db.globalusers.count()
db.globalusers.find({"instanceslist":x})

No sign of "unindex: key too large" in the logs.

> db.globalusers.stats()
{
    "sharded" : true,
    "ns" : "users.globalusers",
    "count" : 14434052,
    "size" : 23402630664,
    "avgObjSize" : 1621.3486458272423,
    "storageSize" : 28600703456,
    "nindexes" : 4,
    "nchunks" : 104,
    "shards" : {
        "shard0000" : {
            "ns" : "users.globalusers",
            "count" : 9289516,
            "size" : 13497343164,
            "avgObjSize" : 1452.9651667535747,
            "storageSize" : 17619024096,
            "numExtents" : 35,
            "nindexes" : 4,
            "lastExtentSize" : 2146426864,
            "paddingFactor" : 1.2299999998967497,
            "flags" : 1,
            "totalIndexSize" : 3421613824,
            "indexSizes" : {
                "_id_" : 683918272,
                "lowerscreenname_1_service_1" : 824935360,
                "service_1_needsprocessing_1" : 928441280,
                "instanceslist_1" : 984318912
            },
            "ok" : 1
        },
        "shard0001" : {
            "ns" : "users.globalusers",
            "count" : 5144536,
            "size" : 9905287500,
            "avgObjSize" : 1925.399589000835,
            "storageSize" : 10981679360,
            "numExtents" : 34,
            "nindexes" : 4,
            "lastExtentSize" : 1837509120,
            "paddingFactor" : 1.5699999999325418,
            "flags" : 1,
            "totalIndexSize" : 1733029632,
            "indexSizes" : {
                "_id_" : 297632704,
                "lowerscreenname_1_service_1" : 450642880,
                "service_1_needsprocessing_1" : 481649600,
                "instanceslist_1" : 503104448
            },
            "ok" : 1
        }
    },
    "ok" : 1
}
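
As a quick sanity check on the stats above, the per-shard figures can be added up and compared with the reported totals. This is only an arithmetic sketch over numbers copied from the output; the function names are made up, and it says nothing about whether the underlying index data is intact.

```python
from typing import Any, Dict, List

# Figures copied from the db.globalusers.stats() output above.
STATS: Dict[str, Any] = {
    "count": 14434052,
    "size": 23402630664,
    "storageSize": 28600703456,
    "shards": {
        "shard0000": {
            "count": 9289516,
            "size": 13497343164,
            "storageSize": 17619024096,
            "totalIndexSize": 3421613824,
            "indexSizes": {
                "_id_": 683918272,
                "lowerscreenname_1_service_1": 824935360,
                "service_1_needsprocessing_1": 928441280,
                "instanceslist_1": 984318912,
            },
        },
        "shard0001": {
            "count": 5144536,
            "size": 9905287500,
            "storageSize": 10981679360,
            "totalIndexSize": 1733029632,
            "indexSizes": {
                "_id_": 297632704,
                "lowerscreenname_1_service_1": 450642880,
                "service_1_needsprocessing_1": 481649600,
                "instanceslist_1": 503104448,
            },
        },
    },
}


def stats_mismatches(stats: Dict[str, Any]) -> List[str]:
    """List every reported total that does not equal the sum of its parts."""
    problems = []
    shards = stats["shards"]
    # Cluster-wide totals should equal the per-shard sums.
    for field in ("count", "size", "storageSize"):
        total = sum(shard[field] for shard in shards.values())
        if total != stats[field]:
            problems.append(f"{field}: shards sum to {total}, top level says {stats[field]}")
    # Each shard's totalIndexSize should equal the sum of its indexSizes.
    for name, shard in shards.items():
        index_total = sum(shard["indexSizes"].values())
        if index_total != shard["totalIndexSize"]:
            problems.append(
                f"{name}: indexes sum to {index_total}, "
                f"totalIndexSize says {shard['totalIndexSize']}"
            )
    return problems


def doc_share(stats: Dict[str, Any], shard_name: str) -> float:
    """Fraction of the collection's documents held by one shard."""
    return stats["shards"][shard_name]["count"] / stats["count"]
```

For these figures `stats_mismatches(STATS)` comes back empty, so the reported totals are internally consistent, and `doc_share(STATS, "shard0000")` shows shard0000 holding roughly 64% of the documents.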

> db.globalusers.findOne({"lowerscreenname":"vincebusam"})
{
    "_id" : ObjectId("4d683dcc02b21f5f76000000"),
    "agegroup" : "25-34",
    "enthusiastScore" : { },
    "followers" : 66,
    "friends" : 35,
    "fullname" : "Vincent Busam",
    "gender" : "male",
    "instances" : {
        "32472" : { },
        "33276" : { }
    },
    "instanceslist" : [
        32472,
        33276
    ],
    "location" : "Los Angeles",
    "lowerscreenname" : "vincebusam",
    "message" : {
        "searchterm" : "@citizennet",
        "polarity" : 0.5900284867557231,
        "messagetext" : "RT @citizennet: The re- #launch of @citizennet is live! Go check it out and turn your data into gold baby (pls RT) http://skotz.co/dOJal8",
        "intent" : null,
        "instance" : 32472
    },
    "messagesInFeed" : { "32472" : 1, "33276" : 1 },
    "names" : { },
    "needsprocessing" : [ ],
    "postedFromFeed" : { "32472" : 0 },
    "processed" : { "geo" : true, "names" : false, "profile" : true, "timeline" : true },
    "profileimage" : "http://a1.twimg.com/profile_images/1024079293/vincebusam_normal.jpg",
    "screenname" : "vincebusam",
    "service" : "twitter",
    "totalMessagesConvDaily" : 0,
    "totalMessagesSelfDaily" : 0,
    "twitterMessageRateContDaily" : 0,
    "twitterMessageRateConvDaily" : 0,
    "updatedat" : "2011-02-25 15:39:56",
    "userClass" : "people",
    "userid" : 26426925
}

Comment by Vince Busam [ 21/Mar/11 ]

addr2line of mongocrash:

addr2line -e mongodb-linux-x86_64-debugsymbols-1.8.0/bin/mongod 0x8a4c29 0x7ffe862ab530 0x73ecb8 0x65b792 0x78e566 0x797091 0x79a61b 0x79cd44 0x79d8e7 0x64d04b 0x7f278d 0x7dc300 0x7dd831 0x645405 0x649941 0x757ad5 0x75a000 0x8a80ae 0x8b84f0 0x7ffe86d99a04
    /mnt/home/buildbot/slave/Linux_64bit_V1.8/mongo/db/db.cpp:1159
    ??:0
    /opt/extra/include/boost/smart_ptr/shared_ptr.hpp:418
    /opt/extra/include/boost/smart_ptr/detail/sp_counted_base_gcc_x86.hpp:143
    /mnt/home/buildbot/slave/Linux_64bit_V1.8/mongo/db/queryoptimizer.cpp:659
    /usr/lib/gcc/x86_64-redhat-linux/4.1.2/../../../../include/c++/4.1.2/bits/stl_heap.h:404
    /opt/extra/include/boost/smart_ptr/detail/sp_counted_base_gcc_x86.hpp:50
    /usr/lib/gcc/x86_64-redhat-linux/4.1.2/../../../../include/c++/4.1.2/bits/stl_tree.h:1262
    /opt/extra/include/boost/smart_ptr/shared_ptr.hpp:418
    /opt/extra/include/boost/smart_ptr/shared_ptr.hpp:418
    /mnt/home/buildbot/slave/Linux_64bit_V1.8/mongo/db/../util/../util/../db/../bson/bsonelement.h:524
    /mnt/home/buildbot/slave/Linux_64bit_V1.8/mongo/db/../util/../util/../db/../bson/bsonobjbuilder.h:372
    /mnt/home/buildbot/slave/Linux_64bit_V1.8/mongo/db/dbcommands.cpp:1859
    /opt/extra/include/boost/smart_ptr/detail/sp_counted_base_gcc_x86.hpp:159
    /mnt/home/buildbot/slave/Linux_64bit_V1.8/mongo/db/../util/../util/../db/../bson/bson-inl.h:573
    /mnt/home/buildbot/slave/Linux_64bit_V1.8/mongo/db/repl/rs_config.h:29
    /mnt/home/buildbot/slave/Linux_64bit_V1.8/mongo/db/curop.h:55
    /mnt/home/buildbot/slave/Linux_64bit_V1.8/mongo/db/db.cpp:312
    ??:0
    ??:0

From mongocrash2:

addr2line -e mongodb-linux-x86_64-debugsymbols-1.8.0/bin/mongod x8a4c29 0x7ffe862ab530 0x73ecb8 0x65b792 0x78e566 0x797091 0x79a61b 0x79cd44 0x79d8e7 0x64d04b 0x7f278d 0x7dc300 0x7dd831 0x645405 0x649941 0x757ad5 0x75a000 0x8a80ae 0x8b84f0 0x7ffe86d99a04
    ??:0
    ??:0
    /opt/extra/include/boost/smart_ptr/shared_ptr.hpp:418
    /opt/extra/include/boost/smart_ptr/detail/sp_counted_base_gcc_x86.hpp:143
    /mnt/home/buildbot/slave/Linux_64bit_V1.8/mongo/db/queryoptimizer.cpp:659
    /usr/lib/gcc/x86_64-redhat-linux/4.1.2/../../../../include/c++/4.1.2/bits/stl_heap.h:404
    /opt/extra/include/boost/smart_ptr/detail/sp_counted_base_gcc_x86.hpp:50
    /usr/lib/gcc/x86_64-redhat-linux/4.1.2/../../../../include/c++/4.1.2/bits/stl_tree.h:1262
    /opt/extra/include/boost/smart_ptr/shared_ptr.hpp:418
    /opt/extra/include/boost/smart_ptr/shared_ptr.hpp:418
    /mnt/home/buildbot/slave/Linux_64bit_V1.8/mongo/db/../util/../util/../db/../bson/bsonelement.h:524
    /mnt/home/buildbot/slave/Linux_64bit_V1.8/mongo/db/../util/../util/../db/../bson/bsonobjbuilder.h:372
    /mnt/home/buildbot/slave/Linux_64bit_V1.8/mongo/db/dbcommands.cpp:1859
    /opt/extra/include/boost/smart_ptr/detail/sp_counted_base_gcc_x86.hpp:159
    /mnt/home/buildbot/slave/Linux_64bit_V1.8/mongo/db/../util/../util/../db/../bson/bson-inl.h:573
    /mnt/home/buildbot/slave/Linux_64bit_V1.8/mongo/db/repl/rs_config.h:29
    /mnt/home/buildbot/slave/Linux_64bit_V1.8/mongo/db/curop.h:55
    /mnt/home/buildbot/slave/Linux_64bit_V1.8/mongo/db/db.cpp:312
    ??:0
    ??:0
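
One thing worth noting about the two invocations above: the second passes `x8a4c29` without the leading `0`, which may be why its first frame comes back as `??:0` while the first run resolves it to db.cpp:1159. A small sketch for catching that kind of slip before calling addr2line follows. The helper names are hypothetical, and the check is purely syntactic (hex digits with an optional `0x` prefix), not a statement about what addr2line itself accepts or rejects.

```python
import re
from typing import List, Sequence

# Syntactic check only: hex digits with an optional 0x prefix.
HEX_ADDRESS = re.compile(r"^(0x)?[0-9a-fA-F]+$")


def malformed_addresses(addresses: Sequence[str]) -> List[str]:
    """Return the addresses that do not look like hexadecimal numbers."""
    return [addr for addr in addresses if not HEX_ADDRESS.match(addr)]


def addr2line_argv(binary: str, addresses: Sequence[str]) -> List[str]:
    """Build the argument list for `addr2line -e <binary> <addr>...`.

    Raises ValueError instead of silently passing along a mistyped address.
    """
    bad = malformed_addresses(addresses)
    if bad:
        raise ValueError(f"not hexadecimal addresses: {bad}")
    return ["addr2line", "-e", binary, *addresses]
```

The resulting list can be handed to `subprocess.run(argv, capture_output=True, text=True)` on a machine that has the debug-symbol build of mongod unpacked. Frames that still print `??:0` after that, such as the `0x7ffe...` addresses here, presumably fall outside the mongod binary and cannot be resolved against it.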

Comment by Aaron Staple [ 21/Mar/11 ]

Vince - also do you have any messages "unindex: key too large" in the logs?

Comment by Aaron Staple [ 21/Mar/11 ]

Hi Vince - were you using data files and indexes from an earlier mongo version? Can you send your indexes and an example document for this collection as well as your count query? Thanks.

Comment by Aaron Staple [ 19/Mar/11 ]

Hi Vince - could you let us know the count queries you are sending?

Comment by Eliot Horowitz (Inactive) [ 19/Mar/11 ]

Can't right now but you can get the symbols from: http://downloads.mongodb.org/linux/mongodb-linux-x86_64-debugsymbols-1.8.0.tgz and then use addr2line
