[SERVER-13856] MongoDB crash on addition to a replica set Created: 07/May/14  Updated: 24/Jan/15  Resolved: 23/Jan/15

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: 2.4.8
Fix Version/s: None

Type: Bug
Priority: Major - P3
Reporter: Dharshan Rangegowda
Assignee: Matt Dannenberg
Resolution: Incomplete
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: mongod.log (text file)
Operating System: ALL
Participants:

 Description   

I added a new server to an existing replica set, and the mongod instance on the new server hit the following assert and stopped. I can provide the full log if needed. The version is 2.4.8.

Wed May 7 02:08:56.572 [conn74760] authenticate db: data

{ authenticate: 1, user: "admin", nonce: "b72c73378e4a2ed4", key: "ab9cb1fa033cbe6cee07d97661ac6507" }

Wed May 7 02:08:56.572 [conn74767] end connection 162.243.83.99:42063 (276 connections now open)
/usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x26) [0xc18f56]
/usr/bin/mongod(_ZN5mongo13fassertFailedEi+0x63) [0xbdfda3]
/usr/bin/mongod(_ZN5mongo7replset21multiInitialSyncApplyERKSt6vectorINS_7BSONObjESaIS2_EEPNS0_8SyncTailE+0x164) [0xad9824]
/usr/bin/mongod(_ZN5mongo10threadpool6Worker4loopEv+0x41c) [0xbec29c]
/usr/bin/mongod() [0xc5257c]
/lib64/libpthread.so.0(+0x7851) [0x7f1f33a86851]
/lib64/libc.so.6(clone+0x6d) [0x7f1f327e090d]
Wed May 7 02:08:56.572 [conn74768] end connection 162.243.122.196:59338 (276 connections now open)
Wed May 7 02:08:56.583 [repl writer worker 1]

***aborting after fassert() failure



 Comments   
Comment by Thomas Rueckstiess [ 24/Jul/14 ]

Hi Dharshan,

have you had a chance to read Matt's last comment? If this is still an issue for you, can you answer his questions regarding the updates?

Regards,
Thomas

Comment by Matt Dannenberg [ 09/Jul/14 ]

Thanks. Your indexes do not look like they are the cause of the trouble. Do you perform any updates on the documents in the capped "cache.metrics" collection? If so, what changes are made and to which fields?

Comment by Dharshan Rangegowda [ 11/Jun/14 ]

Here you go
RS-ads0-0:PRIMARY> db.metrics.getIndexes();

[
        {
                "v" : 1,
                "key" : {
                        "_id" : 1
                },
                "ns" : "cache.metrics",
                "name" : "_id_"
        },
        {
                "v" : 1,
                "key" : {
                        "k" : NumberLong(1)
                },
                "ns" : "cache.metrics",
                "background" : NumberLong(1),
                "name" : "k"
        }
]

Comment by Ramon Fernandez Marina [ 10/Jun/14 ]

Hi dharshanr@scalegrid.net,

we still need more information to make progress on this ticket. If this is still an issue for you, can you please send us the information Matt requested above?

Thanks,
Ramón.

Comment by Matt Dannenberg [ 28/May/14 ]

Hey Dharshan,

Could you please post the indexes which you have on the "cache.metrics" collection (result of db.getSiblingDB("cache").metrics.getIndexes())?

Thanks,
Matt

Comment by Dharshan Rangegowda [ 25/May/14 ]

Here you go

RS-ads0-0:PRIMARY> db.system.namespaces.find( { name:"cache.metrics" }, { "options.size": 1 } )
{ "options" : { "size" : NumberLong(1000000000) } }

Comment by Andy Schwerin [ 08/May/14 ]

dharshanr@scalegrid.net, could you also report the capacity limit of the "cache.metrics" collection?

Comment by Andy Schwerin [ 08/May/14 ]

The relevant log lines are the following:

Wed May  7 02:08:53.158 [repl writer worker 1] ERROR: exception: failing update: objects in a capped ns cannot grow on: { ts: Timestamp 1399427389000|234, h: 4961172338724358588, v: 2, op: "i", ns: "cache.metrics", o: { _id: ObjectId('5369913d74ef7e3abb03cef3'),  ... } }
Wed May  7 02:08:53.731 [repl writer worker 1]   Fatal Assertion 16361

dharshanr@scalegrid.net, can you please report what indexes you have on the "cache.metrics" collection?
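
When triaging log lines like the two quoted above, the assertion number and the namespace of the failing oplog entry can be pulled out with a couple of regular expressions. The following is a rough sketch in JavaScript; the function name `summarizeFailure` and both patterns are assumptions based only on the format of the two lines quoted in this ticket, not on a documented mongod log format.

```javascript
// Hypothetical helper based only on the two log lines quoted in this ticket.
// It scans an array of log lines and collects the fatal assertion number
// plus the op and ns fields of the oplog entry printed with the error.
function summarizeFailure(logLines) {
  const summary = { assertion: null, op: null, ns: null };
  for (const line of logLines) {
    // Assumed wording: "Fatal Assertion <number>".
    const fatal = line.match(/Fatal Assertion (\d+)/);
    if (fatal) {
      summary.assertion = Number(fatal[1]);
    }
    if (line.includes("ERROR: exception:")) {
      // The oplog entry is printed inline; pick out its op and ns fields.
      const op = line.match(/\bop: "([^"]+)"/);
      const ns = line.match(/\bns: "([^"]+)"/);
      if (op) summary.op = op[1];
      if (ns) summary.ns = ns[1];
    }
  }
  return summary;
}
```

Applied to the two lines above, this yields assertion 16361, op "i" (an insert) and ns "cache.metrics".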

Comment by Dharshan Rangegowda [ 08/May/14 ]

Here you go - Full log attached.

Comment by Andy Schwerin [ 07/May/14 ]

dharshanr@scalegrid.net, somewhere within 20-30 lines above what you posted from your logs should be a line that says something like "fassert failed" and a number. Can you paste the logs from there to the end?
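
The extraction requested here can be scripted. The following is a minimal sketch in JavaScript, assuming the log has already been read into a string. The function name `tailFromAssertion`, the `contextLines` parameter, and the regular expression are illustrative assumptions and not part of MongoDB. The pattern accepts either "fassert failed" or "Fatal Assertion" followed by a number, since the exact wording in a given log may differ.

```javascript
// Hypothetical helper: returns the log text from the first assertion line
// to the end, or null if no such line is found. Up to contextLines lines
// before the match are included as well, so that an error logged just
// above the assertion is not cut off.
function tailFromAssertion(logText, contextLines = 0) {
  // Assumed pattern: "fassert failed" or "Fatal Assertion", then a number.
  const pattern = /(?:fassert failed|Fatal Assertion)\D*\d+/i;
  const lines = logText.split(/\r?\n/);
  const index = lines.findIndex((line) => pattern.test(line));
  if (index === -1) {
    return null; // no assertion line found
  }
  const start = Math.max(0, index - contextLines);
  return lines.slice(start).join("\n");
}
```

Under Node.js, something like `tailFromAssertion(require("fs").readFileSync("mongod.log", "utf8"), 1)` would return the assertion line, the one line before it, and everything after.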

Generated at Thu Feb 08 03:33:06 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.