[SERVER-14540] Error Running mongotop after large collection drop Created: 12/Jul/14  Updated: 10/Dec/14  Resolved: 21/Jul/14

Status: Closed
Project: Core Server
Component/s: Storage
Affects Version/s: 2.4.9
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Nick Brown Assignee: Ramon Fernandez Marina
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File mongod2-10030.rtf    
Issue Links:
Related
is related to SERVER-4473 duplicate key error on local.slaves Closed
Operating System: ALL
Participants:

 Description   

When I dropped a large collection a few days ago, I received a message that one of the shards could not be reached. I issued the command to drop it again and was told the ns did not exist. I checked all shards, and the collection was indeed gone. However, since that time, one of the shards will not run mongotop anymore. I get the following message when trying to run mongotop:

mongotop -h localhost:10030
connected to: localhost:10030
assertion: 13106 nextSafe(): { $err: "BufBuilder attempted to grow() to 134217728 bytes, past the 64MB limit.", code: 13548 }

And from the mongod log:

Sat Jul 12 12:14:00.907 [conn1942477] Assertion: 13548:BufBuilder attempted to grow() to 134217728 bytes, past the 64MB limit.
0xde46e1 0xda5e1b 0x6e7d8d 0xda627f 0xa856e6 0xa89efc 0x9fe119 0x9ff633 0x6e8518 0xdd0cae 0x7f3e413c09d1 0x7f3e40765b6d
/usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x21) [0xde46e1]
/usr/bin/mongod(_ZN5mongo11msgassertedEiPKc+0x9b) [0xda5e1b]
/usr/bin/mongod(_ZN5mongo11_BufBuilderINS_16TrivialAllocatorEE15grow_reallocateEi+0xed) [0x6e7d8d]
/usr/bin/mongod(_ZNK5mongo13ExceptionInfo6appendERNS_14BSONObjBuilderEPKcS4_+0x2af) [0xda627f]
/usr/bin/mongod(_ZN5mongo11runCommandsEPKcRNS_7BSONObjERNS_5CurOpERNS_11_BufBuilderINS_16TrivialAllocatorEEERNS_14BSONObjBuilderEbi+0xf6) [0xa856e6]
/usr/bin/mongod(_ZN5mongo8runQueryERNS_7MessageERNS_12QueryMessageERNS_5CurOpES1_+0xd7c) [0xa89efc]
/usr/bin/mongod() [0x9fe119]
/usr/bin/mongod(_ZN5mongo16assembleResponseERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortE+0x383) [0x9ff633]
/usr/bin/mongod(_ZN5mongo16MyMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortEPNS_9LastErrorE+0x98) [0x6e8518]
/usr/bin/mongod(_ZN5mongo17PortMessageServer17handleIncomingMsgEPv+0x42e) [0xdd0cae]
/lib64/libpthread.so.0(+0x79d1) [0x7f3e413c09d1]
/lib64/libc.so.6(clone+0x6d) [0x7f3e40765b6d]
Sat Jul 12 12:14:00.910 [conn1942477] assertion 13548 BufBuilder attempted to grow() to 134217728 bytes, past the 64MB limit. ns:admin.$cmd query:{ top: 1 }
Sat Jul 12 12:14:00.910 [conn1942477] ntoskip:0 ntoreturn:1
Sat Jul 12 12:14:00.910 [conn1942477] query admin.$cmd query:{ top: 1 } ntoreturn:1 keyUpdates:0 exception: BufBuilder attempted to grow() to 134217728 bytes, past the 64MB limit. code:13548 reslen:117 429ms
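
For reference, mongotop drives the server-side top command, so the same failure can be reproduced directly from a mongo shell connected to the affected shard, without mongotop (a minimal sketch; host and port taken from the mongotop invocation above):

mongo localhost:10030/admin
> db.runCommand({ top: 1 })
// On the affected shard this fails with the same error seen via mongotop:
// { "$err" : "BufBuilder attempted to grow() to 134217728 bytes, past the 64MB limit.", "code" : 13548 }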



 Comments   
Comment by Ramon Fernandez Marina [ 21/Jul/14 ]

nibrown@vivint.com, glad to hear your cluster is back to normal. Closing this ticket now.

Comment by Nick Brown [ 18/Jul/14 ]

It was intended to be both. We were somewhat concerned about our ability to run mongotop in the event we have to live-debug a performance issue with our applications. But, more than that, we were concerned that the underlying issue might be something more insidious that would have a more critical impact on our stack. The issue was only occurring on one machine, and we have had a host of other problems with that same machine, including balancer and connection-related problems. I saw SERVER-5793 today referenced by someone as a possible source of the socket exceptions that have recently been filling up the logs on this machine. So, today we restarted the problem shard's primary mongod instance and both problems are now resolved. mongotop is working and we no longer see all of the exceptions.

Comment by Thomas Rueckstiess [ 18/Jul/14 ]

Hi Nick,

To prioritize this adequately, could you let us know the impact of this issue and how urgent a fix is for you? Was this intended more as a bug report, or are you interested in a resolution for your affected shard?

Regards,
Thomas

Comment by Nick Brown [ 16/Jul/14 ]

Here are the results of the command and the attempt to run it again. I will attach a portion of the mongod log when the command was run. I assume you don't want the entire log as it is several hundred megabytes. The exception can be seen at 13:01:57 in the log.

mongos> db.adminCommand({setParameter:1,traceExceptions:true});
{ "was" : false, "ok" : 1 }
mongos> db.adminCommand("top");
{ "totals" : { "note" : "all times in microseconds" }, "ok" : 1 }
 
mongotop -h localhost:10030
connected to: localhost:10030
assertion: 13106 nextSafe(): { $err: "BufBuilder attempted to grow() to 134217728 bytes, past the 64MB limit.", code: 13548 }

Comment by Ramon Fernandez Marina [ 16/Jul/14 ]

nibrown@vivint.com, the E11000 error you posted is due to SERVER-4473, and it's documented here. This is probably unrelated to the problem with mongotop, so the first thing to do is to fix the E11000 by dropping the local.slaves collection.

Once this error has gone away, I'd like to ask you to send us full logs when running the top command again:

db.adminCommand({setParameter:1,traceExceptions:true});
db.adminCommand("top");

Comment by Nick Brown [ 15/Jul/14 ]

I've also just noticed this happening repeatedly on the same shard:

Tue Jul 15 14:45:41.006 [slaveTracking] warning: DBException thrown :: caused by :: 11000 E11000 duplicate key error index: local.slaves.$id dup key: { : ObjectId('5347e09855129acfc8f092f2') }
0xde46e1 0xda4c29 0xda5419 0xda55bc 0x7fef45 0x7ff0db 0x7ff401 0x9da77a 0x9e0fff 0x9e3625 0xac92ad 0xaca5af 0xa93e57 0xa97277 0x9fa7f8 0x9ffd78 0xa005f3 0x73129f 0x721dbd 0xc395fa
/usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x21) [0xde46e1]
/usr/bin/mongod(_ZN5mongo11DBException13traceIfNeededERKS0_+0x129) [0xda4c29]
/usr/bin/mongod(_ZN5mongo9uassertedEiPKc+0xb9) [0xda5419]
/usr/bin/mongod() [0xda55bc]
/usr/bin/mongod(_ZNK5mongo11BtreeBucketINS_12BtreeData_V1EE4findERKNS_12IndexDetailsERKNS_5KeyV1ERKNS_7DiskLocERKNS_8OrderingERib+0x435) [0x7fef45]
/usr/bin/mongod(_ZNK5mongo11BtreeBucketINS_12BtreeData_V1EE13insertStepOneENS_7DiskLocERNS_30IndexInsertionContinuationImplIS1_EEb+0x6b) [0x7ff0db]
/usr/bin/mongod(_ZNK5mongo11BtreeBucketINS_12BtreeData_V1EE13twoStepInsertENS_7DiskLocERNS_30IndexInsertionContinuationImplIS1_EEb+0x1a1) [0x7ff401]
/usr/bin/mongod(_ZN5mongo18IndexInterfaceImplINS_12BtreeData_V1EE20beginInsertIntoIndexEiRNS_12IndexDetailsENS_7DiskLocERKNS_7BSONObjERKNS_8OrderingEb+0xda) [0x9da77a]
/usr/bin/mongod(_ZN5mongo19fetchIndexInsertersERSt3setINS_7BSONObjENS_10BSONObjCmpESaIS1_EERNS_14IndexInterface13IndexInserterEPNS_16NamespaceDetailsEiRKS1_NS_7DiskLocEb+0x2cf) [0x9e0fff]
/usr/bin/mongod(_ZN5mongo24indexRecordUsingTwoStepsEPKcPNS_16NamespaceDetailsENS_7BSONObjENS_7DiskLocEb+0x175) [0x9e3625]
/usr/bin/mongod(_ZN5mongo11DataFileMgr6insertEPKcPKvibbbPb+0x123d) [0xac92ad]
/usr/bin/mongod(_ZN5mongo11DataFileMgr16insertWithObjModEPKcRNS_7BSONObjEbb+0x4f) [0xaca5af]
/usr/bin/mongod(_ZN5mongo14_updateObjectsEbPKcRKNS_7BSONObjES4_bbbRNS_7OpDebugEPNS_11RemoveSaverEbRKNS_24QueryPlanSelectionPolicyEb+0x1c77) [0xa93e57]
/usr/bin/mongod(_ZN5mongo13updateObjectsEPKcRKNS_7BSONObjES4_bbbRNS_7OpDebugEbRKNS_24QueryPlanSelectionPolicyE+0xb7) [0xa97277]
/usr/bin/mongod(_ZN5mongo14receivedUpdateERNS_7MessageERNS_5CurOpE+0x4d8) [0x9fa7f8]
/usr/bin/mongod(_ZN5mongo16assembleResponseERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortE+0xac8) [0x9ffd78]
/usr/bin/mongod(_ZN5mongo14DBDirectClient3sayERNS_7MessageEbPSs+0x83) [0xa005f3]
/usr/bin/mongod(_ZN5mongo12DBClientBase6updateERKSsNS_5QueryENS_7BSONObjEi+0x31f) [0x73129f]
/usr/bin/mongod(_ZN5mongo12DBClientBase6updateERKSsNS_5QueryENS_7BSONObjEbb+0x6d) [0x721dbd]
/usr/bin/mongod(_ZN5mongo13SlaveTracking3runEv+0xb6a) [0xc395fa]

Comment by Nick Brown [ 15/Jul/14 ]

rs2:PRIMARY> db.adminCommand({setParameter:1,traceExceptions:true});
{ "was" : false, "ok" : 1 }
rs2:PRIMARY> db.adminCommand("top");
Tue Jul 15 09:24:57.724 error: {
	"$err" : "BufBuilder attempted to grow() to 134217728 bytes, past the 64MB limit.",
	"code" : 13548
} at src/mongo/shell/query.js:128
rs2:PRIMARY> 

Comment by Ramon Fernandez Marina [ 15/Jul/14 ]

Thanks nibrown@vivint.com, looks like a large number of collections is not the cause of this behavior. It may be that the top command is throwing an exception with a large message. Can you please post the output of running:

db.adminCommand({setParameter:1,traceExceptions:true});
db.adminCommand("top");

in the affected shard?

Comment by Nick Brown [ 14/Jul/14 ]

{
	"db" : "history",
	"collections" : 29,
	"objects" : 250683339,
	"avgObjSize" : 208.67197493328425,
	"dataSize" : 52310587432,
	"storageSize" : 76169244672,
	"numExtents" : 493,
	"indexes" : 81,
	"indexSize" : 41217170064,
	"fileSize" : 397024428032,
	"nsSizeMB" : 16,
	"dataFileVersion" : {
		"major" : 4,
		"minor" : 5
	},
	"ok" : 1
}
{
	"db" : "local",
	"collections" : 11,
	"objects" : 513753409,
	"avgObjSize" : 206.90304707642338,
	"dataSize" : 106297145768,
	"storageSize" : 114527739056,
	"numExtents" : 64,
	"indexes" : 6,
	"indexSize" : 49056,
	"fileSize" : 115974602752,
	"nsSizeMB" : 16,
	"dataFileVersion" : {
		"major" : 4,
		"minor" : 5
	},
	"ok" : 1
}
{
	"db" : "thor",
	"collections" : 5,
	"objects" : 54180,
	"avgObjSize" : 2307.984569952012,
	"dataSize" : 125046604,
	"storageSize" : 295084032,
	"numExtents" : 23,
	"indexes" : 13,
	"indexSize" : 12958960,
	"fileSize" : 105109258240,
	"nsSizeMB" : 16,
	"dataFileVersion" : {
		"major" : 4,
		"minor" : 6
	},
	"ok" : 1
}
{
	"db" : "admin",
	"collections" : 3,
	"objects" : 5,
	"avgObjSize" : 55.2,
	"dataSize" : 276,
	"storageSize" : 12288,
	"numExtents" : 3,
	"indexes" : 1,
	"indexSize" : 8176,
	"fileSize" : 201326592,
	"nsSizeMB" : 16,
	"dataFileVersion" : {
		"major" : 4,
		"minor" : 5
	},
	"ok" : 1
}
{
	"db" : "test",
	"collections" : 2,
	"objects" : 1,
	"avgObjSize" : 72,
	"dataSize" : 72,
	"storageSize" : 1056768,
	"numExtents" : 2,
	"indexes" : 0,
	"indexSize" : 0,
	"fileSize" : 67108864,
	"nsSizeMB" : 16,
	"dataFileVersion" : {
		"major" : 4,
		"minor" : 5
	},
	"ok" : 1
}
{
	"db" : "config",
	"collections" : 0,
	"objects" : 0,
	"avgObjSize" : 0,
	"dataSize" : 0,
	"storageSize" : 0,
	"numExtents" : 0,
	"indexes" : 0,
	"indexSize" : 0,
	"fileSize" : 0,
	"nsSizeMB" : 0,
	"dataFileVersion" : {
		
	},
	"ok" : 1
}
{
	"db" : "schedule",
	"collections" : 0,
	"objects" : 0,
	"avgObjSize" : 0,
	"dataSize" : 0,
	"storageSize" : 0,
	"numExtents" : 0,
	"indexes" : 0,
	"indexSize" : 0,
	"fileSize" : 0,
	"nsSizeMB" : 0,
	"dataFileVersion" : {
		
	},
	"ok" : 1
}
{
	"db" : "shardtest",
	"collections" : 0,
	"objects" : 0,
	"avgObjSize" : 0,
	"dataSize" : 0,
	"storageSize" : 0,
	"numExtents" : 0,
	"indexes" : 0,
	"indexSize" : 0,
	"fileSize" : 0,
	"nsSizeMB" : 0,
	"dataFileVersion" : {
		
	},
	"ok" : 1
}
{
	"db" : "stat",
	"collections" : 0,
	"objects" : 0,
	"avgObjSize" : 0,
	"dataSize" : 0,
	"storageSize" : 0,
	"numExtents" : 0,
	"indexes" : 0,
	"indexSize" : 0,
	"fileSize" : 0,
	"nsSizeMB" : 0,
	"dataFileVersion" : {
		
	},
	"ok" : 1
}
{
	"db" : "weather",
	"collections" : 0,
	"objects" : 0,
	"avgObjSize" : 0,
	"dataSize" : 0,
	"storageSize" : 0,
	"numExtents" : 0,
	"indexes" : 0,
	"indexSize" : 0,
	"fileSize" : 0,
	"nsSizeMB" : 0,
	"dataFileVersion" : {
		
	},
	"ok" : 1
}

Comment by Ramon Fernandez Marina [ 14/Jul/14 ]

nibrown@vivint.com, can you send us the output of running the following command on the affected shard?

db._adminCommand("listDatabases").databases.forEach(function (d) {mdb = db.getSiblingDB(d.name); printjson(mdb.stats()); });

Comment by Nick Brown [ 14/Jul/14 ]

We are running 2.4.9. I have not seen the error as a result of any other operation, and don't have a reason to believe it's affecting data operations at this point. But, I haven't looked thoroughly enough to rule that out yet.

Comment by Ramon Fernandez Marina [ 14/Jul/14 ]

nibrown@vivint.com, which version of MongoDB are you running? Also, is mongotop the only time you see an error? Is everything else (data operations) working?
