[SERVER-9168] heap_usage_bytes incorrect in 2.4 Created: 28/Mar/13  Updated: 16/Nov/21  Resolved: 29/Mar/13

Status: Closed
Project: Core Server
Component/s: Diagnostics
Affects Version/s: 2.4.1
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Antoine Girbal Assignee: Unassigned
Resolution: Won't Fix Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Linux


Operating System: ALL
Participants:

 Description   

Just seen a mongo environment recently upgraded to 2.4.1 where the "heap_usage_bytes" in serverStatus() was completely wrong.
The value would increase continuously, eventually go back to 0, apparently at 12am on the dot.
At the time of reading the value was 1.8GB.
In comparison the value was constant 200MB with v2.2.



 Comments   
Comment by Daniel Pasette (Inactive) [ 29/Mar/13 ]

The heap_usage_bytes is calculated using a signed 32-bit integer, which means that the value can overflow, resulting in negative values. This is what is occurring in this case.

See http://code.google.com/p/gperftools/issues/detail?id=224&redir=1 about mallinfo() with tcmalloc.

As http://kobesearch.cpan.org/htdocs/Devel-Mallinfo/Devel/Mallinfo.pm.html says:
"On a 64-bit system with a 32-bit C int type, the int fields in struct mallinfo may
overflow and either wrap around to small or negative values, or hopefully cap at
INT_MAX. This is a known C library problem and Devel::Mallinfo doesn't try to do
anything about it."

Comment by Tad Marshall [ 29/Mar/13 ]

Interesting comment in the source ([dm]):
src/mongo/util/processinfo_linux2.cpp lines 376 to 384 in master:

    void ProcessInfo::getExtraInfo( BSONObjBuilder& info ) {
        // [dm] i don't think mallinfo works. (64 bit.)  ??
        struct mallinfo malloc_info = mallinfo(); // structure has same name as function that returns it. (see malloc.h)
        info.append("heap_usage_bytes", malloc_info.uordblks/*main arena*/ + malloc_info.hblkhd/*mmap blocks*/);
        //docs claim hblkhd is included in uordblks but it isn't
 
        LinuxProc p(_pid);
        info.appendNumber("page_faults", static_cast<long long>(p._maj_flt) );
    }

Also, see http://code.google.com/p/gperftools/issues/detail?id=224&redir=1 about mallinfo() with tcmalloc.

As http://kobesearch.cpan.org/htdocs/Devel-Mallinfo/Devel/Mallinfo.pm.html says:
"On a 64-bit system with a 32-bit C int type, the int fields in struct mallinfo may
overflow and either wrap around to small or negative values, or hopefully cap at
INT_MAX. This is a known C library problem and Devel::Mallinfo doesn't try to do
anything about it."

Comment by Samuel Clay [ 28/Mar/13 ]

Platform: Ubuntu 12.10 64-bit

Comment by Samuel Clay [ 28/Mar/13 ]

Here's a graphical representation of my heap size: http://cl.ly/NtrJ

Comment by Samuel Clay [ 28/Mar/13 ]

I can offer you ssh access if you like (send me your public key: samuel@newsblur.com). Here's my db.stats() && db.serverStatus():

    nbset:PRIMARY> db.stats()
    {
    	"db" : "newsblur",
    	"collections" : 29,
    	"objects" : 78705123,
    	"avgObjSize" : 1139.8257345967174,
    	"dataSize" : 89710124640,
    	"storageSize" : 93400137184,
    	"numExtents" : 281,
    	"indexes" : 115,
    	"indexSize" : 13492239600,
    	"fileSize" : 113694998528,
    	"nsSizeMB" : 16,
    	"dataFileVersion" : {
    		"major" : 4,
    		"minor" : 5
    	},
    	"ok" : 1
    }
    nbset:PRIMARY> db.serverStatus()
    {
    	"host" : "db24",
    	"version" : "2.4.1",
    	"process" : "mongod",
    	"pid" : 23296,
    	"uptime" : 248116,
    	"uptimeMillis" : NumberLong(248115733),
    	"uptimeEstimate" : 243107,
    	"localTime" : ISODate("2013-03-28T22:34:56.177Z"),
    	"asserts" : {
    		"regular" : 0,
    		"warning" : 0,
    		"msg" : 0,
    		"user" : 2976905,
    		"rollovers" : 0
    	},
    	"backgroundFlushing" : {
    		"flushes" : 4135,
    		"total_ms" : 1280521,
    		"average_ms" : 309.67859733978236,
    		"last_ms" : 744,
    		"last_finished" : ISODate("2013-03-28T22:34:42.156Z")
    	},
    	"connections" : {
    		"current" : 755,
    		"available" : 7245,
    		"totalCreated" : NumberLong(580615)
    	},
    	"cursors" : {
    		"totalOpen" : 8,
    		"clientCursors_size" : 8,
    		"timedOut" : 127
    	},
    	"dur" : {
    		"commits" : 27,
    		"journaledMB" : 4.194304,
    		"writeToDataFilesMB" : 7.873008,
    		"compression" : 0.521561991058484,
    		"commitsInWriteLock" : 0,
    		"earlyCommits" : 0,
    		"timeMs" : {
    			"dt" : 3054,
    			"prepLogBuffer" : 12,
    			"writeToJournal" : 175,
    			"writeToDataFiles" : 59,
    			"remapPrivateView" : 7
    		}
    	},
    	"extra_info" : {
    		"note" : "fields vary by platform",
    		"heap_usage_bytes" : 1763604856,
    		"page_faults" : 442529
    	},
    	"globalLock" : {
    		"totalTime" : NumberLong("248115735000"),
    		"lockTime" : NumberLong("2481468915"),
    		"currentQueue" : {
    			"total" : 5,
    			"readers" : 5,
    			"writers" : 0
    		},
    		"activeClients" : {
    			"total" : 2,
    			"readers" : 0,
    			"writers" : 2
    		}
    	},
    	"indexCounters" : {
    		"accesses" : 659025658,
    		"hits" : 659025737,
    		"misses" : 0,
    		"resets" : 0,
    		"missRatio" : 0
    	},
    	"locks" : {
    		"." : {
    			"timeLockedMicros" : {
    				"R" : NumberLong(314621512),
    				"W" : NumberLong("2481468915")
    			},
    			"timeAcquiringMicros" : {
    				"R" : NumberLong(165583326),
    				"W" : NumberLong(55517767)
    			}
    		},
    		"admin" : {
    			"timeLockedMicros" : {
    				"r" : NumberLong(334274),
    				"w" : NumberLong(0)
    			},
    			"timeAcquiringMicros" : {
    				"r" : NumberLong(30258),
    				"w" : NumberLong(0)
    			}
    		},
    		"local" : {
    			"timeLockedMicros" : {
    				"r" : NumberLong("47989809726"),
    				"w" : NumberLong("3882256572")
    			},
    			"timeAcquiringMicros" : {
    				"r" : NumberLong("17148189643"),
    				"w" : NumberLong("2537933986")
    			}
    		},
    		"newsblur" : {
    			"timeLockedMicros" : {
    				"r" : NumberLong("36244686785"),
    				"w" : NumberLong("15623257735")
    			},
    			"timeAcquiringMicros" : {
    				"r" : NumberLong("22994011410"),
    				"w" : NumberLong("11194633620")
    			}
    		},
    		"nbanalytics" : {
    			"timeLockedMicros" : {
    				"r" : NumberLong("27974077100"),
    				"w" : NumberLong("6537813922")
    			},
    			"timeAcquiringMicros" : {
    				"r" : NumberLong("14723535968"),
    				"w" : NumberLong("2881253851")
    			}
    		},
    		"test" : {
    			"timeLockedMicros" : {
    				"r" : NumberLong(127533),
    				"w" : NumberLong(0)
    			},
    			"timeAcquiringMicros" : {
    				"r" : NumberLong(29355),
    				"w" : NumberLong(0)
    			}
    		}
    	},
    	"network" : {
    		"bytesIn" : 58477078629,
    		"bytesOut" : 942994088833,
    		"numRequests" : 287422078
    	},
    	"opcounters" : {
    		"insert" : 23781239,
    		"query" : 57122136,
    		"update" : 6132326,
    		"delete" : 2314137,
    		"getmore" : 137267880,
    		"command" : 56154997
    	},
    	"opcountersRepl" : {
    		"insert" : 0,
    		"query" : 0,
    		"update" : 0,
    		"delete" : 0,
    		"getmore" : 0,
    		"command" : 0
    	},
    	"recordStats" : {
    		"accessesNotInMemory" : 260646,
    		"pageFaultExceptionsThrown" : 177308,
    		"admin" : {
    			"accessesNotInMemory" : 0,
    			"pageFaultExceptionsThrown" : 0
    		},
    		"local" : {
    			"accessesNotInMemory" : 1,
    			"pageFaultExceptionsThrown" : 0
    		},
    		"nbanalytics" : {
    			"accessesNotInMemory" : 793,
    			"pageFaultExceptionsThrown" : 8
    		},
    		"newsblur" : {
    			"accessesNotInMemory" : 259852,
    			"pageFaultExceptionsThrown" : 177300
    		},
    		"test" : {
    			"accessesNotInMemory" : 0,
    			"pageFaultExceptionsThrown" : 0
    		}
    	},
    	"repl" : {
    		"setName" : "nbset",
    		"ismaster" : true,
    		"secondary" : false,
    		"hosts" : [
    			"db24:27017",
    			"db25:27017",
    			"db22:27017",
    			"db21:27017",
    			"db20:27017"
    		],
    		"primary" : "db24:27017",
    		"me" : "db24:27017"
    	},
    	"writeBacksQueued" : false,
    	"mem" : {
    		"bits" : 64,
    		"resident" : 56067,
    		"virtual" : 339567,
    		"supported" : true,
    		"mapped" : 155638,
    		"mappedWithJournal" : 311276
    	},
    	"metrics" : {
    		"document" : {
    			"deleted" : NumberLong(6891535),
    			"inserted" : NumberLong(23781249),
    			"returned" : NumberLong(818214765),
    			"updated" : NumberLong(6070767)
    		},
    		"getLastError" : {
    			"wtime" : {
    				"num" : 0,
    				"totalMillis" : 0
    			},
    			"wtimeouts" : NumberLong(0)
    		},
    		"operation" : {
    			"fastmod" : NumberLong(2767179),
    			"idhack" : NumberLong(4609252),
    			"scanAndOrder" : NumberLong(4887373)
    		},
    		"queryExecutor" : {
    			"scanned" : NumberLong(535766364)
    		},
    		"record" : {
    			"moves" : NumberLong(793024)
    		},
    		"repl" : {
    			"apply" : {
    				"batches" : {
    					"num" : 0,
    					"totalMillis" : 0
    				},
    				"ops" : NumberLong(0)
    			},
    			"buffer" : {
    				"count" : NumberLong(0),
    				"maxSizeBytes" : 268435456,
    				"sizeBytes" : NumberLong(0)
    			},
    			"network" : {
    				"bytes" : NumberLong(0),
    				"getmores" : {
    					"num" : 0,
    					"totalMillis" : 0
    				},
    				"ops" : NumberLong(0),
    				"readersCreated" : NumberLong(163)
    			},
    			"oplog" : {
    				"insert" : {
    					"num" : 35850862,
    					"totalMillis" : 1544651
    				},
    				"insertBytes" : NumberLong("20937520538")
    			},
    			"preload" : {
    				"docs" : {
    					"num" : 0,
    					"totalMillis" : 0
    				},
    				"indexes" : {
    					"num" : 0,
    					"totalMillis" : 0
    				}
    			}
    		},
    		"ttl" : {
    			"deletedDocuments" : NumberLong(0),
    			"passes" : NumberLong(4135)
    		}
    	},
    	"ok" : 1
    }
    nbset:PRIMARY> 

Comment by Daniel Pasette (Inactive) [ 28/Mar/13 ]

I can't reproduce this. Any further steps or information about the exact platform?

Generated at Thu Feb 08 03:19:34 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.