[SERVER-13683] all threads must call ProfilerRegisterThread() for gperftools Created: 22/Apr/14  Updated: 10/Dec/14  Resolved: 24/Jul/14

Status: Closed
Project: Core Server
Component/s: Diagnostics
Affects Version/s: 2.6.0
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Mark Callaghan Assignee: Andrew Morrow (Inactive)
Resolution: Cannot Reproduce Votes: 1
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Operating System: ALL
Steps To Reproduce:

see https://code.google.com/p/gperftools/issues/detail?id=335 where gperftool maintainers admit this is probably needed and the thread at https://groups.google.com/forum/#!topic/google-perftools/gEpE7_AdibI. This is a regression in gperftools as it wasn't needed in the near past. But many, many years ago it was needed when I was profiling mysqld. I added the call to ProfilerRegisterThread to to handleIncomingMsg.

In this case I compiled the server with --user-cpu-profiler, called _cpuProfilerStart, ran a workload, called _cpuProfilerStop

Participants:

 Description   

To get CPU profiling from gperftools on recent Linux systems/kernels, all threads should call ProfilerRegisterThread. This issue is forked from https://jira.mongodb.org/browse/SERVER-6628

Without adding the calls to ProfilerRegisterThread the gperftools profiler output file has nothing useful...
pprof --pdf /path/to/mongod /path/to/profiler/output
Using local file mongodb-src-r2.6.0/build/linux2/norm/mongo/mongod.
Using local file prof2.260.
No nodes to print



 Comments   
Comment by Ramon Fernandez Marina [ 14/Jul/14 ]

mdcallag, we have not been able to reproduce the behavior you describe on this ticket, so we're going to mark it as resolved. I'll post the detailed steps we tried in a separate comment, in case you're interested in following up.


Regards,
Ramón.

Comment by Andrew Morrow (Inactive) [ 02/May/14 ]

Hi Mark -

I just spun up a clean Ubuntu 13.10 image on Digital Ocean, and I was able to use the cpu profiler:

# lsb_release -c
Codename:	saucy

I set up the packages necessary to build. You need to install google-perftools to get a copy of google-pprof since we don't build it. Then cloned mongo, checked out 2.6.0, built it using your options, and started mongod.

# sudo apt-get install scons git build-essential libboost-all-dev g++-multilib libunwind8-dev google-perftools
# git clone https://github.com/mongodb/mongo.git
# git checkout r2.6.0
# scons --variant-dir=linux2/norm --nostrip --full --prefix=/root/opt --allocator=tcmalloc  --opt=on --mongod-concurrency-level=db  --use-cpu-profiler -j20 install
# mkdir -d /data/db
# /root/opt/bin/mongod 

Then in the mongo shell in another session:

# /root/opt/bin/mongo
> use admin
> db.runCommand({_cpuProfilerStart: {profileFilename: './foo.prof'}});

Then run some update load against mongod, then, in the mongo shell

# /root/opt/bin/mongo
> db.runCommand({_cpuProfilerStop: {}});

I now have a foo.prof file with contents:

# ls -la ./foo.prof
-rw-r--r-- 1 root root 352217 May  2 14:54 ./foo.prof

Running google-pprof (from the ubuntu google-perftools package) against this file gets me useful output:

# google-pprof --text /root/opt/bin/mongod ./foo.prof | grep mongo::Update
Using local file /root/opt/bin/mongod.
Using local file ./foo.prof.
Removing _L_unlock_13 from all stack traces.
      18   0.2%  61.9%      600   8.2% mongo::UpdateDriver::update
       7   0.1%  82.8%       97   1.3% mongo::UpdateDriver::~UpdateDriver
       6   0.1%  86.2%      180   2.5% mongo::UpdateDriver::parse
       6   0.1%  86.3%     1881  25.8% mongo::UpdateExecutor::execute
       6   0.1%  86.4%      215   3.0% mongo::UpdateExecutor::prepare
       6   0.1%  86.5%        6   0.1% mongo::UpdateLifecycleImpl::setCollection
       5   0.1%  89.6%      109   1.5% mongo::UpdateDriver::addAndParse
       5   0.1%  89.7%       15   0.2% mongo::UpdateResult::UpdateResult
       4   0.1%  92.4%       73   1.0% mongo::UpdateDriver::UpdateDriver
       4   0.1%  92.5%       39   0.5% mongo::UpdateDriver::makeOplogEntryQuery
       4   0.1%  92.5%       77   1.1% mongo::UpdateExecutor::UpdateExecutor
       3   0.0%  94.8%       40   0.5% mongo::UpdateDriver::clear
       3   0.0%  94.9%       14   0.2% mongo::UpdateExecutor::parseQuery
       3   0.0%  94.9%      185   2.5% mongo::UpdateExecutor::parseUpdate
       3   0.0%  95.0%       17   0.2% mongo::UpdateLifecycleImpl::UpdateLifecycleImpl
       1   0.0%  99.4%        1   0.0% mongo::UpdateDriver::refreshIndexKeys
       1   0.0%  99.4%        1   0.0% mongo::UpdateDriver::setContext
       1   0.0%  99.4%        1   0.0% mongo::UpdateDriver::setModOptions
       1   0.0%  99.4%      106   1.5% mongo::UpdateExecutor::~UpdateExecutor
       1   0.0%  99.5%        5   0.1% mongo::UpdateLifecycleImpl::getImmutableFields
       1   0.0%  99.5%        1   0.0% mongo::UpdateRequest::setMulti (inline)
       1   0.0%  99.5%        2   0.0% mongo::UpdateRequest::setUpsert (inline)
       0   0.0% 100.0%        1   0.0% mongo::UpdateDriver::context
       0   0.0% 100.0%        1   0.0% mongo::UpdateDriver::modOptions
       0   0.0% 100.0%        2   0.0% mongo::UpdateDriver::populateDocumentWithQueryFields
       0   0.0% 100.0%        3   0.0% mongo::UpdateLifecycleImpl::getIndexKeys
       0   0.0% 100.0%        1   0.0% mongo::UpdateRequest::setUpdates (inline)
       0   0.0% 100.0%        3   0.0% mongo::UpdateResult::toString

So it looks to me like this works fine on Ubuntu 13.10.

Comment by Mark Callaghan [ 02/May/14 ]

Switched back to Ubuntu 13.10 on my VM using the build command below. The gperftools output file created by mongod is ~160kb but pprof --text prints nothing and pprof --gv prints "No nodes to print". This is the same as I experienced before. Given that MongoDB already has a few lines of code compiled when gperftools is enabled, this only requires 1 or 2 extra lines to avoid uncertainty from problems with certain versions of Linux.

scons --variant-dir=linux2/norm --nostrip --full --prefix=/home/mark/i/m260 --allocator=tcmalloc  --opt=on --mongod-concurrency-level=db  --use-cpu-profiler install

Comment by Andrew Morrow (Inactive) [ 02/May/14 ]

Thanks Mark -

Your scons invocation looks reasonable to me, though I wonder if you actually need --extralib=unwind, since we should be adding it for you when building with --use-cpu-profiler: https://github.com/mongodb/mongo/blob/v2.6/src/third_party/gperftools-2.0/SConscript#L82-L84.

In an effort to rule out one potential source of variation between your system and ours, could you perhaps try building against the vanilla system compiler and runtime libraries, rather than the ones from your production compatible toolchain?

Thanks,
Andrew

Comment by Mark Callaghan [ 02/May/14 ]

I use the following and don't show all of the paths (BASE_CXX, BASE_CC, UWPATH) but trust me that it makes builds complicated because we have compiler/library toolchains so that builds done on my devel server will also work in production.

scons --variant-dir=linux2/norm --nostrip --full --cxx=${BASE_CXX} --cc=${BASE_CC} --prefix=/data/mysql/$1 --allocator=tcmalloc  --opt=on --mongod-concurrency-level=db -j 10 --extrapath=${UWPATH} --extralib=unwind --use-cpu-profiler mongod

From the build output libtcmalloc gets built from provided code:
ranlib build/linux2/norm/third_party/gperftools-2.0/libtcmalloc_minimal.a

From "nm mongod | grep -i tcmalloc" I see tcmalloc but I do not see it linked with libtcmalloc in the build output

From the mongod command line I see this, and there is more but command line output is huge (all object files, few libraries). So I think I have included the right version of gperftools into mongod
build/linux2/norm/third_party/gperftools-2.0/src/base/atom
icops-internals-x86.o build/linux2/norm/third_party/gperftools-2.0/src/base/dynamic_annotations.o build/linux2/norm/third_party/gperftools-2.0/src/base/l
ogging.o build/linux2/norm/third_party/gperftools-2.0/src/base/spinlock.o build/linux2/norm/third_party/gperftools-2.0/src/base/spinlock_internal.o
...

Comment by Andrew Morrow (Inactive) [ 29/Apr/14 ]

Hi Mark -

A few of us have recently tested this on our dev machines, and in each case (after making some local edit to fix or work around SERVER-6628; we are reviewing a fix for that issue separately) we get profiles that contain data from all threads, not just the main thread, without needing to call ProfilerRegisterThread() from each thread. I know that we have tested on at least Ubuntu 12.04 and 14.04.

I see that you confirm above that you are using --allocator=tcmalloc, however that only enables using tcmalloc: it does not select which version (vendored vs system) of the tcmalloc allocator you are using. That choice is controlled by the --use-system-tcmalloc and the --use-system-all flag. Can you confirm that you are also not using --use-system-tcmalloc or --use-system-all flags, or doing anything with the include or library search path which would cause the build system to select something other than the vendored gperftools?

As an aside, another approach to profiling would be to use callgrind, though of course that comes with significantly higher overhead than the stack sampling profiler.

Please let me know how else we can help.

Thanks,
Andrew

Comment by Mark Callaghan [ 22/Apr/14 ]

I use the version shipped with MongoDB (--allocator=tcmalloc) and confirmed by looking at scons build output.

Comment by Andy Schwerin [ 22/Apr/14 ]

Are you using the system version of tcmalloc or the version shipped in the mongodb source? This comes at an opportune time, as we're investigating upgrading our shipped tcmalloc.

Generated at Thu Feb 08 03:32:32 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.