Core Server / SERVER-9861

MapReduceFinishCommand Memory Leak

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: 2.2.4, 2.4.4
    • Fix Version/s: 2.4.7, 2.5.2
    • Component/s: MapReduce
    • Labels:
      None
    • Environment:
      MongoDB 2.2.4, Ubuntu 12.04
    • Operating System:
      ALL
    • Steps To Reproduce:
      To reproduce:

      1. Set up a 2-shard MongoDB 2.2.4 cluster with a mongos and a config server.
      2. Run mongorestore on the attached shard1, shard2, and config dumps.
      3. Run the attached map-reduce-loop.js script against the cluster's mongos.
        Memory will grow on shard1 (it appears that no data is found on shard2).

      Files to use for the restore are:
      Map Reduce Script
      Mongodump - Shard 1
      Mongodump - Shard 2
      Mongodump - Config


      Description

      Issue Status as of October 23rd, 2013

      ISSUE SUMMARY
      When mapReduce is run repeatedly on the same client connection, the mongod keeps tracking every temporary collection used during each mapReduce. These temporary collection names slowly accumulate in a cache on the mongod, appearing as a slow memory leak.

      USER IMPACT
      This impacts users of mapReduce on sharded collections and manifests as a slow increase in non-mapped virtual memory on mongod. It is present in versions of MongoDB prior to and including v2.4.6.

      SOLUTION
      After each mapReduce completes and the temporary collections are dropped, also explicitly remove the collection name from the cache used for keeping track of namespace versioning.
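The leak and the fix described above can be illustrated with a small model. This is a sketch only, not the mongod implementation: the class `NamespaceVersionCache`, its methods, the helper functions, and the `test.tmp.mr.coll_<n>` namespace pattern are all hypothetical names chosen for illustration. It assumes each mapReduce run registers one new temporary namespace in a per-connection cache.

```python
class NamespaceVersionCache:
    """Hypothetical stand-in for the per-connection namespace versioning cache."""

    def __init__(self):
        self._seen = set()

    def check(self, ns):
        # Record the namespace the first time it is seen on this connection.
        self._seen.add(ns)

    def forget(self, ns):
        # Remove the entry; discard() does nothing if the name is absent.
        self._seen.discard(ns)

    def __len__(self):
        return len(self._seen)


def run_map_reduce(cache, job_id, forget_after_drop):
    # Each run uses a uniquely named temporary collection (illustrative name).
    tmp_ns = f"test.tmp.mr.coll_{job_id}"
    cache.check(tmp_ns)
    # ... map, reduce, write the output, then drop the temporary collection ...
    if forget_after_drop:
        # The fix: also remove the dropped collection's name from the cache.
        cache.forget(tmp_ns)


def cache_size_after(jobs, forget_after_drop):
    cache = NamespaceVersionCache()
    for job_id in range(jobs):
        run_map_reduce(cache, job_id, forget_after_drop)
    return len(cache)


if __name__ == "__main__":
    print(cache_size_after(1000, forget_after_drop=False))  # one entry per job
    print(cache_size_after(1000, forget_after_drop=True))   # cache stays empty
```

Without the explicit removal, the cache grows by one entry per run for as long as the connection lives, which matches the slow growth reported here.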

      WORKAROUNDS
      This issue can be worked around by stepping down the primary mongod of the shard.

      PATCHES
      Production release v2.4.7 contains the fix for this issue, and production release v2.6.0 will contain the fix as well.

      Original Description

      When running map-reduce continually with the provided test set, we see two code paths with continual heap growth. The valgrind massif output for both follows. In my test, case 1 grew from 2 MB to 6 MB and case 2 grew from 1 MB to 3 MB over the course of a few hours.
      Case 1:

      ->08.50% (6,239,857B) 0x50E9A87: std::string::_Rep::_S_create(unsigned long, unsigned long, std::allocator<char> const&) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.16)
      | ->08.45% (6,208,357B) 0x50EA7F9: std::string::_Rep::_M_clone(std::allocator<char> const&, unsigned long) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.16)
      | | ->08.45% (6,208,323B) 0x50EA8DE: std::string::reserve(unsigned long) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.16)
      | | | ->08.44% (6,200,449B) 0x50EAE0B: std::string::append(std::string const&) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.16)
      | | | | ->08.44% (6,199,973B) 0x66A933: mongo::mr::MapReduceFinishCommand::run(std::string const&, mongo::BSONObj&, int, std::string&, mongo::BSONObjBuilder&, bool) (basic_string.h:2310)
      | | | | | ->08.44% (6,199,973B) 0x67DBFD: mongo::_execCommand(mongo::Command*, std::string const&, mongo::BSONObj&, int, mongo::BSONObjBuilder&, bool) (dbcommands.cpp:1859)
      | | | | |   ->08.44% (6,199,973B) 0x67E684: mongo::execCommand(mongo::Command*, mongo::Client&, int, char const*, mongo::BSONObj&, mongo::BSONObjBuilder&, bool) (dbcommands.cpp:1985)
      | | | | |     ->08.44% (6,199,973B) 0x67F58B: mongo::_runCommands(char const*, mongo::BSONObj&, mongo::_BufBuilder<mongo::TrivialAllocator>&, mongo::BSONObjBuilder&, bool, int) (dbcommands.cpp:2069)
      | | | | |       ->08.44% (6,199,973B) 0x74CFC8: mongo::runCommands(char const*, mongo::BSONObj&, mongo::CurOp&, mongo::_BufBuilder<mongo::TrivialAllocator>&, mongo::BSONObjBuilder&, bool, int) (query.cpp:43)
      | | | | |         ->08.44% (6,199,973B) 0x750AAD: mongo::runQuery(mongo::Message&, mongo::QueryMessage&, mongo::CurOp&, mongo::Message&) (query.cpp:920)
      | | | | |           ->08.44% (6,199,973B) 0x6FB166: mongo::assembleResponse(mongo::Message&, mongo::DbResponse&, mongo::HostAndPort const&) (instance.cpp:244)
      | | | | |             ->08.44% (6,199,973B) 0x592777: mongo::MyMessageHandler::process(mongo::Message&, mongo::AbstractMessagingPort*, mongo::LastError*) (db.cpp:193)
      | | | | |               ->08.44% (6,199,973B) 0x90DDC9: mongo::pms::threadRun(mongo::MessagingPort*) (message_server_port.cpp:85)
      | | | | |                 ->08.44% (6,199,973B) 0x4E35E98: start_thread (pthread_create.c:308)
      | | | | |                   ->08.44% (6,199,973B) 0x5950CCB: clone (clone.S:112)

      Case 2 (looks like stats logging for temporary MR namespaces?):

      ->04.73% (3,470,360B) 0x5A8F4E: std::_Rb_tree<std::string, std::string, std::_Identity<std::string>, std::less<std::string>, std::allocator<std::string> >::_M_insert_(std::_Rb_tree_node_base const*, std::_Rb_tree_node_base const*, std::string const&) (new_allocator.h:92)
      | ->04.73% (3,470,120B) 0x5BAA50: std::_Rb_tree<std::string, std::string, std::_Identity<std::string>, std::less<std::string>, std::allocator<std::string> >::_M_insert_unique(std::string const&) (stl_tree.h:1291)
      | | ->04.73% (3,470,080B) 0x8B5BA7: mongo::ClientConnections::_check(std::string const&) (stl_set.h:410)
      | | | ->04.73% (3,470,080B) 0x8B5C28: mongo::ClientConnections::get(std::string const&, std::string const&) (shardconnection.cpp:149)
      | | |   ->04.73% (3,470,080B) 0x8B4142: mongo::ShardConnection::_init() (shardconnection.cpp:323)
      | | |     ->04.73% (3,470,080B) 0x8B420C: mongo::ShardConnection::ShardConnection(std::string const&, std::string const&, boost::shared_ptr<mongo::ChunkManager const>) (shardconnection.cpp:316)
      | | |       ->04.73% (3,470,080B) 0x5F8E86: mongo::ParallelSortClusteredCursor::_oldInit() (parallel.cpp:1377)
      | | |         ->04.73% (3,470,080B) 0x669753: mongo::mr::MapReduceFinishCommand::run(std::string const&, mongo::BSONObj&, int, std::string&, mongo::BSONObjBuilder&, bool) (mr.cpp:1308)
      | | |           ->04.73% (3,470,080B) 0x67DBFD: mongo::_execCommand(mongo::Command*, std::string const&, mongo::BSONObj&, int, mongo::BSONObjBuilder&, bool) (dbcommands.cpp:1859)
      | | |             ->04.73% (3,470,080B) 0x67E684: mongo::execCommand(mongo::Command*, mongo::Client&, int, char const*, mongo::BSONObj&, mongo::BSONObjBuilder&, bool) (dbcommands.cpp:1985)
      | | |               ->04.73% (3,470,080B) 0x67F58B: mongo::_runCommands(char const*, mongo::BSONObj&, mongo::_BufBuilder<mongo::TrivialAllocator>&, mongo::BSONObjBuilder&, bool, int) (dbcommands.cpp:2069)
      | | |                 ->04.73% (3,470,080B) 0x74CFC8: mongo::runCommands(char const*, mongo::BSONObj&, mongo::CurOp&, mongo::_BufBuilder<mongo::TrivialAllocator>&, mongo::BSONObjBuilder&, bool, int) (query.cpp:43)
      | | |                   ->04.73% (3,470,080B) 0x750AAD: mongo::runQuery(mongo::Message&, mongo::QueryMessage&, mongo::CurOp&, mongo::Message&) (query.cpp:920)
      | | |                     ->04.73% (3,470,080B) 0x6FB166: mongo::assembleResponse(mongo::Message&, mongo::DbResponse&, mongo::HostAndPort const&) (instance.cpp:244)
      | | |                       ->04.73% (3,470,080B) 0x592777: mongo::MyMessageHandler::process(mongo::Message&, mongo::AbstractMessagingPort*, mongo::LastError*) (db.cpp:193)
      | | |                         ->04.73% (3,470,080B) 0x90DDC9: mongo::pms::threadRun(mongo::MessagingPort*) (message_server_port.cpp:85)
      | | |                           ->04.73% (3,470,080B) 0x4E35E98: start_thread (pthread_create.c:308)
      | | |                             ->04.73% (3,470,080B) 0x5950CCB: clone (clone.S:112)
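To compare heap growth between massif snapshots, it can help to pull the byte counts out of tree lines like the ones above. The following is a minimal sketch that assumes lines follow the `->PCT% (BYTESB) 0xADDR: frame` layout shown in this report; the function name `parse_massif_line` and the regular expression are illustrative, not part of valgrind.

```python
import re

# Matches one massif tree line, for example:
#   "| | ->08.44% (6,199,973B) 0x66A933: symbol (file:line)"
MASSIF_LINE = re.compile(
    r"->(?P<pct>\d+\.\d+)% \((?P<bytes>[\d,]+)B\) 0x[0-9A-Fa-f]+: (?P<frame>.*)$"
)


def parse_massif_line(line):
    """Return percent, byte count, and frame text, or None if the line does not match."""
    match = MASSIF_LINE.search(line)
    if match is None:
        return None
    return {
        "percent": float(match.group("pct")),
        # Byte counts use thousands separators, so strip the commas first.
        "bytes": int(match.group("bytes").replace(",", "")),
        "frame": match.group("frame").strip(),
    }


if __name__ == "__main__":
    sample = (
        "| | ->04.73% (3,470,080B) 0x8B5BA7: "
        "mongo::ClientConnections::_check(std::string const&) (stl_set.h:410)"
    )
    print(parse_massif_line(sample))
```

Running this over the same frame in two snapshots and subtracting the byte counts gives the growth attributable to that call path.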

              People

              Assignee:
              greg_10gen Greg Studer
              Reporter:
              james.wahlin James Wahlin