  1. Core Server
  2. SERVER-8442

Map-reduce memory leak

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: 2.2.2, 2.2.3, 2.3.2
    • Fix Version/s: 2.2.4, 2.4.0-rc1
    • Component/s: MapReduce, Sharding
    • Labels:
      None
    • Environment:
      Reproduced on Ubuntu 12.04.1 LTS and OS X
    • Backwards Compatibility:
      Fully Compatible
    • Operating System:
      ALL
    • Steps To Reproduce:
      To reproduce:
      1) Set up a 2-shard cluster with a mongos and a config server.
      2) mongorestore the attached shard1, shard2 and config dumps.
      3) Run the attached map-reduce-loop.js script against the cluster's mongos.
      You will see memory grow on shard1 (it looks like no data is found on shard2).


      Description

      Non-mapped virtual memory grows on a mongod primary while map-reduce jobs are run. As a workaround, the primary has to be restarted about once a day to prevent OOM failures. This happens on a single shard of a sharded cluster.
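
      To track the growth described above between map-reduce runs, a minimal sketch like the following could be used. It assumes samples shaped like the `mem` sub-document of `serverStatus` (fields `virtual`, `mapped` and, when journaling is on, `mappedWithJournal`, all in MB). Treating non-mapped memory as `virtual` minus the mapped figure is an assumption, not something stated in this report, and the helper names are hypothetical.

```python
from typing import Dict, List


def non_mapped_virtual_mb(mem: Dict[str, int]) -> int:
    """Estimate virtual memory (MB) not accounted for by memory-mapped files.

    Assumption: `mem` looks like serverStatus().mem. With journaling enabled
    the data files are mapped twice, so `mappedWithJournal` is preferred.
    """
    mapped = mem.get("mappedWithJournal") or mem.get("mapped", 0)
    return mem["virtual"] - mapped


def growth_mb(samples: List[Dict[str, int]]) -> int:
    """Return the change in non-mapped virtual memory from first to last sample."""
    values = [non_mapped_virtual_mb(s) for s in samples]
    return values[-1] - values[0]


if __name__ == "__main__":
    # Illustrative numbers only; they are not taken from this issue.
    before = {"virtual": 5000, "mapped": 2000, "mappedWithJournal": 4000}
    after = {"virtual": 5600, "mapped": 2000, "mappedWithJournal": 4000}
    print(growth_mb([before, after]), "MB of non-mapped growth")
```

      A steadily positive value across many map-reduce iterations, with the mapped figure unchanged, would be consistent with the leak reported here.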

      Also attached is valgrind massif output from both shards. In the shard1 file, the memory allocated through the following call stack grows throughout the run:

      ->12.98% (10,360,855B) 0x96F7C9: mongo::ourmalloc(unsigned long) (allocator.h:28)
      | ->08.59% (6,855,680B) 0xD6C6B1: mongo::MessagingPort::recv(mongo::Message&) (message_port.cpp:188)
      | | ->08.59% (6,854,656B) 0x9AF673: mongo::DBClientConnection::recv(mongo::Message&) (dbclient.cpp:1145)
      | | | ->08.59% (6,854,656B) 0x9D2D29: mongo::DBClientCursor::initLazyFinish(bool&) (dbclientcursor.cpp:89)
      | | |   ->08.59% (6,854,656B) 0x9FC29E: mongo::ParallelSortClusteredCursor::_oldInit() (parallel.cpp:1425)
      | | |     ->08.59% (6,854,656B) 0x9FB507: mongo::ParallelSortClusteredCursor::_init() (parallel.cpp:1282)
      | | |       ->08.59% (6,854,656B) 0x9F0DD3: mongo::ClusteredCursor::init() (parallel.cpp:83)
      | | |         ->08.59% (6,854,656B) 0xA7CADB: mongo::mr::MapReduceFinishCommand::run(std::string const&, mongo::BSONObj&, int, std::string&, mongo::BSONObjBuilder&, bool) (mr.cpp:1311)
      | | |           ->08.59% (6,854,656B) 0xA98918: mongo::_execCommand(mongo::Command*, std::string const&, mongo::BSONObj&, int, mongo::BSONObjBuilder&, bool) (dbcommands.cpp:1859)
      | | |             ->08.59% (6,854,656B) 0xA9968A: mongo::execCommand(mongo::Command*, mongo::Client&, int, char const*, mongo::BSONObj&, mongo::BSONObjBuilder&, bool) (dbcommands.cpp:1985)
      | | |               ->08.59% (6,854,656B) 0xA9A0E3: mongo::_runCommands(char const*, mongo::BSONObj&, mongo::_BufBuilder<mongo::TrivialAllocator>&, mongo::BSONObjBuilder&, bool, int) (dbcommands.cpp:2069)
      | | |                 ->08.59% (6,854,656B) 0xB8F3CC: mongo::runCommands(char const*, mongo::BSONObj&, mongo::CurOp&, mongo::_BufBuilder<mongo::TrivialAllocator>&, mongo::BSONObjBuilder&, bool, int) (query.cpp:43)
      | | |                   ->08.59% (6,854,656B) 0xB93DAD: mongo::runQuery(mongo::Message&, mongo::QueryMessage&, mongo::CurOp&, mongo::Message&) (query.cpp:920)
      | | |                     ->08.59% (6,854,656B) 0xB1E974: mongo::receivedQuery(mongo::Client&, mongo::DbResponse&, mongo::Message&) (instance.cpp:244)
      | | |                       ->08.59% (6,854,656B) 0xB1F922: mongo::assembleResponse(mongo::Message&, mongo::DbResponse&, mongo::HostAndPort const&) (instance.cpp:390)
      | | |                         ->08.59% (6,854,656B) 0x9757F1: mongo::MyMessageHandler::process(mongo::Message&, mongo::AbstractMessagingPort*, mongo::LastError*) (db.cpp:192)
      | | |                           ->08.59% (6,854,656B) 0xD6F2A8: mongo::pms::threadRun(mongo::MessagingPort*) (message_server_port.cpp:85)
      | | |                             ->08.59% (6,854,656B) 0x4E35E98: start_thread (pthread_create.c:308)
      | | |                               ->08.59% (6,854,656B) 0x5950CBB: clone (clone.S:112)
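
      When comparing massif snapshots, it can help to pull the byte counts and frames out of tree lines like the ones above. The following is a minimal sketch that assumes each frame line follows the `->PCT% (BYTESB) ADDR: FUNCTION (FILE:LINE)` shape seen in this excerpt; the `Frame` type and `parse_frame` helper are hypothetical names, and other massif line shapes (for example frames without source locations) are not handled.

```python
import re
from typing import NamedTuple, Optional

# Matches one massif tree frame, ignoring the leading "| " nesting markers.
FRAME_RE = re.compile(
    r"->(?P<pct>\d+\.\d+)% \((?P<bytes>[\d,]+)B\) "
    r"(?P<addr>0x[0-9A-Fa-f]+): (?P<func>.+) "
    r"\((?P<file>[^():]+):(?P<line>\d+)\)\s*$"
)


class Frame(NamedTuple):
    percent: float
    size_bytes: int
    address: str
    function: str
    source_file: str
    source_line: int


def parse_frame(text: str) -> Optional[Frame]:
    """Parse a single massif tree line; return None if it is not a frame."""
    m = FRAME_RE.search(text)
    if m is None:
        return None
    return Frame(
        percent=float(m["pct"]),
        size_bytes=int(m["bytes"].replace(",", "")),
        address=m["addr"],
        function=m["func"],
        source_file=m["file"],
        source_line=int(m["line"]),
    )


if __name__ == "__main__":
    line = ("| ->08.59% (6,855,680B) 0xD6C6B1: "
            "mongo::MessagingPort::recv(mongo::Message&) (message_port.cpp:188)")
    print(parse_frame(line))
```

      Parsing the same frame from successive snapshots and comparing `size_bytes` gives a simple way to confirm which call path keeps growing.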

        Attachments

        1. 31190.txt
          122 kB
        2. 31307.txt
          15 kB

              People

              Assignee:
              greg_10gen Greg Studer
              Reporter:
              james.wahlin James Wahlin
              Participants:
              Votes:
              0
              Watchers:
              8

                Dates

                Created:
                Updated:
                Resolved: