  1. Core Server
  2. SERVER-31584

limit initial sync lookahead buffer by size during oplog application phase


    Details

      Description

      Initial sync uses a lookahead buffer (introduced in SERVER-26191) to read from the temporary oplog buffer collection during the oplog application phase. The size of this buffer is currently specified as the number of documents it can contain. With the current default of 10,000 documents and a maximum BSON document size of 16 MB, the buffer can hold up to 160 GB of data in the worst case, which may result in OOM errors.
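
      The 160 GB figure can be reproduced with a short calculation. This is a sketch that assumes every buffered document is at MongoDB's 16 MB BSON document size limit; the variable names are illustrative and are not server parameters.

```python
# Worst-case memory for a lookahead buffer limited only by document count.
# Assumption: every buffered document is at the 16 MB BSON size limit.
MAX_DOCUMENT_MB = 16                 # maximum BSON document size, in MB
DEFAULT_MAX_BUFFERED_DOCS = 10_000   # current default cited in this ticket

worst_case_mb = DEFAULT_MAX_BUFFERED_DOCS * MAX_DOCUMENT_MB
worst_case_gb = worst_case_mb / 1000  # decimal units, matching the ticket's figure

print(f"worst case: {worst_case_gb:.0f} GB")  # prints "worst case: 160 GB"
```

      Real oplog entries are usually far smaller than the limit, so this is an upper bound rather than a typical footprint.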

      We should either reduce the default to a more reasonable number, or constrain the size of the buffer by the total size of all buffered documents.
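
      The second option, constraining the buffer by total document size, could look roughly like the sketch below. It is a hypothetical illustration in Python, not the server's C++ implementation; the class name, method names, and limits are all assumptions made for the example.

```python
from collections import deque


class BoundedLookaheadBuffer:
    """Hypothetical FIFO buffer capped by document count and by total bytes."""

    def __init__(self, max_docs: int, max_bytes: int):
        self.max_docs = max_docs
        self.max_bytes = max_bytes
        self._docs = deque()
        self._bytes = 0

    def try_push(self, doc: bytes) -> bool:
        """Buffer doc if it fits; return False when a limit would be exceeded."""
        over_count = len(self._docs) >= self.max_docs
        over_bytes = self._bytes + len(doc) > self.max_bytes
        # An empty buffer always admits one document, so a single document
        # larger than max_bytes cannot stall the reader forever.
        if self._docs and (over_count or over_bytes):
            return False
        self._docs.append(doc)
        self._bytes += len(doc)
        return True

    def pop(self) -> bytes:
        """Remove and return the oldest buffered document."""
        doc = self._docs.popleft()
        self._bytes -= len(doc)
        return doc

    @property
    def size_bytes(self) -> int:
        return self._bytes

    def __len__(self) -> int:
        return len(self._docs)


# Usage: a 1 KiB byte cap stops lookahead after two 400-byte documents,
# even though the count cap of 10,000 is nowhere near reached.
buf = BoundedLookaheadBuffer(max_docs=10_000, max_bytes=1024)
accepted = [buf.try_push(b"x" * 400) for _ in range(4)]
print(accepted, buf.size_bytes)  # prints "[True, True, False, False] 800"
```

      With both caps in place, memory use is bounded by the byte limit regardless of how large individual documents are, while the count limit still bounds the number of entries for small documents.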

      ------------------------- OLD DESCRIPTION BELOW -------------------

      In the following example the oplog application phase used ~6 GB of memory. This can lead to OOM during initial sync.

      In the above example "A" corresponds to the following:

      2017-10-13T03:44:10.775-0500 I REPL     [replication-140] Applying operations until { : Timestamp 1507884249000|1 } before initial sync can complete. (starting at { : Timestamp 1507832798000|1 })
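
      For a sense of how much oplog had to be applied, the two timestamps in the log line above can be compared. The sketch assumes the first component of each Timestamp is milliseconds since the Unix epoch, which is consistent with the log line's own wall-clock time.

```python
from datetime import datetime, timezone

# Timestamp values copied from the log line above.
# Assumption: the first component is milliseconds since the Unix epoch.
start_ms = 1507832798000
end_ms = 1507884249000

span_seconds = (end_ms - start_ms) // 1000
span_hours = span_seconds / 3600

# Sanity check on the assumption: the end timestamp should fall close to
# the time at which the log line itself was written.
end_utc = datetime.fromtimestamp(end_ms / 1000, tz=timezone.utc)

print(f"{span_seconds} s, about {span_hours:.1f} hours of oplog to apply")
print(end_utc.isoformat())
```

      Under that assumption, the window covers roughly 14 hours of operations, all of which pass through the lookahead buffer during the oplog application phase.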
      

      The memory was allocated by this stack:

      2017-10-13T03:44:11.064-0500 I -        [ftdc] heapProfile stack1651: { 0: "tc_malloc", 1: "mongo::mongoMalloc", 2: "mongo::BSONObj::copy", 3: "mongo::BSONObj::getOwned", 4: "0x7f734da6c793", 5: "mongo::repl::StorageInterfaceImpl::findDocuments", 6: "mongo::repl::OplogBufferCollection::_peek_inlock", 7: "mongo::repl::OplogBufferCollection::peek", 8: "mongo::repl::OplogBufferProxy::peek", 9: "mongo::repl::InitialSyncer::_getNextApplierBatch_inlock", 10: "mongo::repl::InitialSyncer::_getNextApplierBatchCallback", 11: "std::_Function_handler<void ", 12: "mongo::executor::ThreadPoolTaskExecutor::runCallback", 13: "0x7f734dd2866b", 14: "mongo::ThreadPool::_doOneTask", 15: "mongo::ThreadPool::_consumeTasks", 16: "mongo::ThreadPool::_workerThreadBody", 17: "0x7f734ea0a0f0", 18: "0x7f7349767dc5", 19: "clone" }
      

              People

              Assignee:
              backlog-server-repl Backlog - Replication Team
              Reporter:
              bruce.lucas Bruce Lucas