Core Server / SERVER-25075

Building 2dsphere index uses excessive memory

    Details

    • Backwards Compatibility:
      Fully Compatible
    • Operating System:
      ALL
    • Backport Completed:
    • Sprint:
      Repl 18 (08/05/16)

      Description

      Create a collection with 24 million documents {loc: [0, 0]}. Then create an index with db.c.createIndex({loc:"2dsphere"}).

      • index create runs from A to B
      • 1825 MB allocated outside the cache ("allocated minus wt cache")
      • heap profile shows the following stack accounts for most of the allocations:

        heapProfile stack39: { 0: "tc_malloc", 1: "mongo::mongoMalloc", 2: "mongo::S2CellIdToIndexKey", 3: "0xc79010", 4: "mongo::ExpressionKeysPrivate::getS2Keys", 5: "mongo::IndexAccessMethod::BulkBuilder::insert", 6: "mongo::MultiIndexBlock::insert", 7: "mongo::MultiIndexBlock::insertAllDocumentsInCollection", 8: "mongo::CmdCreateIndex::run", 9: "mongo::Command::run", 10: "mongo::Command::execCommand", 11: "mongo::runCommands", 12: "mongo::assembleResponse", 13: "mongo::MyMessageHandler::process", 14: "mongo::PortMessageServer::handleIncomingMsg", 15: "0x7fd30dc996aa", 16: "clone" }
        

        These are keys being accumulated for sorting. The accumulated size of the keys is growing to 1348 MiB before external sort is used whereas it should be limited to 100 MiB.

        Activity

        bruce.lucas Bruce Lucas added a comment -

        The pattern of bytes used indicates that the sorter was filled and then spilled to disk 9 times while processing the 24 M keys, so it took 24 M / 9 keys to fill it. Memory used by the sorter was about 1300 MiB, or about 1300 MiB / (24 M / 9) = 511 bytes per key. That is much larger than the actual key size, but is about the default initial size of a BSONObjBuilder. So the problem appears to be that S2CellIdToIndexKey constructs a document using a BSONObjBuilder with the default initial buffer size of 512 bytes and then uses that buffer as-is by calling BSONObjBuilder::obj, whereas the sorter accounts only for the BSONObj size, not the actual buffer size, when computing the amount of memory used.

        Verified that either of the following fixes the problem:

        diff --git a/src/mongo/db/index/s2_common.cpp b/src/mongo/db/index/s2_common.cpp
        index e1db08e..dc2534a 100644
        --- a/src/mongo/db/index/s2_common.cpp
        +++ b/src/mongo/db/index/s2_common.cpp
        @@ -91,6 +91,6 @@ BSONObj S2CellIdToIndexKey(const S2CellId& cellId, S2IndexVersion indexVersion)
             } else {
                 b.append("", cellId.ToString());
             }
        -    return b.obj();
        +    return b.obj().copy();
         }
         }  // namespace mongo
        

        diff --git a/src/mongo/db/index/s2_common.cpp b/src/mongo/db/index/s2_common.cpp
        index e1db08e..0977018 100644
        --- a/src/mongo/db/index/s2_common.cpp
        +++ b/src/mongo/db/index/s2_common.cpp
        @@ -85,7 +85,7 @@ BSONObj S2CellIdToIndexKey(const S2CellId& cellId, S2IndexVersion indexVersion)
             // more than once face, individual intervals will
             // never cross that threshold. Thus, scans will still
             // produce the same results.
        -    BSONObjBuilder b;
        +    BSONObjBuilder b(20);
             if (indexVersion >= S2_INDEX_VERSION_3) {
                 b.append("", static_cast<long long>(cellId.id()));
             } else {
        

        TBD whether there are other indexing code paths with the same issue.

        pasette Dan Pasette added a comment -

        From the looks of it, this issue is more general than stated, even without the bug Bruce points out here.

        mongod could require up to 64*100MB of scratch space for index builds during an initial sync (by design). Maybe we should be trimming the size of these buffers based on available resources.

        siyuan.zhou Siyuan Zhou added a comment -

        Thanks, Bruce Lucas, for the detailed explanation. I can confirm this issue by printing out the object size and its buffer's size. For a given index document in S2 index version 3, the object size is 15 bytes, but the default buffer size is 512 (data) + 4 (Holder's ref count).

        The layout of an index object of S2 index version 3:
        total size (4 bytes)  |  type code 0x12 (1)  |  field name "" 0x00 (1)  |  long long cell id (8) | EOO (1)
        Object size: 4 + 1 + 1 +  8 + 1 = 15 bytes
         
        The layout of an index object of S2 index version 1 and 2:
        total size (4 bytes)  |  type code 0x12 (1)  |  field name "" 0x00 (1)  |  cell id string (2 ~ 32) 0x00 (1) | EOO (1)
        Object size: 4 + 1 + 1 +  (2 ~ 32) + 1 + 1 = 10 ~ 40 bytes
        

        Reserving only 15 bytes should fix the problem for S2 index version 3. For earlier versions, a simple copy() should work. Alternatively, we can reserve 40 bytes.

        If this turns out to be a bigger problem for other types of indexes, I'd suggest tracking the size of a SharedBuffer and adding it to BSONObj::memUsageForSorter(). The offset of 4 bytes of the ref count in SharedBuffer can also be addressed separately if we want.

        bruce.lucas Bruce Lucas added a comment -

        I opened SERVER-25318 to track the more general issue.

        xgen-internal-githook Githook User added a comment -

        Author: Siyuan Zhou (visualzhou) <siyuan.zhou@mongodb.com>

        Message: SERVER-25075 Limit BSONObj buffer size used by 2dsphere index.
        Branch: master
        https://github.com/mongodb/mongo/commit/2743e906fef318763e753a67967d503b37fcdd07

        xgen-internal-githook Githook User added a comment -

        Author: Siyuan Zhou (visualzhou) <siyuan.zhou@mongodb.com>

        Message: SERVER-25075 Limit BSONObj buffer size used by 2dsphere index.

        (cherry picked from commit 2743e906fef318763e753a67967d503b37fcdd07)
        Branch: v3.2
        https://github.com/mongodb/mongo/commit/90b3966985405c450266be9b19f004f7f3a2b159

        xgen-internal-githook Githook User added a comment -

        Author: Siyuan Zhou (visualzhou) <siyuan.zhou@mongodb.com>

        Message: SERVER-25075 Limit BSONObj buffer size used by 2dsphere index.
        Branch: v3.0
        https://github.com/mongodb/mongo/commit/e0591f59efcaf478ff47a1580d0dae9e875772f0

