[SERVER-25075] Building 2dsphere index uses excessive memory
Created: 14/Jul/16  Updated: 14/Dec/17  Resolved: 02/Aug/16

Status: Closed
Project: Core Server
Component/s: Index Maintenance
Affects Version/s: None
Fix Version/s: 3.0.13, 3.2.9, 3.3.11

Type: Bug
Priority: Major - P3
Reporter: Bruce Lucas (Inactive)
Assignee: Siyuan Zhou
Resolution: Done
Votes: 0
Labels: code-only
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File 2dsphere.png    
Issue Links:
Duplicate
Related
related to SERVER-32345 Check if we need to initial sync befo... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Completed:
Sprint: Repl 18 (08/05/16)
Participants:

 Description   

Create a collection with 24 million documents {loc: [0, 0]}. Then create an index with db.c.createIndex({loc:"2dsphere"}).

  • index create runs from A to B
  • 1825 MB allocated outside the cache ("allocated minus wt cache")
  • heap profile shows the following stack accounts for most of the allocations:

    heapProfile stack39: { 0: "tc_malloc", 1: "mongo::mongoMalloc", 2: "mongo::S2CellIdToIndexKey", 3: "0xc79010", 4: "mongo::ExpressionKeysPrivate::getS2Keys", 5: "mongo::IndexAccessMethod::BulkBuilder::insert", 6: "mongo::MultiIndexBlock::insert", 7: "mongo::MultiIndexBlock::insertAllDocumentsInCollection", 8: "mongo::CmdCreateIndex::run", 9: "mongo::Command::run", 10: "mongo::Command::execCommand", 11: "mongo::runCommands", 12: "mongo::assembleResponse", 13: "mongo::MyMessageHandler::process", 14: "mongo::PortMessageServer::handleIncomingMsg", 15: "0x7fd30dc996aa", 16: "clone" }
    

    These are keys being accumulated for sorting. Their accumulated size grows to 1348 MiB before the external sort is used, whereas it should be limited to 100 MiB.



 Comments   
Comment by Githook User [ 08/Sep/16 ]

Author: Siyuan Zhou (visualzhou) <siyuan.zhou@mongodb.com>

Message: SERVER-25075 Limit BSONObj buffer size used by 2dsphere index.
Branch: v3.0
https://github.com/mongodb/mongo/commit/e0591f59efcaf478ff47a1580d0dae9e875772f0

Comment by Githook User [ 03/Aug/16 ]

Author: Siyuan Zhou (visualzhou) <siyuan.zhou@mongodb.com>

Message: SERVER-25075 Limit BSONObj buffer size used by 2dsphere index.

(cherry picked from commit 2743e906fef318763e753a67967d503b37fcdd07)
Branch: v3.2
https://github.com/mongodb/mongo/commit/90b3966985405c450266be9b19f004f7f3a2b159

Comment by Githook User [ 02/Aug/16 ]

Author: Siyuan Zhou (visualzhou) <siyuan.zhou@mongodb.com>

Message: SERVER-25075 Limit BSONObj buffer size used by 2dsphere index.
Branch: master
https://github.com/mongodb/mongo/commit/2743e906fef318763e753a67967d503b37fcdd07

Comment by Bruce Lucas (Inactive) [ 28/Jul/16 ]

I opened SERVER-25318 to track the more general issue.

Comment by Siyuan Zhou [ 28/Jul/16 ]

Thanks, bruce.lucas@mongodb.com, for the detailed explanation. I can confirm this issue by printing out the object size and its buffer size. For a given index document in S2 index version 3, the object size is 15 bytes, but the default buffer size is 512 bytes (data) + 4 bytes (Holder's ref count).

The layout of an index object of S2 index version 3:
total size (4 bytes)  |  type code 0x12 (1)  |  field name "" 0x00 (1)  |  long long cell id (8) | EOO (1)
Object size: 4 + 1 + 1 +  8 + 1 = 15 bytes
 
The layout of an index object of S2 index version 1 and 2:
total size (4 bytes)  |  type code 0x12 (1)  |  field name "" 0x00 (1)  |  cell id string (2 ~ 32) 0x00 (1) | EOO (1)
Object size: 4 + 1 + 1 +  (2 ~ 32) + 1 + 1 = 10 ~ 40 bytes

Reserving only 15 bytes should fix the problem for S2 index version 3. For earlier versions, a simple copy() should work. Alternatively, we can reserve 40 bytes.
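
As a cross-check of the version 3 layout above, the following C++ sketch serializes those 15 bytes by hand. It does not use the MongoDB BSON library: `encodeS2V3Key` and `kS2V3KeySize` are hypothetical names introduced for illustration, and the little-endian integer encoding is assumed from the BSON specification.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical constant: total size (4) + type code (1) + empty field name (1)
// + long long cell id (8) + EOO (1).
constexpr std::size_t kS2V3KeySize = 4 + 1 + 1 + 8 + 1;
static_assert(kS2V3KeySize == 15, "version 3 key should occupy 15 bytes");

// Hypothetical helper, not MongoDB code: writes the version 3 key layout
// byte by byte into a fixed-size array.
inline std::array<unsigned char, kS2V3KeySize> encodeS2V3Key(std::int64_t cellId) {
    std::array<unsigned char, kS2V3KeySize> out{};
    std::size_t pos = 0;

    // Total document size as a little-endian 32-bit integer.
    const std::uint32_t total = static_cast<std::uint32_t>(kS2V3KeySize);
    for (int i = 0; i < 4; ++i) {
        out[pos++] = static_cast<unsigned char>((total >> (8 * i)) & 0xffu);
    }

    out[pos++] = 0x12;  // type code for a 64-bit integer
    out[pos++] = 0x00;  // field name "" is just its terminating NUL

    // Cell id as a little-endian 64-bit integer.
    const std::uint64_t bits = static_cast<std::uint64_t>(cellId);
    for (int i = 0; i < 8; ++i) {
        out[pos++] = static_cast<unsigned char>((bits >> (8 * i)) & 0xffu);
    }

    out[pos++] = 0x00;  // EOO
    return out;
}
```

Calling `encodeS2V3Key(cellId)` yields an array whose size matches the 15-byte figure, which is the amount a right-sized buffer would need to hold.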

If this turns out to be a bigger problem for other types of indexes, I'd suggest tracking the size of a SharedBuffer and adding it to BSONObj::memUsageForSorter(). The offset of 4 bytes of the ref count in SharedBuffer can also be addressed separately if we want.

Comment by Daniel Pasette (Inactive) [ 18/Jul/16 ]

From the looks of it, this issue is actually more general than stated, even without the bug Bruce pointed out here.

mongod could require up to 64 * 100 MB (about 6.4 GB) of scratch space for index builds during an initial sync (by design). Maybe we should trim the size of these buffers based on available resources.

Comment by Bruce Lucas (Inactive) [ 18/Jul/16 ]

The pattern of bytes used indicates that the sorter was filled and then spilled to disk 9 times while processing the 24 M keys, so it took 24 M / 9 keys to fill it. Memory used by the sorter was about 1300 MiB, or about 1300 MiB / (24 M / 9) = 511 bytes per key. That is much larger than the actual key size, but close to the default initial buffer size of a BSONObjBuilder. So it appears that S2CellIdToIndexKey constructs a document using a BSONObjBuilder with the default initial buffer size of 512 bytes and then uses that buffer as-is by calling BSONObjBuilder::obj, whereas the sorter accounts only for the BSONObj size, not the actual buffer size, when computing the amount of memory used.
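
The arithmetic above can be reproduced with a short C++ sketch. Both function names are hypothetical and the second function is only a simplified model of the mismatch (a sorter that spills once the bytes it accounts for reach a limit, while each key pins a larger buffer); it is not the server's sorter.

```cpp
// Average bytes held per key each time the sorter fills up, given the memory
// observed in the sorter, the total number of keys, and the number of spills.
inline double bytesPerKeyAtSpill(double sorterBytes, double totalKeys, double spills) {
    const double keysPerFill = totalKeys / spills;
    return sorterBytes / keysPerFill;
}

// Simplified model (an assumption, not server code): the sorter spills when
// the bytes it accounts for reach limitBytes, but every key actually pins
// bufferPerKey bytes. Returns the memory really resident at spill time.
inline double residentBytesAtSpill(double limitBytes,
                                   double accountedPerKey,
                                   double bufferPerKey) {
    const double keysPerFill = limitBytes / accountedPerKey;
    return keysPerFill * bufferPerKey;
}
```

With the figures from this ticket, `bytesPerKeyAtSpill(1300.0 * 1024 * 1024, 24e6, 9)` comes out just above 511, in line with the estimate above.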

Verified that either of the following fixes the problem:

diff --git a/src/mongo/db/index/s2_common.cpp b/src/mongo/db/index/s2_common.cpp
index e1db08e..dc2534a 100644
--- a/src/mongo/db/index/s2_common.cpp
+++ b/src/mongo/db/index/s2_common.cpp
@@ -91,6 +91,6 @@ BSONObj S2CellIdToIndexKey(const S2CellId& cellId, S2IndexVersion indexVersion)
     } else {
         b.append("", cellId.ToString());
     }
-    return b.obj();
+    return b.obj().copy();
 }
 }  // namespace mongo

diff --git a/src/mongo/db/index/s2_common.cpp b/src/mongo/db/index/s2_common.cpp
index e1db08e..0977018 100644
--- a/src/mongo/db/index/s2_common.cpp
+++ b/src/mongo/db/index/s2_common.cpp
@@ -85,7 +85,7 @@ BSONObj S2CellIdToIndexKey(const S2CellId& cellId, S2IndexVersion indexVersion)
     // more than once face, individual intervals will
     // never cross that threshold. Thus, scans will still
     // produce the same results.
-    BSONObjBuilder b;
+    BSONObjBuilder b(20);
     if (indexVersion >= S2_INDEX_VERSION_3) {
         b.append("", static_cast<long long>(cellId.id()));
     } else {

TBD whether there are other indexing code paths with the same issue.
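
To illustrate why either patch helps, here is a toy C++ model of a builder-owned buffer. `ToyObj` and `buildToyObj` are hypothetical stand-ins and do not reflect the real BSONObjBuilder or BSONObj implementation; the sketch only shows the difference between the bytes allocated and the bytes the object occupies.

```cpp
#include <cstddef>
#include <cstring>
#include <memory>

// Toy stand-in for an object that owns its buffer; NOT MongoDB's BSONObj.
struct ToyObj {
    std::unique_ptr<char[]> buf;
    std::size_t capacity = 0;  // bytes actually allocated
    std::size_t size = 0;      // bytes the object occupies

    // Mirrors the idea behind the first patch: the copy allocates exactly
    // `size` bytes, so the oversized original buffer can be released.
    ToyObj copy() const {
        ToyObj out;
        out.buf = std::make_unique<char[]>(size);
        std::memcpy(out.buf.get(), buf.get(), size);
        out.capacity = size;
        out.size = size;
        return out;
    }
};

// Mirrors the idea behind the second patch: the caller picks the initial
// buffer size instead of relying on a 512-byte default.
inline ToyObj buildToyObj(std::size_t payload, std::size_t initialCapacity = 512) {
    ToyObj obj;
    obj.capacity = initialCapacity < payload ? payload : initialCapacity;
    obj.buf = std::make_unique<char[]>(obj.capacity);
    std::memset(obj.buf.get(), 0, payload);
    obj.size = payload;
    return obj;
}
```

In this model `buildToyObj(15)` holds 512 bytes for a 15-byte object, `buildToyObj(15).copy()` holds 15, and `buildToyObj(15, 20)` holds 20.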
