[SERVER-32156] Linear field lookup in update subsystem can lead to an individual update having quadratic time complexity Created: 04/Dec/17  Updated: 06/Dec/22

Status: Backlog
Project: Core Server
Component/s: Performance
Affects Version/s: 3.2.16, 3.4.6, 3.6.0
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Craig Leyshan Assignee: Backlog - Query Optimization
Resolution: Unresolved Votes: 1
Labels: query-44-grooming
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File diag.tgz     File issue-scripts.tgz    
Issue Links:
Duplicate
is duplicated by SERVER-40043 $set takes quadratic time while holdi... Closed
Related
is related to SERVER-36791 $addToSet with $each takes quadratic ... Backlog
is related to SERVER-40043 $set takes quadratic time while holdi... Closed
Assigned Teams:
Query Optimization
Operating System: ALL
Steps To Reproduce:

If required, we can provide a mongodump of the collection (taken before it was removed) and some example queries to run against it that hang the server at high CPU and cannot be killed in a timely manner. The dump is about 400MB compressed, though.

Simply doing:

db.Location_Daily.find({ "keys.floorId": "1000000409", "timestamp": 2017120200 });

would cause the server to hang at high CPU. Note that there is a unique index on those two attributes.
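For completeness, the unique index referred to above is along these lines (field order and index options are an assumption based on the query):

db.Location_Daily.createIndex(
    { "keys.floorId": 1, "timestamp": 1 },
    { unique: true }
)
// With this index the find above is a single index seek, so it normally returns almost instantly.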

Participants:
Case:

 Description   

We had some erroneous data enter our system and get written to a MongoDB collection, resulting in a document containing over 600,000 attributes. When a find or update was attempted on this document, the primary went to 200% CPU and all other work ground to a halt. We attempted to kill the operation but it would not stop; it appeared to be holding locks and not yielding. Here is a sample of what currentOp() looked like for the operation:

{
            "desc" : "conn17",
            "threadId" : "140604867651328",
            "connectionId" : 17,
            "client" : "172.31.26.193:55908",
            "active" : true,
            "opid" : 147314,
            "killPending" : true,
            "secs_running" : 532,
            "microsecs_running" : NumberLong(532401420),
            "op" : "update",
            "ns" : "skyfii-brazil-reporting.Location_Daily",
            "query" : {
                "keys.floorId" : "1000000409",
                "timestamp" : NumberLong(2017120200)
            },
            "numYields" : 0,
            "locks" : {
                "Global" : "w",
                "Database" : "w",
                "Collection" : "w"
            },
            "waitingForLock" : false,
            "lockStats" : {
                "Global" : {
                    "acquireCount" : {
                        "r" : NumberLong(1),
                        "w" : NumberLong(1)
                    }
                },
                "Database" : {
                    "acquireCount" : {
                        "w" : NumberLong(1)
                    }
                },
                "Collection" : {
                    "acquireCount" : {
                        "w" : NumberLong(1)
                    }
                }
            }
        }
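A kill was issued against this operation (hence "killPending" : true above), but it never took effect, presumably because the update never reaches a yield/interrupt point where the kill is acted on (note "numYields" : 0). A minimal sketch of the kill attempt, using the opid from the output above:

// Locate the stuck update and request that it be killed (opid from the currentOp() output above).
db.currentOp({ op: "update", ns: "skyfii-brazil-reporting.Location_Daily" })
db.killOp(147314)
// currentOp() then reports killPending: true, but the operation keeps running to completion.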

Note that we have a unique index on the two attributes mentioned in the query so normally these operations take virtually no time.

In the end we had to reboot the server several times, delete the bad data, and add filtering so the bad data could not be written again. We are not so concerned that a large document could not be updated or read. Rather, the real concerns are that:

  • other collections also could not be updated at the same time which meant we lost some important data
  • we could not terminate the operation without a reboot or kill -9, and
  • at one stage we could not connect a client to the server as it was timing out

A bit of large (but not humongous) data in one collection should not bring down a whole database.

Here's an idea of what one of the large documents looks like:

{
        "_id": {
                "$oid": "5a2219d4892f00b521c545a5"
        },
        "keys": {
                "floorId": "1000000409"
        },
        "timestamp": {
                "$numberLong": "2017120200"
        },
        "counters": {
                "-5037105:2680680": 1,
                "-28830090:15338265": 1,
                "1050:330": 93,
                "-15932715:8477040": 1,
                "1155:300": 2421,
                "-16772700:8923710": 1,
                "-27971520:14881500": 1,
                "-23240144:12364485": 1,
                "-15001215:7981305": 1,
                "-29534580:15713025": 1,
                "-5454135:2902620": 1,
                "-11691045:6220425": 1,
                "-1019970:543465": 1,
                "-4189770:2229870": 1,
                "-5274525:2806665": 1,
                "360:405": 1124,
                "-1271880:677610": 1,
                "-15419295:8203785": 1,
                "-12914220:6871020": 1,
.... 600,000-odd entries later ....
   }
}
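For reference, a document of this shape can be generated with something along these lines (the key pattern and entry count are approximated from the sample above; this is a sketch, not one of the attached scripts):

// Hypothetical generator for a Location_Daily document with ~600,000 counter entries.
function insertLargeDoc(nKeys) {
    var counters = {};
    for (var i = 0; i < nKeys; i++) {
        // Keys mimic the "-x:y" pattern seen in the real data; values are small counts.
        counters["-" + (i * 15) + ":" + (i * 8)] = 1;
    }
    db.Location_Daily.insert({
        keys: { floorId: "1000000409" },
        timestamp: NumberLong(2017120200),
        counters: counters
    });
}

insertLargeDoc(600000);  // very large, but still under the 16MB BSON document limit at this key size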

Background: this is MongoDB Community Edition 3.2.16. We briefly tried a more recent 3.4.9 locally and the same problem happens on that version too. In production we run a replica set with 3 data nodes, managed by Mongo Cloud.



 Comments   
Comment by Asya Kamsky [ 22/Mar/19 ]

Flagging for triage.

Comment by Justin Seyster [ 09/Jan/18 ]

bruce.lucas Thanks for identifying the quadratic behavior in our update system. With Craig's script, I was able to track down the inefficient code path. applyChild() in update_object_node.cpp is called once for each element in the update document, and it performs a linear search of the input document, resulting in O(n * m) behavior for an update document of size n and an input document of size m. I wrote a POC below that builds a hash-table index, making the lookup in applyChild() constant time (and the whole algorithm O(m + n)). With this optimization in place, an update that previously took 3+ minutes on my test machine now runs in a fraction of a second. It's not exactly a slam dunk, though, because I expect it will be measurably slower for the common case of updates with a very small update document (1-3 elements).

We could address this problem by adapting the POC (pasted below) to only create the index when it will be useful (large n and m). We should also investigate if it is necessary for the locks that Bruce identified to be held during the document update.
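A rough shell harness for quantifying both cases (a sketch, not from the attached scripts; it assumes a collection c populated with fields f0..f39999 as in Bruce's repro function below):

// Time a tiny update (the common 1-3 field case) and a huge update against the same document.
function timeUpdate(updateDoc) {
    var t0 = Date.now();
    db.c.update({}, { $inc: updateDoc });
    return Date.now() - t0;
}

var small = { f0: 1, f1: 1 };                          // common case: a couple of fields
var huge = {};
for (var i = 0; i < 40000; i++) huge["f" + i] = 1;     // pathological case: every field

print("small update: " + timeUpdate(small) + " ms");
print("huge update:  " + timeUpdate(huge) + " ms");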

diff --git a/src/mongo/db/update/update_object_node.cpp b/src/mongo/db/update/update_object_node.cpp
index ebf0b6c..a7e111c9 100644
--- a/src/mongo/db/update/update_object_node.cpp
+++ b/src/mongo/db/update/update_object_node.cpp
@@ -30,6 +30,7 @@
 
 #include "mongo/db/update/update_object_node.h"
 
+#include "mongo/base/simple_string_data_comparator.h"
 #include "mongo/db/update/field_checker.h"
 #include "mongo/db/update/modifier_table.h"
 #include "mongo/db/update/update_array_node.h"
@@ -80,6 +81,49 @@ StatusWith<std::string> parseArrayFilterIdentifier(
     return identifier.toString();
 }
 
+namespace {
+class DocIndex {
+public:
+    DocIndex(mutablebson::Element element)
+        : _isArray(element.getType() == BSONType::Array),
+          _end(element.getDocument().end()),
+          _elementsByName(SimpleStringDataComparator::kInstance
+                              .makeStringDataUnorderedMap<mutablebson::Element>()) {
+        invariant(_isArray || element.getType() == BSONType::Object);
+        if (!_isArray) {
+            _elementsByName.reserve(element.countChildren());
+        }
+        for (auto child = element.leftChild(); child.ok(); child = child.rightSibling()) {
+            _elementsByIndex.push_back(child);
+            if (!_isArray) {
+                _elementsByName.insert(std::make_pair(child.getFieldName(), child));
+            }
+        }
+    }
+
+    mutablebson::Element getChild(StringData field) const {
+        if (_isArray) {
+            auto indexFromField = parseUnsignedBase10Integer(field);
+            if (indexFromField && *indexFromField < _elementsByIndex.size()) {
+                return _elementsByIndex[*indexFromField];
+            } else {
+                return _end;
+            }
+        } else {
+            auto child = _elementsByName.find(field);
+            return (child != _elementsByName.end()) ? child->second : _end;
+        }
+    }
+
+private:
+    bool _isArray;
+    mutablebson::Element _end;
+    StringDataUnorderedMap<mutablebson::Element> _elementsByName;
+    // std::unordered_map<std::string, mutablebson::Element> _elementsByName;
+    std::vector<mutablebson::Element> _elementsByIndex;
+};
+}
+
 /**
  * Gets the child of 'element' named 'field', if it exists. Otherwise returns a non-ok element.
  */
@@ -105,6 +149,7 @@ mutablebson::Element getChild(mutablebson::Element element, StringData field) {
 void applyChild(const UpdateNode& child,
                 StringData field,
                 UpdateNode::ApplyParams* applyParams,
+                const DocIndex& docIndex,
                 UpdateNode::ApplyResult* applyResult) {
 
     auto pathTakenSizeBefore = applyParams->pathTaken->numParts();
@@ -117,7 +162,7 @@ void applyChild(const UpdateNode& child,
         // We're already traversing a path with elements that don't exist yet, so we will definitely
         // need to append.
     } else {
-        childElement = getChild(applyParams->element, field);
+        childElement = docIndex.getChild(field);
     }
 
     if (childElement.ok()) {
@@ -153,8 +198,7 @@ void applyChild(const UpdateNode& child,
     // to the end of 'pathTaken'. We should advance 'element' to the end of 'pathTaken'.
     if (applyParams->pathTaken->numParts() > pathTakenSizeBefore) {
         for (auto i = pathTakenSizeBefore; i < applyParams->pathTaken->numParts(); ++i) {
-            applyParams->element =
-                getChild(applyParams->element, applyParams->pathTaken->getPart(i));
+            applyParams->element = docIndex.getChild(applyParams->pathTaken->getPart(i));
             invariant(applyParams->element.ok());
         }
     } else if (!applyParams->pathToCreate->empty()) {
@@ -162,15 +206,14 @@ void applyChild(const UpdateNode& child,
         // If the child is a leaf node, it may have created 'pathToCreate' without moving
         // 'pathToCreate' to the end of 'pathTaken'. We should move 'pathToCreate' to the end of
         // 'pathTaken' and advance 'element' to the end of 'pathTaken'.
-        childElement = getChild(applyParams->element, applyParams->pathToCreate->getPart(0));
+        childElement = docIndex.getChild(applyParams->pathToCreate->getPart(0));
         if (childElement.ok()) {
             applyParams->element = childElement;
             applyParams->pathTaken->appendPart(applyParams->pathToCreate->getPart(0));
 
             // Either the path was fully created or not created at all.
             for (size_t i = 1; i < applyParams->pathToCreate->numParts(); ++i) {
-                applyParams->element =
-                    getChild(applyParams->element, applyParams->pathToCreate->getPart(i));
+                applyParams->element = docIndex.getChild(applyParams->pathToCreate->getPart(i));
                 invariant(applyParams->element.ok());
                 applyParams->pathTaken->appendPart(applyParams->pathToCreate->getPart(i));
             }
@@ -379,6 +422,8 @@ UpdateNode::ApplyResult UpdateObjectNode::apply(ApplyParams applyParams) const {
 
     auto applyResult = ApplyResult::noopResult();
 
+    DocIndex docIndex(applyParams.element);
+
     for (const auto& pair : _children) {
 
         // If this child has the same field name as the positional child, they must be merged and
@@ -405,7 +450,8 @@ UpdateNode::ApplyResult UpdateObjectNode::apply(ApplyParams applyParams) const {
                 mergedChild = insertResult.first;
             }
 
-            applyChild(*mergedChild->second.get(), pair.first, &applyParams, &applyResult);
+            applyChild(
+                *mergedChild->second.get(), pair.first, &applyParams, docIndex, &applyResult);
 
             applyPositional = false;
             continue;
@@ -414,18 +460,25 @@ UpdateNode::ApplyResult UpdateObjectNode::apply(ApplyParams applyParams) const {
         // If 'matchedField' is alphabetically before the current child, we should apply the
         // positional child now.
         if (applyPositional && applyParams.matchedField < pair.first) {
-            applyChild(
-                *_positionalChild.get(), applyParams.matchedField, &applyParams, &applyResult);
+            applyChild(*_positionalChild.get(),
+                       applyParams.matchedField,
+                       &applyParams,
+                       docIndex,
+                       &applyResult);
             applyPositional = false;
         }
 
         // Apply the current child.
-        applyChild(*pair.second, pair.first, &applyParams, &applyResult);
+        applyChild(*pair.second, pair.first, &applyParams, docIndex, &applyResult);
     }
 
     // 'matchedField' is alphabetically after all children, so we apply it now.
     if (applyPositional) {
-        applyChild(*_positionalChild.get(), applyParams.matchedField, &applyParams, &applyResult);
+        applyChild(*_positionalChild.get(),
+                   applyParams.matchedField,
+                   &applyParams,
+                   docIndex,
+                   &applyResult);
     }
 
     return applyResult;

Comment by Bruce Lucas (Inactive) [ 05/Dec/17 ]

Hi Craig,

Thanks for the report and the repro case. I've confirmed that large updates of large documents are O(n^2) and block db operations for the duration. Here's a simple repro:

function repro(n) {

    // Insert a single document with n fields: { f0: 0, f1: 0, ..., f(n-1): 0 }
    doc = {}
    for (var i = 0; i < n; i++)
        doc["f" + i] = 0
    db.c.drop()
    db.c.insert(doc)

    // Apply a single $inc update that touches all n fields of that document.
    u = {}
    for (var i = 0; i < n; i++)
        u["f" + i] = 1

    db.c.update({}, {$inc: u})
}
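The figures below can be roughly reproduced by timing the call from the shell (the insert is comparatively quick, so the total is dominated by the final update):

var t0 = Date.now();
repro(20000);
print("repro(20000): " + (Date.now() - t0) + " ms");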

Some timings:

n        3.4.10    3.6.0
10k        1.8s     0.9s
20k        3.8s     2.3s
40k       15.8s     8.1s

Doubling n roughly quadruples the runtime at larger n, as expected for quadratic behavior.

Here's one stack trace sample from the update, which holds an r lock for an extended time:

Thread 3 (Thread 0x7fdc3abc3700 (LWP 31577)):
#0  0x0000559a5e09c98f in mongo::mutablebson::Element::findFirstChildNamed(mongo::StringData) const ()
#1  0x0000559a5e4d4142 in mongo::pathsupport::findLongestPrefix(mongo::FieldRef const&, mongo::mutablebson::Element, unsigned long*, mongo::mutablebson::Element*) ()
#2  0x0000559a5e4c1122 in mongo::ModifierInc::prepare(mongo::mutablebson::Element, mongo::StringData, mongo::ModifierInterface::ExecInfo*) ()
#3  0x0000559a5e4d9691 in mongo::UpdateDriver::update(mongo::StringData, mongo::mutablebson::Document*, mongo::BSONObj*, mongo::FieldRefSet*, bool*) ()
#4  0x0000559a5e3ada21 in mongo::UpdateStage::transformAndUpdate(mongo::Snapshotted<mongo::BSONObj> const&, mongo::RecordId&) ()
#5  0x0000559a5e3af177 in mongo::UpdateStage::doWork(unsigned long*) ()
#6  0x0000559a5e37b013 in mongo::PlanStage::work(unsigned long*) ()
#7  0x0000559a5e68155a in mongo::PlanExecutor::getNextImpl(mongo::Snapshotted<mongo::BSONObj>*, mongo::RecordId*) ()
#8  0x0000559a5e681e7b in mongo::PlanExecutor::getNext(mongo::BSONObj*, mongo::RecordId*) ()
#9  0x0000559a5e681fad in mongo::PlanExecutor::executePlan() ()
#10 0x0000559a5e4e0a54 in mongo::performUpdates(mongo::OperationContext*, mongo::UpdateOp const&) ()
#11 0x0000559a5e3068bf in mongo::CmdUpdate::runImpl(mongo::OperationContext*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, mongo::BSONObj const&, mongo::BSONObjBuilder&) ()
#12 0x0000559a5e30223c in mongo::(anonymous namespace)::WriteCommand::run(mongo::OperationContext*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, mongo::BSONObj&, int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, mongo::BSONObjBuilder&) ()
#13 0x0000559a5e23d846 in mongo::Command::run(mongo::OperationContext*, mongo::rpc::RequestInterface const&, mongo::rpc::ReplyBuilderInterface*) ()
#14 0x0000559a5e23eb31 in mongo::Command::execCommand(mongo::OperationContext*, mongo::Command*, mongo::rpc::RequestInterface const&, mongo::rpc::ReplyBuilderInterface*) ()
#15 0x0000559a5e857490 in mongo::runCommands(mongo::OperationContext*, mongo::rpc::RequestInterface const&, mongo::rpc::ReplyBuilderInterface*) ()
#16 0x0000559a5e45cd68 in mongo::assembleResponse(mongo::OperationContext*, mongo::Message&, mongo::DbResponse&, mongo::HostAndPort const&) ()
#17 0x0000559a5e059ecd in mongo::ServiceEntryPointMongod::_sessionLoop(std::shared_ptr<mongo::transport::Session> const&) ()
#18 0x0000559a5e05a7fd in std::_Function_handler<void (std::shared_ptr<mongo::transport::Session> const&), mongo::ServiceEntryPointMongod::startSession(std::shared_ptr<mongo::transport::Session>)::{lambda(std::shared_ptr<mongo::transport::Session> const&)#1}>::_M_invoke(std::_Any_data const&, std::shared_ptr<mongo::transport::Session> const&) ()
#19 0x0000559a5ecae2d1 in mongo::(anonymous namespace)::runFunc(void*) ()
#20 0x00007fdc397cd6aa in start_thread (arg=0x7fdc3abc3700) at pthread_create.c:333
#21 0x00007fdc3950313d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

There's a thread trying to create a collection which needs a W lock, so it's blocked behind the above query holding the r lock:

Thread 2 (Thread 0x7fdc2fbe8700 (LWP 31581)):
#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
#1  0x0000559a5e313a58 in mongo::CondVarLockGrantNotification::wait(mongo::Duration<std::ratio<1l, 1000l> >) ()
#2  0x0000559a5e318abe in mongo::LockerImpl<false>::lockComplete(mongo::ResourceId, mongo::LockMode, mongo::Duration<std::ratio<1l, 1000l> >, bool) ()
#3  0x0000559a5e307fd8 in mongo::Lock::DBLock::DBLock(mongo::Locker*, mongo::StringData, mongo::LockMode) ()
#4  0x0000559a5e32b4c7 in mongo::AutoGetOrCreateDb::AutoGetOrCreateDb(mongo::OperationContext*, mongo::StringData, mongo::LockMode) ()
#5  0x0000559a5e4ddaed in mongo::(anonymous namespace)::makeCollection(mongo::OperationContext*, mongo::NamespaceString const&) [clone .constprop.245] ()
#6  0x0000559a5e4e355b in mongo::performInserts(mongo::OperationContext*, mongo::InsertOp const&) ()
#7  0x0000559a5e3066df in mongo::CmdInsert::runImpl(mongo::OperationContext*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, mongo::BSONObj const&, mongo::BSONObjBuilder&) ()
#8  0x0000559a5e30223c in mongo::(anonymous namespace)::WriteCommand::run(mongo::OperationContext*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, mongo::BSONObj&, int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, mongo::BSONObjBuilder&) ()
#9  0x0000559a5e23d846 in mongo::Command::run(mongo::OperationContext*, mongo::rpc::RequestInterface const&, mongo::rpc::ReplyBuilderInterface*) ()
#10 0x0000559a5e23eb31 in mongo::Command::execCommand(mongo::OperationContext*, mongo::Command*, mongo::rpc::RequestInterface const&, mongo::rpc::ReplyBuilderInterface*) ()
#11 0x0000559a5e857490 in mongo::runCommands(mongo::OperationContext*, mongo::rpc::RequestInterface const&, mongo::rpc::ReplyBuilderInterface*) ()
#12 0x0000559a5e45cd68 in mongo::assembleResponse(mongo::OperationContext*, mongo::Message&, mongo::DbResponse&, mongo::HostAndPort const&) ()
#13 0x0000559a5e059ecd in mongo::ServiceEntryPointMongod::_sessionLoop(std::shared_ptr<mongo::transport::Session> const&) ()
#14 0x0000559a5e05a7fd in std::_Function_handler<void (std::shared_ptr<mongo::transport::Session> const&), mongo::ServiceEntryPointMongod::startSession(std::shared_ptr<mongo::transport::Session>)::{lambda(std::shared_ptr<mongo::transport::Session> const&)#1}>::_M_invoke(std::_Any_data const&, std::shared_ptr<mongo::transport::Session> const&) ()
#15 0x0000559a5ecae2d1 in mongo::(anonymous namespace)::runFunc(void*) ()
#16 0x00007fdc397cd6aa in start_thread (arg=0x7fdc2fbe8700) at pthread_create.c:333
#17 0x00007fdc3950313d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

All other operations on the db are then stuck behind this.

I suspect the underlying cause is that the update runs for a long time without yielding.
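This can be checked from a second shell while the repro update is running; a minimal sketch (the repro's collection is assumed to be c in the current database):

// Expect numYields to stay at 0 and the write locks to be held for the whole update.
db.currentOp({ op: "update", ns: db.getName() + ".c" }).inprog.forEach(function(op) {
    printjson({ opid: op.opid, secs_running: op.secs_running, numYields: op.numYields, locks: op.locks });
});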

Comment by Craig Leyshan [ 05/Dec/17 ]

I have attached two scripts. One inserts a single large document; the other performs a very large update to it, which takes a very long time (roughly 10 minutes). While this update is happening you can't do much on the database: you can't insert into another collection, or even retrieve the list of collection names. currentOp() shows that these other operations are waiting on a lock which seems to be held by the long-running update.

Comment by Craig Leyshan [ 04/Dec/17 ]

I will upload the diagnostics now. It was a little over 200MB, so I've removed some of the older files from the directory to keep it under 150MB. The erroneous data started arriving in our system on the 29th of November, but the performance issues came to a head on the 2nd and 3rd of December UTC. The timestamp of the log line for the sample query in the issue description was: 2017-12-03T00:52:01.932+0000

Comment by Ramon Fernandez Marina [ 04/Dec/17 ]

craig.leyshan@skyfii.com, is it possible to upload the contents of the diagnostic.data directory from the primary node? This directory holds data for about a week, so it should contain stats about what the node was doing during the problematic update.

We can also try to generate a large document that has a shape like the one you describe, but if you happen to have copies or a sample dataset with the dirty data you describe, we can try to reproduce this behavior here and see if it's caused by a bug in the server.

Thanks,
Ramón.

Comment by Craig Leyshan [ 04/Dec/17 ]

After a little more investigation, it seems that find is working fine. The issue we saw with find was actually the mongo client failing to render the huge document, which is not really the issue here; the server can perform the find OK. The actual issue is the updates themselves: the update document was very large, applied against a document that was also very large. Here's a sample from the mongodb.log:

warning: log line attempted (6849kB) over max size (10kB), printing beginning and end ... update skyfii-brazil-reporting.Location_Daily query: { keys.floorId: "1000000409", timestamp: 2017120200 } update: { $inc: { counters.-184650:99015: 1, counters.-27946560:14868210: 1, counters.-9146310:4866705: 1, counters.-5155785:2743605: 1, counters.-2077680:1105920: 1, counters.-9895755:5265270: 1, counters.-20089576:10688295: 1, counters.-7875720:4190715: 1, counters.-362265:193815: 1, counters.-27103050:14419470: 1, counters.-3981510:2118885: 1, counters.-1264935:673605: 1, counters.-541695:288780: 1, counters.-1640160:873405: 1, counters.-28140524:14971410: 1, counters.-4207560:2239410: 1, counters.-22725016:12090450: 1, counters.-203430:109005: 1, counters.-15651480:8327250: 1, counters.-9046995:4813725: 1, counters.-14174355:7541595: 1, counters.-113700:61320: 1, counters.-25394924:13510800: 1, counters.-786525:419250: 1, counters.-23877000:12703275: 1, counters.-284520:152025: 1, counters.-15193320:8083485: 1, counters.-16845:9780: 1, counters.-2579625:1373190: 1, counters.-1560615:830835: 1, counters.-10777860:5734440: 1, counters.-24485504:13026945: 1, counters.-29618956:15757920: 1, counters.-18176896:9670770: 1, counters.-3540525:1884240: 1, counters.-22374344:11903880: 1
 
... and on it goes, until...
 
} } keysExamined:1 docsExamined:1 nMatched:1 nModified:1 keyUpdates:0 writeConflicts:0 numYields:1 locks:{ Global: { acquireCount: { r: 3, w: 3 } }, Database: { acquireCount: { w: 3 }, acquireWaitCount: { w: 1 }, timeAcquiringMicros: { w: 75030 } }, Collection: { acquireCount: { w: 2 } }, Metadata: { acquireCount: { w: 1 } }, oplog: { acquireCount: { w: 1 } } } 11388671ms

As can be seen from the 11388671ms figure (roughly 3.2 hours), it took over 3 hours to update a single document. While this was going on the primary was at 200% CPU, the replicas were at 100% CPU, and operations on other collections were unable to complete.
