- Type: Bug
- Resolution: Unresolved
- Priority: Major - P3
- Affects Version/s: 3.2.16, 3.4.6, 3.6.0
- Component/s: Performance, Query Optimization

(copied to CRM)
Some erroneous data entered our system and was written to a MongoDB collection, resulting in a document containing over 600,000 attributes. When a find or update was attempted on this document, the primary went to 200% CPU and all other work ground to a halt. We attempted to kill the operation, but it would not stop; it appeared to be holding locks and not yielding. Here is a sample of what currentOp() showed for the operation:
{ "desc" : "conn17", "threadId" : "140604867651328", "connectionId" : 17, "client" : "172.31.26.193:55908", "active" : true, "opid" : 147314, "killPending" : true, "secs_running" : 532, "microsecs_running" : NumberLong(532401420), "op" : "update", "ns" : "skyfii-brazil-reporting.Location_Daily", "query" : { "keys.floorId" : "1000000409", "timestamp" : NumberLong(2017120200) }, "numYields" : 0, "locks" : { "Global" : "w", "Database" : "w", "Collection" : "w" }, "waitingForLock" : false, "lockStats" : { "Global" : { "acquireCount" : { "r" : NumberLong(1), "w" : NumberLong(1) } }, "Database" : { "acquireCount" : { "w" : NumberLong(1) } }, "Collection" : { "acquireCount" : { "w" : NumberLong(1) } } } }
Note that we have a unique index on the two attributes referenced in the query, so these operations normally take virtually no time.
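For illustration, a minimal sketch (pymongo) of the index and the kind of targeted update involved. The database, collection, and query fields are taken from the currentOp() output above; the connection string and the $inc on a counters sub-field are assumptions, not our actual application code:

    # Sketch only: the unique index and a targeted update of the kind shown
    # in currentOp(). The $inc on a "counters" sub-field is an assumption
    # about the workload, not copied from the application.
    from pymongo import ASCENDING, MongoClient

    client = MongoClient("mongodb://localhost:27017")
    coll = client["skyfii-brazil-reporting"]["Location_Daily"]

    # Unique index on the two attributes used in the query.
    coll.create_index(
        [("keys.floorId", ASCENDING), ("timestamp", ASCENDING)],
        unique=True,
    )

    # A targeted update matched via the unique index; on a normally sized
    # document this completes almost instantly.
    coll.update_one(
        {"keys.floorId": "1000000409", "timestamp": 2017120200},
        {"$inc": {"counters.1050:330": 1}},
        upsert=True,
    )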
In the end we had to reboot the server several times, delete the bad data, and add filtering to stop the bad data from being written again. We are not so concerned that a large document could not be updated or read; the real concerns are that:
- other collections could not be updated at the same time either, which meant we lost some important data
- we could not terminate the operation without a reboot or kill -9, and
- at one stage we could not connect a client to the server because connections were timing out
A bit of large (but not humongous) data in one collection should not bring down a whole database.
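As an illustration of the kill attempt, a minimal sketch using the killOp command (pymongo here; the shell helpers db.currentOp() and db.killOp() are the equivalent path we actually used, and the currentOp admin command assumed below is only available on newer servers, 3.6+):

    # Sketch: locate the runaway update and ask the server to kill it.
    # Assumes the currentOp admin command is available (3.6+); on older
    # versions the shell helper db.currentOp() returns the same information.
    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")

    ops = client.admin.command("currentOp")
    for op in ops.get("inprog", []):
        if op.get("ns") == "skyfii-brazil-reporting.Location_Daily" and op.get("op") == "update":
            # This only marks the operation (killPending: true); because the
            # update never yields, it keeps running anyway.
            client.admin.command("killOp", op=op["opid"])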
Here is an idea of what one of the large documents looks like:
{ "_id": { "$oid": "5a2219d4892f00b521c545a5" }, "keys": { "floorId": "1000000409" }, "timestamp": { "$numberLong": "2017120200" }, "counters": { "-5037105:2680680": 1, "-28830090:15338265": 1, "1050:330": 93, "-15932715:8477040": 1, "1155:300": 2421, "-16772700:8923710": 1, "-27971520:14881500": 1, "-23240144:12364485": 1, "-15001215:7981305": 1, "-29534580:15713025": 1, "-5454135:2902620": 1, "-11691045:6220425": 1, "-1019970:543465": 1, "-4189770:2229870": 1, "-5274525:2806665": 1, "360:405": 1124, "-1271880:677610": 1, "-15419295:8203785": 1, "-12914220:6871020": 1, .... 600,000-odd entries later .... } }
Background: we are running MongoDB Community Edition 3.2.16. We briefly tried a more recent 3.4.9 locally and the same problem occurs on that version too. In production we run a three-data-node replica set managed by MongoDB Cloud.
- is duplicated by:
  - SERVER-40043 $set takes quadratic time while holding locks (Closed)
- is related to:
  - SERVER-36791 $addToSet with $each takes quadratic time while holding locks (Backlog)
  - SERVER-40043 $set takes quadratic time while holding locks (Closed)