[SERVER-8193] Optimize in place updates that modify an index (not to deteriorate based on size of a different indexed field) Created: 16/Jan/13  Updated: 06/Dec/22  Resolved: 14/Sep/18

Status: Closed
Project: Core Server
Component/s: Index Maintenance, MMAPv1, Write Ops
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Aaron Staple Assignee: Backlog - Query Team (Inactive)
Resolution: Won't Fix Votes: 30
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File Screenshot 2014-01-31 11.17.24.png    
Issue Links:
Duplicate
is duplicated by SERVER-6904 Inner Collection (w/ > 10000 entries)... Closed
Related
is related to SERVER-6399 Refactor update() code Closed
is related to SERVER-8192 Optimize btree key generation Closed
Assigned Teams:
Query
Participants:

 Description   

When a non-in-place update occurs, the update implementation extracts all index keys for every indexed field from both the old and the updated versions of the document. With a large number of index keys this can be expensive, and it is not always necessary: in some cases the update mod spec alone shows that a given index's key set will not be modified.
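The check described above can be sketched as a comparison of field paths. This is only an illustration under stated assumptions, not MongoDB's update code: the helper names (normalizePath, pathsOverlap, modifiedPaths, indexMayBeAffected) are hypothetical, and the sketch assumes that an index can be skipped only when no modified path equals, contains, or lies inside one of its indexed paths.

```javascript
// Hypothetical sketch, not MongoDB's implementation: decide from the update
// spec alone whether an index's key set could change.

// Drop positional components ("$" or array offsets) so "a.3.b" compares as "a.b".
function normalizePath(path) {
  return path
    .split(".")
    .filter((part) => part !== "$" && !/^\d+$/.test(part))
    .join(".");
}

// True if the paths are equal or one is a dotted prefix of the other.
function pathsOverlap(a, b) {
  return a === b || a.startsWith(b + ".") || b.startsWith(a + ".");
}

// Field paths touched by an operator-style update such as {$inc: {x: 1}}.
// Returns null for a replacement-style update, where any field may change.
function modifiedPaths(updateSpec) {
  const ops = Object.keys(updateSpec);
  if (ops.length === 0 || !ops.every((op) => op.startsWith("$"))) return null;
  const paths = [];
  for (const op of ops) {
    for (const [field, value] of Object.entries(updateSpec[op])) {
      paths.push(field);
      if (op === "$rename") paths.push(value); // the rename target changes too
    }
  }
  return paths.map(normalizePath);
}

// Conservative answer: false only when no modified path overlaps an indexed path.
function indexMayBeAffected(updateSpec, indexKeyPattern) {
  const paths = modifiedPaths(updateSpec);
  if (paths === null) return true;
  return Object.keys(indexKeyPattern)
    .map(normalizePath)
    .some((indexed) => paths.some((p) => pathsOverlap(p, indexed)));
}
```

For an update such as {$inc: {x: 1}} and an index on subdocument.id, indexMayBeAffected returns false, so under these assumptions key extraction for that index could be skipped.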



 Comments   
Comment by Mathieu Poumeyrol [ 02/Jun/14 ]

The quadratic bit may have gone. But the update-all-indexes-or-nothing behaviour is still there in 2.6.1 and basically lives here:

https://github.com/mongodb/mongo/blob/master/src/mongo/db/ops/update_driver.cpp#L349

Comment by Davide Italiano [ 31/Jan/14 ]

My benchmarks confirm that the problem should be gone on trunk (githash https://github.com/mongodb/mongo/commit/994103b66cbd41ba9169d628422e496e8cbd24c9). A screenshot of the results of the mongo-perf test scenario (on multi-db) is attached.

Comment by A. Jesse Jiryu Davis [ 02/Apr/13 ]

Note that Asya's timings appear very consistent with quadratic complexity: ms = 0.00012 * (number of subdocs squared), with error less than 10-15%.
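The fit can be checked against the timings in Asya's 20/Mar/13 reproducer comment. The sketch below is only a quick sanity check: the coefficient and the timings come from this ticket, while the function names and the choice to compare against the mean of the three runs per array size are assumptions.

```javascript
// Coefficient from the comment above: ms = 0.00012 * (number of subdocs squared).
const COEFF = 0.00012;

function predictedMs(subdocs) {
  return COEFF * subdocs * subdocs;
}

// Update timings (ms) reported in the reproducer, three runs per array size.
const observed = [
  { subdocs: 1000, ms: [118, 119, 123] },
  { subdocs: 2000, ms: [440, 432, 449] },
  { subdocs: 3000, ms: [1089, 964, 964] },
  { subdocs: 4000, ms: [1686, 1699, 1659] },
  { subdocs: 20000, ms: [43397, 44236, 45257] },
];

function mean(values) {
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

// Relative error of the quadratic model against the mean observed timing.
function relativeError(row) {
  const actual = mean(row.ms);
  return Math.abs(predictedMs(row.subdocs) - actual) / actual;
}

for (const row of observed) {
  const pct = (relativeError(row) * 100).toFixed(1);
  console.log(`${row.subdocs} subdocs: predicted ${predictedMs(row.subdocs).toFixed(0)} ms, error ${pct}%`);
}
```

With these numbers the relative error stays below 15% for every array size.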

Comment by Asya Kamsky [ 20/Mar/13 ]

I think my comment above may not quite match the title of the bug: these are in-place updates, but they modify a different index than the one that is directly causing the slow updates. SERVER-8192 would help, but this is a case where not calling getKeys() on the multikey subdocument array index would result in much faster update performance even if getKeys() itself is not optimized.

Comment by Asya Kamsky [ 20/Mar/13 ]

My reproducer:
Documents have an _id, an array of subdocuments, and a numeric field x, which I set to 1 on all documents. There are 200 documents.
There is an _id index and an index on the subdocument.id field.
The size of the array goes up by 20 per _id value, so I have:

_id: 50     size of array: 1000
_id: 100    size of array: 2000
_id: 150    size of array: 3000
_id: 200    size of array: 4000
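A data set of this shape could be generated along the following lines. This is a sketch under assumptions: the array field name subdocument is inferred from the subdocument.id index, and the integer _id values from 1 to 200, the per-element id values, and the buildDocs helper are hypothetical. It is written in ES5 style so that it could also be pasted into an older shell.

```javascript
// Hypothetical generator for the reproducer's documents (built in memory only).
function buildDocs(count, growth) {
  var docs = [];
  for (var id = 1; id <= count; id++) {
    var subdocument = [];
    // The array grows by `growth` elements per _id value: _id 50 -> 1000 elements.
    for (var i = 0; i < id * growth; i++) {
      subdocument.push({ id: i });
    }
    docs.push({ _id: id, x: 1, subdocument: subdocument });
  }
  return docs;
}

var docs = buildDocs(200, 20);

// In a mongo shell session the documents could then be loaded with:
//   docs.forEach(function (d) { db.idxgrow.insert(d); });
//   db.idxgrow.ensureIndex({ "subdocument.id": 1 });
```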

When x is not indexed, I see the same performance on all updates, regardless of the array size:

test@local(2.4.0-rc2) > db.idxgrow.update({_id:50},{$inc:{x:1}})
Updated 1 existing record(s) in 1ms
test@local(2.4.0-rc2) > db.idxgrow.update({_id:100},{$inc:{x:1}})
Updated 1 existing record(s) in 1ms
test@local(2.4.0-rc2) > db.idxgrow.update({_id:150},{$inc:{x:1}})
Updated 1 existing record(s) in 0ms
test@local(2.4.0-rc2) > db.idxgrow.update({_id:200},{$inc:{x:1}})
Updated 1 existing record(s) in 1ms

When I add an index on x I get:

test@local(2.4.0-rc2) > db.idxgrow.ensureIndex({x:1})
Inserted 1 record(s) in 29ms
test@local(2.4.0-rc2) > db.idxgrow.update({_id:50},{$inc:{x:1}})
Updated 1 existing record(s) in 118ms
test@local(2.4.0-rc2) > db.idxgrow.update({_id:50},{$inc:{x:1}})
Updated 1 existing record(s) in 119ms
test@local(2.4.0-rc2) > db.idxgrow.update({_id:50},{$inc:{x:1}})
Updated 1 existing record(s) in 123ms
test@local(2.4.0-rc2) > db.idxgrow.update({_id:100},{$inc:{x:1}})
Updated 1 existing record(s) in 440ms
test@local(2.4.0-rc2) > db.idxgrow.update({_id:100},{$inc:{x:1}})
Updated 1 existing record(s) in 432ms
test@local(2.4.0-rc2) > db.idxgrow.update({_id:100},{$inc:{x:1}})
Updated 1 existing record(s) in 449ms
test@local(2.4.0-rc2) > db.idxgrow.update({_id:150},{$inc:{x:1}})
Updated 1 existing record(s) in 1089ms
test@local(2.4.0-rc2) > db.idxgrow.update({_id:150},{$inc:{x:1}})
Updated 1 existing record(s) in 964ms
test@local(2.4.0-rc2) > db.idxgrow.update({_id:150},{$inc:{x:1}})
Updated 1 existing record(s) in 964ms
test@local(2.4.0-rc2) > db.idxgrow.update({_id:200},{$inc:{x:1}})
Updated 1 existing record(s) in 1686ms
test@local(2.4.0-rc2) > db.idxgrow.update({_id:200},{$inc:{x:1}})
Updated 1 existing record(s) in 1699ms
test@local(2.4.0-rc2) > db.idxgrow.update({_id:200},{$inc:{x:1}})
Updated 1 existing record(s) in 1659ms

I also have a single document with 20,000 subdocuments in its array, and updating it gives:

test@local(2.4.0-rc2) > db.idxgrow.update({_id:ObjectId("513495f0832f741ec9256d30")},{$inc:{x:1}})
Updated 1 existing record(s) in 43397ms
test@local(2.4.0-rc2) > db.idxgrow.update({_id:ObjectId("513495f0832f741ec9256d30")},{$inc:{x:1}})
Updated 1 existing record(s) in 44236ms
test@local(2.4.0-rc2) > db.idxgrow.update({_id:ObjectId("513495f0832f741ec9256d30")},{$inc:{x:1}})
Updated 1 existing record(s) in 45257ms
