[SERVER-16484] Optimize $addToSet oplog generation when possible Created: 09/Dec/14  Updated: 11/Aug/23  Resolved: 11/Aug/23

Status: Closed
Project: Core Server
Component/s: Replication, Write Ops
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Minor - P4
Reporter: Scott Hernandez (Inactive) Assignee: Backlog - Query Optimization
Resolution: Duplicate Votes: 7
Labels: addToSet, push, update
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates SERVER-72941 Use compact diff format for updates t... Closed
Assigned Teams:
Query Optimization
Participants:
Case:

 Description   

Specifically, when $addToSet is equivalent to a $push to the end of the array, the oplog entry can be written more efficiently.



 Comments   
Comment by Katya Kamenieva [ 11/Aug/23 ]

Fixed in SERVER-72941

Comment by Vinicius Grippa [ 08/Jun/20 ]

Any updates on this matter? 

Change streams are affected because a huge reslen is transmitted over the network when a change stream is attached to a sharded collection. This significantly amplifies the storage and network requirements. Also, the eviction threads are triggered more frequently because unnecessary data is loaded into the WT cache.

Comment by Henri-Maxime Ducoulombier [ 04/Dec/18 ]

"I think this ticket is to have $addToSet do the same thing $push does on $push to the end which is to do a $set on appropriate index/ordered element of the array." > That's exactly what I meant.

And sorry, actually $push is written as a $set in the oplog, not a $push, but you summarized it right in your last comment.

This would fix a huge (in my case) I/O / network / load issue when working with large arrays of large sub-documents.

Today, I have replaced the $addToSet with a $push and a uniqueness check using $elemMatch in the query part of the update (the only bonus here is that field order does not matter, whereas it does when using $addToSet).
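A minimal sketch of the workaround described above, for illustration only. The helper name buildPushIfAbsent, the field name "items", and the "sku" match criteria are all hypothetical; the sketch only builds the filter and update documents, assuming the uniqueness check is expressed as $not around $elemMatch.

```javascript
// Hypothetical helper: builds the filter and update documents for a
// "push only if absent" update. Field names and criteria are examples,
// not taken from the ticket.
function buildPushIfAbsent(id, field, matchCriteria, newElement) {
  return {
    // Match the document only when no array element satisfies the criteria.
    filter: { _id: id, [field]: { $not: { $elemMatch: matchCriteria } } },
    // Append the new element; $push does not check for duplicates itself.
    update: { $push: { [field]: newElement } },
  };
}

const { filter, update } = buildPushIfAbsent(
  "oplogtest", "items", { sku: "x" }, { sku: "x", qty: 1 });

console.log(JSON.stringify(filter));
console.log(JSON.stringify(update));

// In the shell this could then be run as, for example:
// db.getCollection("test").update(filter, update);
```

Because the criteria name individual fields rather than a whole sub-document, the check does not depend on field order, which is the bonus mentioned above.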

Comment by Asya Kamsky [ 03/Dec/18 ]

hmducoulombier@marketing1by1.com, we can only record $set and $unset operations in the oplog, as the operations must be idempotent and describe the actual change to the document that was performed (not what was requested by the user).

I think this ticket is to have $addToSet do the same thing $push does on $push to the end which is to do a $set on appropriate index/ordered element of the array.

 

Comment by Henri-Maxime Ducoulombier [ 26/Nov/18 ]

The issue is that running an $addToSet update operation on a very large array writes a complete update of the array to the oplog, instead of just pushing the new value to the secondaries (like $push does).

Steps to reproduce:

db.getCollection("test").insert({"_id": "oplogtest", "array": ["a", "z", "e", "r", "t", "y"]});
db.getCollection("test").update({"_id": "oplogtest"}, {"$addToSet": {"array": "x"}});

 
In the oplog, the update operation is recorded not as an $addToSet but as a $set, like this:

db.getCollection("test").update({"_id": "oplogtest"}, {"$set": {"array": ["a", "z", "e", "r", "t", "y", "x"]}});

So if "array" is a very large array of, say, large documents, it adds a lot of overhead to the oplog and can cause delays (due to network latency and/or I/O on secondaries).
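The difference between the two oplog entry shapes can be sketched in plain JavaScript. This is only an illustration of the shapes discussed in this ticket, not the server's implementation; the function names fullArraySet and appendOnlySet are hypothetical, and the membership test uses Array.prototype.includes, so it only handles primitive values (the real $addToSet also compares sub-documents).

```javascript
// Illustrative only: these helpers model the *shape* of the update document
// written to the oplog, not the server's actual code path.

// Behaviour reported in this ticket: the whole array is rewritten.
function fullArraySet(field, existing, value) {
  if (existing.includes(value)) return null; // value already present: no-op
  return { $set: { [field]: existing.concat([value]) } };
}

// Requested behaviour: set only the new trailing element, as $push is
// described as doing when it appends to the end of the array.
function appendOnlySet(field, existing, value) {
  if (existing.includes(value)) return null; // value already present: no-op
  return { $set: { [`${field}.${existing.length}`]: value } };
}

const existing = ["a", "z", "e", "r", "t", "y"];
const full = fullArraySet("array", existing, "x");
const compact = appendOnlySet("array", existing, "x");

console.log(JSON.stringify(full));    // {"$set":{"array":["a","z","e","r","t","y","x"]}}
console.log(JSON.stringify(compact)); // {"$set":{"array.6":"x"}}
```

The first entry grows with the size of the array, while the second depends only on the size of the added element.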

Comment by Asya Kamsky [ 26/Nov/18 ]

Flagging this because, without a description, it's not clear what work is desired here.

Comment by Henri-Maxime Ducoulombier [ 13/Mar/18 ]

I'm bumping this request; this would be a great optimization for the oplog when adding elements to very large arrays.

Generated at Thu Feb 08 03:41:11 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.