[SERVER-43167] simplify update and delete replication oplog entry application Created: 04/Sep/19 Updated: 10/May/23 |
|
| Status: | Backlog |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Eric Milkie | Assignee: | Backlog - Replication Team |
| Resolution: | Unresolved | Votes: | 1 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||
| Assigned Teams: |
Replication
|
||||
| Sprint: | Repl 2020-03-23 | ||||
| Participants: | |||||
| Description |
|
Today, applyOperation_inlock() builds and executes an entire Query plan to do each replicated 'u' update and 'd' delete operation. This method has a lot of overhead that could be avoided. I believe we can avoid the query code entirely for simplification and performance improvements. For update operations, a similar (but a bit more involved) method could be used. |
| Comments |
| Comment by Judah Schvimer [ 06/Apr/20 ] |
|
Putting on the backlog until we prioritize this area for performance optimization. We would expect to have to do profiling to see what the bottlenecks are. |
| Comment by Tess Avitabile (Inactive) [ 31/Mar/20 ] |
|
It sounds good to do a POC for deletes and checkĀ this workload. Although we might need a long-running test to see the performance benefit, I don't want to invest in that now. If we see a large benefit, we can proceed with the work for deletes. If not, I wouldn't want to proceed with the work. I expect the change to make the code more complex, since the query layer is a useful abstraction. And if this isn't low-hanging fruit, then I agree with Judah that we should wait until we want to improve oplog application performance in general, and then profile the entire path. |
| Comment by Judah Schvimer [ 30/Mar/20 ] |
|
Yes. I mean the POC. Per offline discussion, I am proposing that we do this POC when we want to spend time optimizing the oplog application path, and that we profile the entire path to see if the CRUD op's use of the query system is the best area to spend time optimizing. That said, having deletes skip the query system seems small and straightforward enough that we could just do that without extra profiling. |
| Comment by Eric Milkie [ 30/Mar/20 ] |
|
By "this work" do you mean "do a POC and run a profiler to measure the performance gain for delete" as Siyuan suggested for next steps? I don't see how you could demonstrate that a workload shows this to be a bottleneck without doing the POC. |
| Comment by Judah Schvimer [ 30/Mar/20 ] |
|
Do we think this will also simplify the code? If not, I think we should only schedule this work if we have a workload where this is shown to be the bottleneck. |
| Comment by Siyuan Zhou [ 30/Mar/20 ] |
|
Talked with milkie. Eric did a POC in 3.2 and noticed the overhead of code complexity even though id hack is used, but he doesn't have the patch any more. We don't have existing perf results either. We should do a POC and probably run a profiler to measure the performance gain for delete. Some caveats: |
| Comment by Tess Avitabile (Inactive) [ 28/Jan/20 ] |
|
Sounds good, thank you. I think it would be a good idea to just start with deletes and look for a performance change. I'd recommend running a perf patch build and checkingĀ this workload, which runs CRUD operations with w:majority against a 3-node replica set. If we see a performance benefit, we can proceed with this work. I'm not sure we should make this change for updates at all. For $-modifier updates, we need the UpdateStage machinery. For replacement-style updates, we could go directly to the storage engine, but then we would miss the validation done by UpdateStage. I'm also not totally sure we will see a performance benefit. Since this is an _id query, the query system skips forming a canonical query and constructs and id-hack plan. So it would be valuable to do a POC for delete and check for a performance change. |
| Comment by Siyuan Zhou [ 27/Jan/20 ] |
|
Agreed with Judah. I don't have particular concerns as long as one of us is on the code review. |
| Comment by Judah Schvimer [ 27/Jan/20 ] |
|
Historically upsert logic, interactions with the applyOps command, and vectored inserts have been difficult to get right. This also may intersect with |
| Comment by Tess Avitabile (Inactive) [ 27/Jan/20 ] |
|
judah.schvimer and siyuan.zhou, you mentioned that there might be hidden challenges to this ticket. One challenge that I can think of is autoIndexId:false (though I'm not sure whether we allow upgrade with an autoIndexId:false collection). Are there other challenges that come to mind? |