[SERVER-26665] Clean up WTRecordStore handling of non-oplog capped collections Created: 17/Oct/16 Updated: 04/Dec/23 |
|
| Status: | Blocked |
| Project: | Core Server |
| Component/s: | Internal Code, Storage |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Mathias Stearn | Assignee: | Backlog - Storage Execution Team |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: | |
| Assigned Teams: | Storage Execution |
| Backwards Compatibility: | Fully Compatible | ||||||||
| Participants: | |||||||||
| Description |
|
That code dates from a time when we thought that concurrent writes to non-oplog capped collections made sense. Now that we've realized that they don't, we can significantly simplify the code. All handling of out-of-order commits should only be for the oplog case and not for normal capped collections. |
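For readers unfamiliar with why out-of-order commits need special handling, here is a minimal toy sketch in Python. It is not the WTRecordStore code; the class `CappedVisibility` and its methods are hypothetical names invented for illustration. It assumes record ids are handed out in increasing order and that a forward-scanning reader must not read past the smallest id that is still uncommitted, since otherwise it could skip a record that commits later.

```python
class CappedVisibility:
    """Toy model (hypothetical, not server code) of reader visibility
    when inserts may commit out of record-id order."""

    def __init__(self):
        self._next_id = 1          # next record id to hand out
        self._uncommitted = set()  # ids allocated but not yet committed
        self._committed = set()    # ids whose insert has committed

    def begin_insert(self):
        """Allocate a record id for an insert that has not committed yet."""
        rid = self._next_id
        self._next_id += 1
        self._uncommitted.add(rid)
        return rid

    def commit(self, rid):
        """Mark a previously allocated record id as committed."""
        self._uncommitted.remove(rid)
        self._committed.add(rid)

    def visible_ids(self):
        """Committed ids below the smallest uncommitted id (the 'hole')."""
        limit = min(self._uncommitted) if self._uncommitted else self._next_id
        return sorted(r for r in self._committed if r < limit)


store = CappedVisibility()
first = store.begin_insert()   # id 1
second = store.begin_insert()  # id 2

store.commit(second)           # id 2 commits before id 1
hidden = store.visible_ids()   # [] because id 1 is still a hole

store.commit(first)
revealed = store.visible_ids() # [1, 2]
```

If inserts are serialized, the uncommitted set never holds more than one id and this bookkeeping is unnecessary, which is the simplification the description asks for in the non-oplog case.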
| Comments |
| Comment by Daniel Gottlieb (Inactive) [ 03/Feb/22 ] |
|
My point was that I don't know how an application can reliably resume a tailable cursor even on the same node, unless the application is willing to table scan from the start to get to its last seen document each time. Happy to be wrong here; I'm not fully aware of how query interacts with non-oplog capped collections. Capped collections do serve a purpose when a deployment wants to bound disk usage for a collection such as logs or metrics. I'm not familiar with us developing an analogous behavior elsewhere. |
| Comment by Mathias Stearn [ 03/Feb/22 ] |
I think that is basically the only useful feature left in capped collections at this point that you can't get from non-capped collections. If you don't need the ability to resume reading a stream, then you probably shouldn't be using capped collections. |
| Comment by Daniel Gottlieb (Inactive) [ 03/Feb/22 ] |
|
max.hirschhorn showed me we do serialize writes during oplog application. So primaries and secondaries should agree on order today – existing differences were just due to the perceived size of the capped collection. Apologies for the error. Clustered indexes notwithstanding, I don't think we can have nodes agree on oplog order and RecordId order for their capped collection inserts (unless we intend to replicate RecordIds – which I've heard discussed in other contexts). While I'm unsure whether the benefit of unserializing writes to capped collections outweighs the complexities, I'm also not convinced that primaries and secondaries disagreeing on record id order is a deal breaker. Not to dismiss the change entirely; it would certainly be noteworthy. As far as I can tell, a difference in record id ordering would mean that an application cannot resume a tailable cursor across nodes and expect to see all records. But I don't believe capped collection cursors are reliably resumable by an application today? |
| Comment by Daniel Gottlieb (Inactive) [ 03/Feb/22 ] |
I don't think serializing writes on the primary today gives us ordering on secondaries, as secondaries can reorder inserts to the same collection. This is evidenced (historically, as noted) by primaries and secondaries choosing different orders for deletion. I don't believe allowing concurrent inserts on a primary would change the existing behavior in any meaningful way, modulo a solution for tailable cursors. |
| Comment by Mathias Stearn [ 03/Feb/22 ] |
|
That solves the biggest issue with unserialized writes then. We would probably also want to ensure that the order on primaries and secondaries is the same since, unlike for normal collections, order is considered significant in capped collections. |
| Comment by Eric Milkie [ 03/Feb/22 ] |
|
We explicitly replicate capped deletes now, so primary and secondaries stay in lock step with their capped deletions. |
| Comment by Eric Milkie [ 03/Feb/22 ] |
|
I see, you would have to change the behavior of tailable cursors to be read-concern-majority then. I think that would be palatable to users, and we'd get a performance boost to capped collection writes as well. |
| Comment by Mathias Stearn [ 03/Feb/22 ] |
|
milkie I think that is what this ticket was initially about. We have all of the logic to support tailable cursors on capped collections with unserialized inserts (or at least what was necessary at the time); however, all of that code became unnecessary when we decided to serialize inserts. The real issue, and the reason that we decided to serialize, was that without serialization, the primary and secondary would choose different documents to be the "oldest" when deleting old documents. I don't know if we've done anything to solve that since then, so maybe that is no longer an issue. |
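The "different oldest document" problem can be shown with a small toy model in Python. This is only a sketch under simplifying assumptions: the helper `apply_capped_inserts` is hypothetical, the cap is expressed as a document count rather than bytes, and each node is assumed to delete whichever document it stored first once the cap is exceeded.

```python
def apply_capped_inserts(docs, max_docs):
    """Hypothetical helper: apply inserts in the given order and, once the
    cap is exceeded, delete the document this node stored first."""
    stored = []
    for doc in docs:
        stored.append(doc)
        if len(stored) > max_docs:
            stored.pop(0)  # this node's local idea of the "oldest" document
    return stored


# The same three inserts, but "a" and "b" land in a different order on
# each node, as could happen without serialization.
primary = apply_capped_inserts(["a", "b", "c"], max_docs=2)
secondary = apply_capped_inserts(["b", "a", "c"], max_docs=2)

print(primary)    # ['b', 'c']
print(secondary)  # ['a', 'c']
```

In this model the two nodes end up holding different documents even though they applied the same set of inserts, because each one picks its own "oldest" document to delete.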
| Comment by Louis Williams [ 03/Feb/22 ] |
|
milkie, yes, if we do SERVER-18934 to implement generic capped collection visibility outside the storage engine, then we get this for free. |
| Comment by Eric Milkie [ 03/Feb/22 ] |
|
If we stop serializing capped collection writes, that would make tailable cursor implementation more challenging; do we have a plan for that or would we stop supporting them? |
| Comment by Louis Williams [ 01/Feb/22 ] |
|
redbeard0531, does the motivation in this ticket still hold? We think we may want to reconsider serializing capped collection writes in the future. |