[SERVER-36771] [post-project] Provide a way to know when there are no active transactions in a sharded cluster Created: 20/Aug/18  Updated: 04/Dec/18  Resolved: 04/Dec/18

Status: Closed
Project: Core Server
Component/s: Replication, Sharding
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Major - P3
Reporter: Judah Schvimer Assignee: Kaloian Manassiev
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Gantt Dependency
Related
Sprint: Sharding 2018-11-19, Sharding 2018-12-17
Participants:

 Description   

This is required for Atlas Live Import to be able to maintain transactional integrity.



 Comments   
Comment by Kaloian Manassiev [ 04/Dec/18 ]

Based on a discussion with shane.harvey and judah.schvimer, the transactions.currentOpen metric will be sufficient for figuring out when the system is quiesced of active transactions, therefore there is nothing for the Sharding Team to do.

Comment by Judah Schvimer [ 14/Nov/18 ]

From the description in the linked mongomirror ticket, it looks like mongomirror will buffer prepared transactions. Is this the right thing to do rather than just have mongomirror pass all oplog entries to the destination cluster's primary and let the replication machinery deal with prepare/abort and commit?

Making applyOps work well with prepare would be incredibly difficult due to initial-sync-like idempotency problems that mongomirror's push based initial sync hits. We would have to reconstruct oplog pointers, change "commit timestamps", and log a new type of applyOps so that secondaries know to relax constraints in the same way the primary did. It was deemed far easier to implement this in mongomirror itself.

That way, after mongomirror has applyOp'd all oplog entries to the shards' primaries, it can just log a no-op oplog entry and wait for that to majority propagate. From that point onward the two clusters are identical.

How does this require actually sending prepare rather than buffering transactions? Cross-shard transactions write their own oplog entries even when the user has stopped writing, and it takes time for those oplog entries to reach all secondaries, so simply applying everything that's there as of when the user stops writing is not sufficient. Am I misunderstanding?

Comment by Kaloian Manassiev [ 14/Nov/18 ]

judah.schvimer, shane.harvey, I would like to understand a little bit better about the requirements here.

Presumably, as part of the import process, the source cluster will at some point be quiesced so that mongomirror can catch-up and this ticket is about figuring out when all active transactions have finished being applied there - is this correct?

Like you say judah.schvimer, waiting for zero transactions will not be sufficient, because there can be "lulls" in the oplog where there are no transactions which can mistakenly be interpreted.

From the description in the linked mongomirror ticket, it looks like mongomirror will buffer prepared transactions. Is this the right thing to do rather than just have mongomirror pass all oplog entries to the destination cluster's primary and let the replication machinery deal with prepare/abort and commit?

That way, after mongomirror has applyOp'd all oplog entries to the shards' primaries, it can just log a no-op oplog entry and wait for that to majority propagate. From that point onward the two clusters are identical.

By the way, at that point, how can there be prepared transactions? This would mean that the source cluster was quiesced at the wrong point in time.

Comment by Judah Schvimer [ 20/Aug/18 ]

It may be sufficient to simply wait for every node in the cluster to have 0 active transactions, but if one node is super lagged that may not be sufficient. Maybe we have to wait for the replication lag to actually go to 0 on all nodes or surface a "lastActiveTransactionNumber" field and wait for that to be the same on each node as well. We have to be careful that the nodes can't just roll back the last transaction commit right after the metric threshold is met.

Generated at Thu Feb 08 04:44:02 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.