[SERVER-34977] subtract capped deletes from fastcount during replication recovery Created: 14/May/18  Updated: 29/Oct/23  Resolved: 12/Jun/18

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: 4.0.0-rc6, 4.1.1

Type: Bug Priority: Major - P3
Reporter: Judah Schvimer Assignee: Judah Schvimer
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Related
related to SERVER-35435 Renaming during replication recovery ... Closed
related to SERVER-35483 rollback makes config.transactions fa... Closed
related to SERVER-35052 Turn off fastcount checks on capped c... Closed
is related to SERVER-34976 clear the "needing size adjustment" s... Closed
is related to SERVER-52833 Capped collections can contain too ma... Closed
is related to SERVER-35431 rollback does not correct sizeStorer ... Backlog
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v4.0
Sprint: Repl 2018-06-04, Repl 2018-06-18
Participants:
Linked BF Score: 65

 Description   

The following "algebra" should explain what's happening currently. One open question is whether capped deletes get undone when we recover to a timestamp. Assuming they do get undone, we have the following.

itcount and fastcount at stable timestamp = A
Operations between stable timestamp and common point = B
capped deletes between stable timestamp and common point = Cd1

Operations between common point and top of oplog = Diff
capped deletes between common point and top of oplog = Cd2
itcount and fastcount when rollback begins = A + B + Diff - Cd1 - Cd2

--- recover to stable timestamp ---

itcount = A
fastcount = A + B + Diff - Cd1 - Cd2

--- replication recovery ---

Operations during replication recovery = B
capped deletes during replication recovery = Cd3
itcount after replication recovery = A + B - Cd3
fastcount = ??? (depends on which collections are marked for size adjustment)

--- reset counts after rollback by Diff ---

itcount = A + B - Cd3
fastcount = A + B + Diff - Cd1 - Cd2 - Diff = A + B - Cd1 - Cd2

This is a problem because we have no way of knowing Cd1 or Cd2. If, however, capped deletes are not undone (i.e. they are not timestamped), then itcount after recovering to the stable timestamp is A - Cd1 - Cd2, and itcount after replication recovery is A + B - Cd1 - Cd2 - Cd3.
In that case we can simply subtract Cd3 to get the correct count. It should be safe for capped deletes not to be undone by recover to timestamp, since users already expect capped documents to be aged out at any time.
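To make the algebra concrete, here is a minimal Python sketch that replays it with arbitrary numbers. It assumes, as the final fastcount line of the algebra implies, that the fastcount is not modified during replication recovery. `Counts` and `simulate_rollback` are hypothetical names; this models only the arithmetic, not server code.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    itcount: int    # documents actually in the collection
    fastcount: int  # size recorded for the collection


def simulate_rollback(A, B, Diff, Cd1, Cd2, Cd3, capped_deletes_undone):
    """Replays the ticket's algebra; every argument is a document count."""
    # Both counts agree when rollback begins.
    fastcount = A + B + Diff - Cd1 - Cd2
    # Recover to the stable timestamp: the data moves back, fastcount does not.
    # If capped deletes are untimestamped, Cd1 and Cd2 stay deleted.
    itcount = A if capped_deletes_undone else A - Cd1 - Cd2
    # Replication recovery replays B and performs Cd3 capped deletes.
    # Assumption: fastcount is left unchanged during this phase.
    itcount += B - Cd3
    # Reset counts after rollback by Diff.
    fastcount -= Diff
    return Counts(itcount, fastcount)


undone = simulate_rollback(100, 10, 5, 2, 3, 4, capped_deletes_undone=True)
kept = simulate_rollback(100, 10, 5, 2, 3, 4, capped_deletes_undone=False)
print(undone.fastcount - undone.itcount)  # -1, i.e. Cd3 - Cd1 - Cd2
print(kept.fastcount - kept.itcount)      # 4, i.e. Cd3
```

When capped deletes are undone, the gap between the two counts depends on Cd1 and Cd2, which are unknown at this point; when they are untimestamped, the gap is exactly Cd3, which recovery can observe and subtract.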

 

We can either keep track of the capped deletions and subtract them out, or turn off capped deletion during replication recovery and perform all of it at once at the end.
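A rough sketch of the two options, using a hypothetical toy collection capped by document count only (real capped collections are also capped by size in bytes). The helper names are made up for illustration. Under these assumptions, and with only inserts being replayed, both options end with the same documents and the same number of capped deletes, which is the amount that would be subtracted from the fastcount.

```python
from collections import deque


def replay_tracking_deletes(docs, inserts, max_docs):
    """Option 1: apply capped deletes as they happen and count them."""
    docs = deque(docs)
    capped_deletes = 0
    for doc in inserts:
        docs.append(doc)
        # Age out the oldest documents while over the cap.
        while len(docs) > max_docs:
            docs.popleft()
            capped_deletes += 1
    return docs, capped_deletes


def replay_deferring_deletes(docs, inserts, max_docs):
    """Option 2: replay with capped deletion off, then age out in one pass."""
    docs = deque(docs)
    docs.extend(inserts)
    capped_deletes = max(0, len(docs) - max_docs)
    for _ in range(capped_deletes):
        docs.popleft()
    return docs, capped_deletes


docs1, n1 = replay_tracking_deletes([1, 2, 3], [4, 5, 6], max_docs=4)
docs2, n2 = replay_deferring_deletes([1, 2, 3], [4, 5, 6], max_docs=4)
print(list(docs1), n1)  # [3, 4, 5, 6] 2
print(list(docs2), n2)  # [3, 4, 5, 6] 2
```

Deferring keeps the replay loop simple but lets the collection temporarily exceed its cap; tracking keeps the collection within its cap at every step at the cost of bookkeeping during recovery.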

 

I think this applies to both rollback and replication recovery during startup, but there may be a reason it doesn't happen at startup.



 Comments   
Comment by Githook User [ 12/Jun/18 ]

Author: Judah Schvimer (judahschvimer) <judah@mongodb.com>

Message: SERVER-34977 SERVER-35435 SERVER-34976 Fix fastcounts for RTT, including capped collections.

(cherry picked from commit 8b698cac2d19f0fec502db10501e7059a10d2897)
Branch: v4.0
https://github.com/mongodb/mongo/commit/0fd5d4eb2e61bbef14f6e55c8e5f9619e807260b

Comment by Githook User [ 12/Jun/18 ]

Author: Judah Schvimer (judahschvimer) <judah@mongodb.com>

Message: SERVER-34977 SERVER-35435 SERVER-34976 Fix fastcounts for RTT, including capped collections.
Branch: master
https://github.com/mongodb/mongo/commit/8b698cac2d19f0fec502db10501e7059a10d2897

Comment by Judah Schvimer [ 05/Jun/18 ]

After talking to daniel.gottlieb, we realized that this analysis missed a key point: replication recovery should NEVER do a capped deletion, since it is replaying operations we already applied once, and the documents that were deleted the first time do not come back at the stable timestamp/checkpoint after a rollback or clean shutdown. There are two cases.

The first case is where we only truncate documents that were inserted before the stable timestamp. When we jump back to the stable timestamp and insert the same documents into the collection again, the documents we would be deleting were already deleted the first time we applied those operations. In this case, the data size during replication recovery is greater than 0, so the collection is not marked for size adjustment. When the record store checks whether it should delete records on inserts/updates, it uses the pre-rollback size as the starting point and thinks the collection is bigger than it actually is. As a result it tries to perform capped deletes like the ones we're seeing, even though none should occur.

The other case is where we truncate documents that were inserted during replication recovery. In that case the record store is guaranteed to be empty at the stable timestamp, and we will mark the collection for size adjustment here. Its size is then correctly reset to 0, and we allow capped deletions, which are accounted for correctly in the count.
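The two cases can be illustrated with a hypothetical model in which the cap check trusts a recorded size rather than counting documents. Assumptions: the collection is capped by document count, the recorded size starts from the stale pre-rollback value unless the collection was empty at the stable timestamp (and therefore marked for size adjustment), and the recorded size is incremented on each replayed insert. None of these names come from the server code.

```python
from collections import deque


class ToyCappedStore:
    """Hypothetical capped store whose cap check trusts a recorded size."""

    def __init__(self, max_docs, docs, recorded_size):
        self.max_docs = max_docs
        self.docs = deque(docs)
        self.recorded_size = recorded_size
        self.capped_deletes = 0

    def insert(self, doc):
        self.docs.append(doc)
        self.recorded_size += 1
        # The cap check consults recorded_size, not len(self.docs).
        while self.recorded_size > self.max_docs and self.docs:
            self.docs.popleft()
            self.recorded_size -= 1
            self.capped_deletes += 1


def replay_after_rollback(max_docs, docs_at_stable, pre_rollback_size, ops):
    # Empty at the stable timestamp -> marked for size adjustment -> size
    # starts at 0. Otherwise the stale pre-rollback size is the starting point.
    marked_for_adjustment = len(docs_at_stable) == 0
    start = 0 if marked_for_adjustment else pre_rollback_size
    store = ToyCappedStore(max_docs, docs_at_stable, start)
    for doc in ops:
        store.insert(doc)
    return store


# Case 1: d1 and d2 (inserted before the stable timestamp) were aged out when
# d4 and d5 were first applied and stay deleted, so replaying d4 and d5 on
# top of [d3] should need no capped deletes. Pre-rollback size was 3.
case1 = replay_after_rollback(3, ["d3"], 3, ["d4", "d5"])
print(case1.capped_deletes, list(case1.docs))  # 2 ['d5'] (spurious deletes)

# Case 2: empty at the stable timestamp; the deletes are real and counted.
case2 = replay_after_rollback(3, [], 3, ["d1", "d2", "d3", "d4", "d5"])
print(case2.capped_deletes, list(case2.docs))  # 2 ['d3', 'd4', 'd5']
```

In the first call the stale starting size causes d3 and d4 to be deleted even though the real collection never exceeds its cap, and the recorded size (3) no longer matches the single remaining document. In the second call the recorded size starts at 0, so the deletes are legitimate and the count stays consistent.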

We currently do not adjust the data size of a collection after a rollback, so the recorded size of a capped collection may not match its real size. This is a bug we are accepting; running validate will fix it.

Comment by Judah Schvimer [ 17/May/18 ]

Please make sure to remove this block in this ticket to ensure the fastcounts are correct.

Generated at Thu Feb 08 04:38:26 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.