[SERVER-69948] Prevent entries with outdated txnNum entries from creating config.image_collection documents Created: 23/Sep/22 Updated: 29/Oct/23 Resolved: 19/Oct/22 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | 6.2.0-rc0 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Christopher Caplinger | Assignee: | Christopher Caplinger |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||
| Backwards Compatibility: | Fully Compatible | ||||
| Operating System: | ALL | ||||
| Sprint: | Server Serverless 2022-10-17, Server Serverless 2022-10-31 | ||||
| Participants: | |||||
| Linked BF Score: | 0 | ||||
| Description |
|
|
| Comments |
| Comment by Githook User [ 18/Oct/22 ] |
|
Author: Christopher Caplinger (christopher.caplinger@mongodb.com, username: UnicodeSnowman) Message: |
| Comment by Daniel Gottlieb (Inactive) [ 27/Sep/22 ] |
|
I don't know this well enough to confidently claim that option two "would just work", but I prefer that approach. I imagine it would be simple to implement, put up in a patch build, and see what signal we get from it. Edit: I should refresh webpages before adding a post-lunch comment. |
| Comment by Jason Chan [ 27/Sep/22 ] |
|
The second path seems reasonable to me, and the server change itself shouldn't be too hard. The idea is to modify DocumentSourceFindAndModifyImageLookup so that, instead of returning a no-op when we fail to look up the corresponding image entry in the donor replica set, we transform the document by stripping the needsRetryImage field. Testing should hopefully be straightforward as well using unit tests, some of which already exist. For completeness, I think we should also consider adding a jstest to verify that no image entries get generated on the recipient replica set. This will be harder to write, because we would need to synchronize the user writes with the reads from the tenant oplog fetcher on the donor so that the txnNumber being processed by the fetcher becomes stale. |
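As a rough illustration of the transformation described in the comment above, here is a minimal Python sketch. It is not the server's C++ implementation of DocumentSourceFindAndModifyImageLookup. The function name `strip_on_missing_image`, the `lookup_image` callback, and the dict-shaped oplog entry are all hypothetical stand-ins; only the `needsRetryImage` field name comes from the ticket. What happens when the image is found is out of scope here and is modeled simply by returning the image alongside the untouched entry.

```python
from typing import Any, Callable, Dict, Optional, Tuple

OplogEntry = Dict[str, Any]  # hypothetical: an oplog entry modeled as a plain dict


def strip_on_missing_image(
    entry: OplogEntry,
    lookup_image: Callable[[OplogEntry], Optional[Dict[str, Any]]],
) -> Tuple[OplogEntry, Optional[Dict[str, Any]]]:
    """Return (entry, image) for one oplog entry.

    `lookup_image` is an assumed callback that returns the matching image
    document, or None when no matching image exists (for example because
    the entry's txnNumber is stale).
    """
    # Entries that never asked for an image pass through unchanged.
    if "needsRetryImage" not in entry:
        return entry, None

    image = lookup_image(entry)
    if image is None:
        # Lookup failed: copy the entry without needsRetryImage so that
        # downstream consumers do not try to create an image document.
        stripped = {k: v for k, v in entry.items() if k != "needsRetryImage"}
        return stripped, None

    # Lookup succeeded: hand both back for whatever handling applies.
    return entry, image


# Usage: a stale entry whose image can no longer be found.
stale = {"op": "u", "txnNumber": 5, "needsRetryImage": "preImage"}
result, found = strip_on_missing_image(stale, lambda e: None)
```

The sketch copies the entry rather than mutating it, which keeps the original available to the caller; whether the real stage does so is not stated in this ticket.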
| Comment by Christopher Caplinger [ 27/Sep/22 ] |
|
Spoke with didier.nadeau@mongodb.com and suganthi.mani@mongodb.com about this yesterday and it seems like we have a couple of options here:
Note that the second option will not only fix the (admittedly rare) test failure, but will also resolve the underlying issue and prevent any future confusion if/when this happens in a production environment. The consensus on the serverless team is to go with the second option above. I'm not personally sure how much effort will be involved, but it will likely require some more specific scheduling to actually do the work. cc jason.chan@mongodb.com and daniel.gottlieb@mongodb.com for thoughts/opinions on a fix for this, since you have some context. |
| Comment by Steven Vannelli [ 26/Sep/22 ] |
|
Keeping this in Needs Scheduling until Chris and suganthi.mani@mongodb.com talk about the solution. |