[SERVER-42232] Adding a new shard renders all preceding resume tokens invalid
Created: 15/Jul/19 | Updated: 29/Oct/23 | Resolved: 18/Jul/19 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Aggregation Framework |
| Affects Version/s: | None |
| Fix Version/s: | 4.0.11, 4.2.0-rc4, 4.3.1 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Bernard Gorman | Assignee: | Bernard Gorman |
| Resolution: | Fixed | Votes: | 2 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Backport Requested: | v4.2, v4.0 |
| Sprint: | Query 2019-07-29 |
| Description |
|
In DocumentSourceShardCheckResumability, we verify that the first entry in each shard's oplog precedes the resume token, in order to guarantee that the resumed stream does not skip any events. If we are resuming from a point in time before one of the shards in the cluster was added, then the first entry in that shard's oplog will always be later than the resume token, so the check will always fail for that shard. This renders the stream unresumable from any point before the shard was added. |
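The failure mode described above can be illustrated with a small model. This is only a sketch in Python, not the server's C++ implementation: the `(seconds, increment)` tuples stand in for oplog timestamps, the shard names and values are made up, the function names `shard_can_resume` and `stream_is_resumable` are hypothetical, and the strict `<` comparison is an assumption based on the word "precedes".

```python
from typing import Dict, Tuple

# Hypothetical stand-in for an oplog timestamp: (seconds, increment).
Timestamp = Tuple[int, int]


def shard_can_resume(first_oplog_ts: Timestamp, resume_token_ts: Timestamp) -> bool:
    """A shard passes only if its oldest oplog entry precedes the resume token.

    Strict comparison is an assumption made for this sketch.
    """
    return first_oplog_ts < resume_token_ts


def stream_is_resumable(first_oplog_ts_by_shard: Dict[str, Timestamp],
                        resume_token_ts: Timestamp) -> bool:
    """The stream is resumable only if every shard passes the check."""
    return all(shard_can_resume(ts, resume_token_ts)
               for ts in first_oplog_ts_by_shard.values())


# A resume token taken while the cluster had two shards.
resume_token_ts = (1000, 1)
shards = {"shard0": (100, 1), "shard1": (200, 1)}
before_add = stream_is_resumable(shards, resume_token_ts)

# A shard added later: its oplog starts after the resume token,
# so the check fails even though no events were lost.
shards["shard2"] = (2000, 1)
after_add = stream_is_resumable(shards, resume_token_ts)
```

In this model `before_add` is `True` and `after_add` is `False`: the new shard's oplog cannot contain anything older than its own creation, yet the check treats it the same as a shard whose oplog has rolled over.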
| Comments |
| Comment by Githook User [ 19/Jul/19 ] |
|
Author: Bernard Gorman (gormanb, bernard.gorman@gmail.com) Message: (cherry picked from commit ffdb59938db0dfc8ec48e8b74df7a54d07b3a128) |
| Comment by Githook User [ 19/Jul/19 ] |
|
Author: Bernard Gorman (gormanb, bernard.gorman@gmail.com) Message: (cherry picked from commit ffdb59938db0dfc8ec48e8b74df7a54d07b3a128) |
| Comment by Githook User [ 18/Jul/19 ] |
|
Author: Bernard Gorman (gormanb, bernard.gorman@gmail.com) Message: |
| Comment by Bernard Gorman [ 16/Jul/19 ] |
|
schwerin: yes, setting an initial timestamp to the dawn of time would also work - but as you say, that logic would need to be backported to at least release N-1 in case we initiate a set and then upgrade it. The "initiating set" entry will function exactly the same for this purpose, and it already exists in the same form in every release as far back as 2.0.0. |
| Comment by Andy Schwerin [ 16/Jul/19 ] |
|
bernard.gorman, could this be fixed if replica sets were always initiated with a timestamp at the beginning of time, say Timestamp(1, 1)? That way, any change stream request that hit a recently added shard would see that the oplog on the new shard went back to the dawn of time. Edit: The alternative solution, which I believe Bernard has in mind, is to treat the "replica set initiate" oplog entry as a sentinel whose semantic meaning is "there are no older writes than this one." That's probably a more flexible solution, since you could use that solution by upgrading binaries even if the shards involved had been created with older binaries. |
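The sentinel approach discussed in the two comments above can be sketched as follows. This is an illustrative Python model, not the committed fix: the dictionary shape of an oplog entry (a no-op `op: "n"` with `o.msg` set to `"initiating set"`) is an assumption about how the entry is represented, and the names `is_initiate_sentinel` and `shard_can_resume` are hypothetical.

```python
from typing import Any, Dict, Tuple

# Hypothetical stand-in for an oplog timestamp: (seconds, increment).
Timestamp = Tuple[int, int]
OplogEntry = Dict[str, Any]


def is_initiate_sentinel(entry: OplogEntry) -> bool:
    """True if the entry looks like the replica set "initiating set" no-op.

    The field layout checked here is an assumption made for this sketch.
    """
    return entry.get("op") == "n" and entry.get("o", {}).get("msg") == "initiating set"


def shard_can_resume(first_entry: OplogEntry, resume_token_ts: Timestamp) -> bool:
    """Decide whether one shard can serve a resume from resume_token_ts."""
    if is_initiate_sentinel(first_entry):
        # The oplog still begins with the initiate entry, so it has never
        # rolled over: there are no older writes that could have been missed.
        return True
    # Otherwise fall back to the original check: the oldest surviving
    # entry must precede the resume token.
    return first_entry["ts"] < resume_token_ts


resume_token_ts = (1000, 1)

# Shard added after the token was issued; its oplog starts with the sentinel.
new_shard_first = {"ts": (2000, 1), "op": "n", "o": {"msg": "initiating set"}}
# Shard whose oplog has rolled over past the token; events may be missing.
rolled_over_first = {"ts": (2000, 1), "op": "i", "o": {"_id": 1}}

new_shard_ok = shard_can_resume(new_shard_first, resume_token_ts)
rolled_over_ok = shard_can_resume(rolled_over_first, resume_token_ts)
```

Here `new_shard_ok` is `True` while `rolled_over_ok` is `False`, which is the distinction the sentinel is meant to draw. Because the check relies only on an entry that already exists in older releases, it applies to shards created with older binaries once the binaries are upgraded.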
| Comment by Charlie Swanson [ 16/Jul/19 ] |
|
bernard.gorman assigning this to you as discussed. |