[SERVER-29843] Make oplog queries with point equality on ts field use getOplogStartHack Created: 23/Jun/17 Updated: 30/Oct/23 Resolved: 07/Nov/17 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Querying |
| Affects Version/s: | 3.5.9 |
| Fix Version/s: | 3.6.0-rc4 |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Randolph Tan | Assignee: | Tess Avitabile (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: |
|
| Backwards Compatibility: | Minor Change |
| Sprint: | Sharding 2017-10-02, Query 2017-10-23, Query 2017-11-13 |
| Participants: |
| Linked BF Score: | 35 |
| Description |
|
Point queries on the ts field are slow in the oplog because the oplog has no indexes. WiredTiger (WT) has a hack that lets it binary-search on the ts field, but it is only enabled when the query predicate has a top-level $gt/$gte. Retryable writes need this improvement in order to look up oplog entries quickly when traversing the write history within a transaction. |
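To illustrate why the binary search matters for a point lookup, here is a minimal Python sketch. It is a toy model, not server code: the oplog is assumed to be an in-memory list sorted by ts, timestamps are modeled as (seconds, increment) tuples, and the function names are hypothetical.

```python
# Toy model: an "oplog" is a list of entries sorted by a unique, increasing ts.
# Timestamps are modeled as (seconds, increment) tuples (an assumption).

def find_by_ts_scan(oplog, ts):
    """Unindexed point lookup: walk entries from the start until ts matches."""
    examined = 0
    for entry in oplog:
        examined += 1
        if entry["ts"] == ts:
            return entry, examined
    return None, examined


def find_by_ts_binary(oplog, ts):
    """Point lookup that exploits the sort order on ts (lower-bound search)."""
    lo, hi = 0, len(oplog)
    probes = 0
    while lo < hi:
        mid = (lo + hi) // 2
        probes += 1
        if oplog[mid]["ts"] < ts:
            lo = mid + 1
        else:
            hi = mid
    if lo < len(oplog) and oplog[lo]["ts"] == ts:
        return oplog[lo], probes
    return None, probes


# Hypothetical data: 1024 entries with consecutive timestamps.
oplog = [{"ts": (1000 + i, 1), "op": "i"} for i in range(1024)]

target = (1900, 1)
_, examined = find_by_ts_scan(oplog, target)
_, probes = find_by_ts_binary(oplog, target)
print(f"scan examined {examined} entries, binary search probed {probes}")
```

The scan cost grows with the position of the entry in the oplog, while the binary search cost grows only with the logarithm of the oplog size.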
| Comments |
| Comment by David Bartley [ 04/May/19 ] |
|
For context, we're seeing this on a node that was offline for index builds for a while. There's sufficient oplog runway but the node is stuck because an oplog query (containing a $gte and $lte on the same timestamp, so effectively an $eq) is timing out, presumably because the oplog replay hack isn't being applied. |
| Comment by David Bartley [ 04/May/19 ] |
|
Would it be possible to backport this to 3.4? |
| Comment by Tess Avitabile (Inactive) [ 07/Nov/17 ] |
|
Previously, the OplogReplay query option was supported for queries with a $gt or $gte predicate over the ts field. Now the OplogReplay query option is supported for queries with a $gt, $gte, or $eq predicate over the ts field, but the value that ts is compared to is required to be a Timestamp. |
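The rule described in this comment can be modeled roughly as follows. This is only an illustrative sketch in Python, not the server's implementation: the Timestamp class, the SUPPORTED_OPS set, and oplog_replay_eligible are hypothetical names, and treating {ts: <Timestamp>} as an implicit equality is an assumption of this model.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class Timestamp:
    """Stand-in for a BSON Timestamp (hypothetical, for illustration only)."""
    seconds: int
    increment: int


# Operators on ts that the comment says are supported after this change.
SUPPORTED_OPS = {"$gt", "$gte", "$eq"}


def oplog_replay_eligible(query):
    """Return True if the query has a supported ts predicate whose
    comparison value is a Timestamp (simplified model of the rule)."""
    pred = query.get("ts")
    if isinstance(pred, Timestamp):
        # Assumption: a bare value is treated as an implicit $eq.
        return True
    if not isinstance(pred, dict):
        return False
    return any(
        op in SUPPORTED_OPS and isinstance(value, Timestamp)
        for op, value in pred.items()
    )


print(oplog_replay_eligible({"ts": {"$eq": Timestamp(1510000000, 1)}}))
print(oplog_replay_eligible({"ts": {"$eq": 1510000000}}))
```

In this model the second query is rejected because the comparison value is a plain integer rather than a Timestamp.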
| Comment by Githook User [ 07/Nov/17 ] |
|
Author: {'name': 'Tess Avitabile', 'username': 'tessavitabile', 'email': 'tess.avitabile@mongodb.com'} Message: |
| Comment by Randolph Tan [ 07/Nov/17 ] |
|
tess.avitabile We can do that. But if you have extra bandwidth, that will be helpful as well. I submitted a patch yesterday using your last diff to evergreen. |
| Comment by Tess Avitabile (Inactive) [ 07/Nov/17 ] |
|
There are two TODOs for this work: renctan, should I add the OplogReplay option to this query now that OplogReplay is supported for point equality queries on the ts field? Or should I just remove this TODO and someone can file a ticket for this follow-up work? jack.mulrow, should I enable these tests in the retryable_writes_jscore_passthrough? Or should this be scheduled as follow-up work? |
| Comment by Randolph Tan [ 09/Oct/17 ] |
|
david.storch Hold off on that last comment. I have talked with Mathias and he suggested querying with {$natural: 1} instead to get the oldest oplog entry. |
| Comment by Randolph Tan [ 09/Oct/17 ] |
|
david.storch Is it also possible to lift the restriction for $lt, limit: 1 queries? I need to be able to differentiate between an oplog entry that has been rolled over and one that has been rolled back if I cannot find it, and I plan to use { $lt: <ts> } to perform this check. |
| Comment by David Storch [ 01/Aug/17 ] |
|
renctan, I did a little bit of digging on this and turned up the original ticket associated with the restriction that oplogReplay queries have a $gt or $gte predicate on the ts field: I think this will "just work" on WiredTiger, but you'll have to make a few more changes for MMAP. In particular, you should probably extract the timestamp from the $gte, $gt, or $eq predicate, and then pass it directly to the OplogStart stage here: This is because an equality predicate as tsExpr will do the wrong thing currently if passed directly into OplogStart. Hopefully that makes sense. I'm guessing that you will want to set a limit of 1 on these queries, since the query execution machinery does not know about the oplog's special semantics. In particular, it does not know that the ts field is unique and monotonically increasing. Therefore, without a limit, once it finds an oplog entry matching the ts it will continue to scan to the end of the oplog. Setting a limit will prevent this. I'm moving this ticket to the sharding team backlog, but keeping it in Needs Triage. Let me know if you need any more help on this, and please send the code review my way. |
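The two suggestions in this comment (extract the timestamp from the predicate, and set a limit of 1) can be sketched as below. This is a hedged Python illustration over an in-memory list, not the OplogStart stage itself; extract_start_ts and scan_from are hypothetical helpers, and timestamps are modeled as plain integers.

```python
def extract_start_ts(ts_predicate):
    """Pull the comparison value out of a $eq, $gte, or $gt predicate on ts.
    Returns None when no supported operator is present."""
    for op in ("$eq", "$gte", "$gt"):
        if op in ts_predicate:
            return ts_predicate[op]
    return None


def scan_from(oplog, start_index, ts, limit=None):
    """Forward scan from start_index. The scan does not know that ts is
    unique and increasing, so without a limit it runs to the end."""
    results, examined = [], 0
    for entry in oplog[start_index:]:
        examined += 1
        if entry["ts"] == ts:
            results.append(entry)
            if limit is not None and len(results) >= limit:
                break
    return results, examined


# Hypothetical oplog with 100 entries; ts equals the list position.
oplog = [{"ts": i, "op": "i"} for i in range(100)]

ts = extract_start_ts({"$eq": 10})
start = ts  # in this toy model the start position equals ts

_, examined_no_limit = scan_from(oplog, start, ts)
_, examined_limit_1 = scan_from(oplog, start, ts, limit=1)
print(examined_no_limit, examined_limit_1)
```

Even after the scan is positioned at the right entry, the unlimited version keeps examining every later entry, which is the behavior the limit of 1 is meant to prevent.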
| Comment by Ian Whalen (Inactive) [ 14/Jul/17 ] |
|
david.storch reminder to please talk to renctan about how best to implement. |