[SERVER-33812] First initial sync oplog read batch fetched may be empty; do not treat as an error. Created: 12/Mar/18 Updated: 29/Oct/23 Resolved: 18/Apr/18 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | None |
| Fix Version/s: | 3.6.6, 3.7.6 |
| Type: | Bug | Priority: | Critical - P2 |
| Reporter: | Geert Bosch | Assignee: | Benety Goh |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: |
|
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Backport Requested: | v3.6, v3.4 |
| Steps To Reproduce: |
Cherry-picking f23bcbfa6d08c24b5570b3b29641f96babfc6a34 onto v3.6 also reproduces the bug with the RHEL-62 enterprise builder (the required one on evergreen), though I haven't been able to reproduce locally without inserting extra delays. |
| Sprint: | Repl 2018-04-23 |
| Participants: | |
| Linked BF Score: | 52 |
| Description |
|
Currently, initial sync depends on never receiving an empty first batch from its oplog query. However, an empty first batch is possible, depending on timing in the _oplogJournalThreadLoop. The fix for |
| Comments |
| Comment by Siyuan Zhou [ 20/Aug/19 ] | ||||||||||||||||
|
| ||||||||||||||||
| Comment by Siyuan Zhou [ 19/Aug/19 ] | ||||||||||||||||
|
milkie, when working on Some ideas to fix this:
None of them sounds satisfying to me. Ideally, the sync source only waits for all oplog slots to be filled rather than waiting for afterClusterTime. In terms of rollback, the failures of Additionally, it would be nice if the syncing node sent both the timestamp and the term to the sync source, so that choosing a sync source is simpler and checking the rollbackId is no longer needed. This proposal is out of scope but may affect the trade-off between the solutions. | ||||||||||||||||
| Comment by Githook User [ 22/May/18 ] | ||||||||||||||||
|
Author: {'username': 'benety', 'name': 'Benety Goh', 'email': 'benety@mongodb.com'} Message: This avoids issues with the oplog query being rejected in mixed version (cherry picked from commit 46e2583c36856e0d377fcb35f2208a0ac516f031) | ||||||||||||||||
| Comment by Githook User [ 02/May/18 ] | ||||||||||||||||
|
Author: {'email': 'benety@mongodb.com', 'name': 'Benety Goh', 'username': 'benety'} Message: Revert " This reverts commit 5e10e0f84bfc225686547b732fec0d5fb104a6f8. | ||||||||||||||||
| Comment by Githook User [ 24/Apr/18 ] | ||||||||||||||||
|
Author: {'email': 'benety@mongodb.com', 'username': 'benety', 'name': 'Benety Goh'} Message: (cherry picked from commit 46e2583c36856e0d377fcb35f2208a0ac516f031) | ||||||||||||||||
| Comment by Githook User [ 24/Apr/18 ] | ||||||||||||||||
|
Author: {'email': 'benety@mongodb.com', 'username': 'benety', 'name': 'Benety Goh'} Message: roll_back_local_operations_test provides coverage for this error condition. (cherry picked from commit 91fc5673cab5d1267fd805f1375577df9072ea1b) | ||||||||||||||||
| Comment by Githook User [ 18/Apr/18 ] | ||||||||||||||||
|
Author: {'email': 'benety@mongodb.com', 'username': 'benety', 'name': 'Benety Goh'} Message: | ||||||||||||||||
| Comment by Benety Goh [ 17/Apr/18 ] | ||||||||||||||||
|
Instead of handling the case for an empty initial batch returned by the query, we will address the issue by always providing a read concern with the afterClusterTime. We can safely remove the term comparison because
| ||||||||||||||||
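The approach described in the comment above (always attaching a read concern with afterClusterTime to the oplog query) can be sketched as a command document. This is only an illustration, not the server's actual fetcher code: `build_oplog_find` is a hypothetical helper, timestamps are modeled as plain `(seconds, increment)` tuples instead of BSON Timestamps, and the choice of which cluster time to pass is left to the caller because the ticket does not state it.

```python
from typing import Any, Dict, Tuple

# Stand-in for a BSON Timestamp: (seconds, increment). Assumption for this sketch.
Timestamp = Tuple[int, int]


def build_oplog_find(last_fetched_ts: Timestamp,
                     after_cluster_time: Timestamp) -> Dict[str, Any]:
    """Build a tailable, awaitData find on the oplog (hypothetical helper).

    The readConcern is always present, so the sync source is asked to wait
    until it has reached after_cluster_time before answering the query.
    """
    return {
        "find": "oplog.rs",
        "filter": {"ts": {"$gte": last_fetched_ts}},
        "tailable": True,
        "awaitData": True,
        "readConcern": {"afterClusterTime": after_cluster_time},
    }


cmd = build_oplog_find(last_fetched_ts=(100, 1), after_cluster_time=(100, 1))
```

The point of the sketch is that the read concern is unconditional: there is no code path that issues the oplog query without it.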
| Comment by Githook User [ 17/Apr/18 ] | ||||||||||||||||
|
Author: {'name': 'Benety Goh', 'email': 'benety@mongodb.com', 'username': 'benety'} Message: roll_back_local_operations_test provides coverage for this error condition. | ||||||||||||||||
| Comment by Githook User [ 13/Apr/18 ] | ||||||||||||||||
|
Author: {'email': 'benety@mongodb.com', 'name': 'Benety Goh', 'username': 'benety'} Message: This test checks the metadata handling in the query results and should | ||||||||||||||||
| Comment by Daniel Gottlieb (Inactive) [ 13/Mar/18 ] | ||||||||||||||||
|
Is this only being observed when a node is querying the oplog of an active secondary? I'm only guessing here, but is it possible that a node doing a GTE query in steady-state chose a server based on that server's reported last applied (or some other replication gossiped value), but when performing a query, the first batch returned is empty because visibility hasn't caught up? If so, I'm optimistic that | ||||||||||||||||
| Comment by Eric Milkie [ 13/Mar/18 ] | ||||||||||||||||
|
You’re correct. I misspoke earlier. The first batch is entirely dependent on the query, regardless of awaitdata or tailable. | ||||||||||||||||
| Comment by Spencer Brody (Inactive) [ 13/Mar/18 ] | ||||||||||||||||
|
I think I understand why relying on the batch not being empty is problematic in initial sync, because as you say we decide what optime to start our query from by doing a reverse scan which doesn't respect oplog visibility. For steady state, however, your assertion that any tailable awaitData query can return an empty batch doesn't seem right/safe. In steady state, getting an empty first batch of a new oplog tailing GTE query is used to indicate a need to rollback, as the optime queried for in the GTE is the last one we got on our previous batch, and thus must be visible. If we got an empty batch in this case when there were actual oplog documents that matched our query, that would be wrong and would trigger unnecessary rollbacks. | ||||||||||||||||
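The steady-state rule described in the comment above can be sketched as a small decision function. This is a simplified model, not the server's implementation: `classify_first_batch` and `FirstBatchOutcome` are hypothetical names, optimes are modeled as `(timestamp, term)` tuples, and the mismatch branch is an added assumption (the comment itself only discusses the empty-batch case).

```python
from enum import Enum
from typing import Sequence, Tuple

# Stand-in for an optime: (timestamp, term). Assumption for this sketch.
OpTime = Tuple[int, int]


class FirstBatchOutcome(Enum):
    CONTINUE = "continue"
    ROLLBACK_NEEDED = "rollback needed"


def classify_first_batch(batch: Sequence[OpTime],
                         last_fetched: OpTime) -> FirstBatchOutcome:
    """Interpret the first batch of a new steady-state GTE oplog query."""
    if not batch:
        # The queried optime was already fetched once, so it must be visible.
        # An empty batch therefore means the sync source no longer has it.
        return FirstBatchOutcome.ROLLBACK_NEEDED
    if batch[0] != last_fetched:
        # Assumption: a first entry that differs from ours is also a divergence.
        return FirstBatchOutcome.ROLLBACK_NEEDED
    return FirstBatchOutcome.CONTINUE
```

Under this model, a spurious empty batch (one returned even though matching oplog documents exist) would be classified as `ROLLBACK_NEEDED`, which is the unnecessary rollback the comment warns about.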
| Comment by Eric Milkie [ 13/Mar/18 ] | ||||||||||||||||
|
Note that replication is formulating the tailable/awaitdata initial sync "find" query by specifying an oplog entry that is not yet visible, because it discovered this entry by doing a reverse-cursor find on the oplog. Reverse collection scans ($natural: -1) are not subject to oplog visibility rules. Therefore, the awaitdata query cannot be expected to always return data within the specified awaitdata timeout. | ||||||||||||||||
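The timing problem described in the comment above can be illustrated with a toy model. This is a minimal sketch under simplifying assumptions: `ToyOplog` and `visible_through` are hypothetical names, timestamps are plain integers, and the visibility point is a single number that forward reads respect while reverse scans ignore it.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyOplog:
    """Toy oplog: ascending integer timestamps plus a visibility point."""
    entries: List[int] = field(default_factory=list)
    visible_through: int = 0  # hypothetical: newest timestamp forward reads may see

    def newest_via_reverse_scan(self) -> int:
        # Models a {$natural: -1} scan, which ignores the visibility point.
        return self.entries[-1]

    def forward_find_gte(self, ts: int) -> List[int]:
        # Models the forward GTE query, which only returns visible entries.
        return [t for t in self.entries if ts <= t <= self.visible_through]


# Entry 4 has been written but is not yet visible to forward reads.
oplog = ToyOplog(entries=[1, 2, 3, 4], visible_through=3)

start = oplog.newest_via_reverse_scan()      # picks the not-yet-visible entry
first_batch = oplog.forward_find_gte(start)  # empty: entry 4 is not visible yet

oplog.visible_through = 4                    # visibility catches up
later_batch = oplog.forward_find_gte(start)  # now contains entry 4
```

In this model the first batch is empty only because the start point was chosen from an entry the forward query cannot yet see, not because the entry is missing.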
| Comment by Eric Milkie [ 13/Mar/18 ] | ||||||||||||||||
|
The semantics of the AwaitData query parameter combined with query parameter Tailable are that any batch can return empty, including the first one. | ||||||||||||||||
| Comment by Spencer Brody (Inactive) [ 13/Mar/18 ] | ||||||||||||||||
|
It seems like erroring if the oplog batch is empty is correct behavior. The question in my mind is why is the query getting an empty batch in the first place. geert.bosch? | ||||||||||||||||