[SERVER-50610] secondary_reads.js should not make assertions based on natural collection ordering Created: 28/Aug/20 Updated: 29/Oct/23 Resolved: 11/Sep/20 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | 4.8.0 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Louis Williams | Assignee: | Louis Williams |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Sprint: | Execution Team 2020-09-21 |
| Participants: |
| Linked BF Score: | 12 |
| Description |
|
The secondary_reads.js FSM test is an insert-only workload that reads from secondaries. It asserts that the documents read from the secondary contain no 'holes', that is, no discontinuities in the sequence of values. The documents are inserted in increasing order, each value 1 greater than the previous one. The expectation is that a collection scan + sort of the documents reveals no gaps in the data. The problem is that non-snapshot reads do not guarantee this. Specifically, when the test uses a readConcern: majority cursor, the read may use a timestamp T that falls in the middle of an already-completed oplog batch. This is problematic because documents are not inserted in order on the secondary, and majority reads effectively have read-committed isolation (as do all non-snapshot reads). The cursor may periodically yield and resume reading at a newer timestamp, T + N. A cursor can therefore miss documents that were committed after the initial read timestamp T and before T + N: if the scan has already passed such a document's position, it is never returned. As a result, the result set mixes some documents visible at T with some documents visible at T + N. Here is an example:
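The interleaving can be sketched with a small in-memory model. This is a hypothetical simulation, not the test or the server code: the record layout, the timestamps, and the helpers `scanWithYield` and `findHoles` are assumptions made for illustration, and the model assumes the cursor yields immediately after returning a document.

```javascript
// Hypothetical in-memory model; not MongoDB server or test code.
// Records are listed in natural (record) order on the secondary. `ts` is the
// timestamp at which each insert becomes visible to a timestamped read.
const naturalOrder = [
  { x: 1, ts: 1 },
  { x: 3, ts: 3 }, // applied out of order within the oplog batch
  { x: 2, ts: 2 },
  { x: 4, ts: 4 },
];

// Scan `records` in the given order. The cursor reads at `firstTs` until it
// has returned `yieldAfter` documents, then yields and continues from its
// current position at the newer timestamp `laterTs`.
function scanWithYield(records, firstTs, laterTs, yieldAfter) {
  const returned = [];
  let readTs = firstTs;
  for (const record of records) {
    if (record.ts > readTs) {
      continue; // not visible at the current read timestamp, so skipped
    }
    returned.push(record.x);
    if (returned.length === yieldAfter) {
      readTs = laterTs; // yield, then resume at a newer timestamp
    }
  }
  return returned;
}

// Sort the values and list every integer missing between neighbours.
function findHoles(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const holes = [];
  for (let i = 1; i < sorted.length; i++) {
    for (let v = sorted[i - 1] + 1; v < sorted[i]; v++) {
      holes.push(v);
    }
  }
  return holes;
}

// Start at T = 2 (mid-batch), yield after two documents, resume at T + N = 4.
const returned = scanWithYield(naturalOrder, 2, 4, 2);
console.log(returned);            // [ 1, 2, 4 ]
console.log(findHoles(returned)); // [ 3 ]
```

In this model the document with x = 3 is invisible at T = 2 when the scan passes it, and the scan never revisits that position after switching to T + N = 4, so the sorted result has a hole at 3.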
In general, this is also a problem for the readConcern 'local' and 'available' parts of the test. However, because those reads always occur on batch boundaries (lastApplied), which advances more slowly than the majority commit point, they seem much less likely to observe it. The test should not make assertions about the entire collection's data, since that requires snapshot read isolation. Instead, we should modify the test to make weaker assertions. |
| Comments |
| Comment by Githook User [ 11/Sep/20 ] |
|
Author: {'name': 'Louis Williams', 'email': 'louis.williams@mongodb.com', 'username': 'louiswilliams'} Message: |
| Comment by Louis Williams [ 28/Aug/20 ] |
|
After some discussion with daniel.gottlieb, I think we can fix the test by forcing the query to use the {x: 1} index, which first requires building that index on the correct collection ('this.collName'). The test can make the same assertion if it scans using an ordered index (either _id or x). The main problem is that the workload's assertions depend on "natural" ordering, which is not guaranteed on secondaries. |
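Why an ordered scan helps can be illustrated with a rough in-memory sketch. It is not the actual fix or server behavior: the record layout and the helpers `scanWithYield` and `isContiguous` are hypothetical, and the model assumes that x values are assigned in timestamp order (as in an insert-only workload with increasing values) and that the cursor yields immediately after returning a document.

```javascript
// Hypothetical in-memory model; not MongoDB server or test code.
// Natural (record) order on the secondary, with one out-of-order insert.
const naturalOrder = [
  { x: 1, ts: 1 },
  { x: 3, ts: 3 },
  { x: 2, ts: 2 },
  { x: 4, ts: 4 },
];

// The order an index on x would yield: ascending x, which in this model is
// also ascending timestamp order.
const indexOrder = [...naturalOrder].sort((a, b) => a.x - b.x);

// Read at `firstTs` until `yieldAfter` documents have been returned, then
// continue from the current position at the newer timestamp `laterTs`.
function scanWithYield(records, firstTs, laterTs, yieldAfter) {
  const returned = [];
  let readTs = firstTs;
  for (const record of records) {
    if (record.ts > readTs) {
      continue; // not visible at the current read timestamp
    }
    returned.push(record.x);
    if (returned.length === yieldAfter) {
      readTs = laterTs; // yield, then resume at a newer timestamp
    }
  }
  return returned;
}

// True when the sorted values increase by exactly 1 at every step.
function isContiguous(values) {
  const sorted = [...values].sort((a, b) => a - b);
  return sorted.every((v, i) => i === 0 || v === sorted[i - 1] + 1);
}

const viaNaturalOrder = scanWithYield(naturalOrder, 2, 4, 2);
const viaIndex = scanWithYield(indexOrder, 2, 4, 2);

console.log(viaNaturalOrder, isContiguous(viaNaturalOrder)); // [ 1, 2, 4 ] false
console.log(viaIndex, isContiguous(viaIndex));               // [ 1, 2, 3, 4 ] true
```

In this model, every document that becomes visible only at the newer timestamp sorts after the last document the ordered scan returned, so the scan still reaches it. In natural order the scan may already have passed such a document.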