[SERVER-32102] Audit tests in change_streams_secondary_reads suite for assumptions of causally consistent getMores Created: 28/Nov/17  Updated: 06/Dec/22  Resolved: 01/Dec/17

Status: Closed
Project: Core Server
Component/s: Aggregation Framework
Affects Version/s: 3.6.0-rc5
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: James Wahlin Assignee: Backlog - Query Team (Inactive)
Resolution: Won't Fix Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-32131 ChangeStreams lookup_post_image.js te... Closed
is related to SERVER-31404 getMore commands should return an err... Backlog
Assigned Teams:
Query
Operating System: ALL
Participants:
Linked BF Score: 0

 Description   

There are change streams tests that assume a getMore will see writes performed after the cursor was established. For example, this assertion in lookup_post_image.js expects to see a delete that was performed earlier, but that is not a valid assertion. The suite loads jstests/libs/override_methods/set_read_and_write_concerns.js, so the delete was performed with write concern level "majority" and the cursor was established with read concern level "majority", but that is not enough to guarantee the read can see the write when it is targeted to a secondary. There are two problems here:

First, the write concern of "majority" will ensure the write is applied to the oplog of the secondary, but will not necessarily wait for it to become part of the majority-committed snapshot on the secondary.

Second, the read concern is only enforced when the cursor is established; a subsequent getMore does not wait for read concern. One idea for fixing problems like this is to turn on causal consistency, but that would not help here because of this getMore behavior: a readConcern 'afterClusterTime' would not (and should not) be sent with a getMore, so there is no way to force the getMore to wait until the previous write is visible. This does not affect this particular assertion, since it inspects the result of an update lookup, which issues a separate query with its own read concern to do the lookup, but it could be an issue elsewhere in this suite.
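To make the failure mode concrete, here is a minimal shell-style sketch of the pattern these tests rely on (the collection name, documents, and assertions are hypothetical and not the actual lookup_post_image.js code; in the real suite the read and write concerns are injected by the override file rather than written explicitly):

    // Open a change stream with readConcern "majority"; in this suite the read is
    // targeted to a secondary.
    const coll = db.getSiblingDB("test").cs_coll;
    const csCursor = coll.aggregate([{$changeStream: {}}],
                                    {readConcern: {level: "majority"}});

    // Writes use writeConcern "majority". That guarantees the write reaches a
    // majority of oplogs, not that it is already part of the secondary's
    // majority-committed snapshot.
    assert.writeOK(coll.insert({_id: 1}, {writeConcern: {w: "majority"}}));
    assert.writeOK(coll.remove({_id: 1}, {writeConcern: {w: "majority"}}));

    // Asserting that a single getMore returns the events is the racy assumption
    // described above; polling until they appear avoids it.
    assert.soon(() => csCursor.hasNext(), "never saw the insert notification");
    assert.eq(csCursor.next().operationType, "insert");
    assert.soon(() => csCursor.hasNext(), "never saw the delete notification");
    assert.eq(csCursor.next().operationType, "delete");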



 Comments   
Comment by Spencer Brody (Inactive) [ 30/Nov/17 ]

I filed SERVER-32131 to fix the specific issue with lookup_post_image.js, but unfortunately I don't think the replication team has time to perform a full audit of the change streams tests at the moment. If more build failures like this continue to surface we may have to re-evaluate that. In the meantime, if the query team wants to perform this audit that would be great; if not, I think we should close this ticket for now until/unless more issues like this come up.

Comment by Eric Milkie [ 29/Nov/17 ]

LGTM

Comment by Charlie Swanson [ 29/Nov/17 ]

Ok I attempted to clean up the description a bit - how does it look now?

Comment by Eric Milkie [ 29/Nov/17 ]

Yes, that will clear it all up. Thanks!

Comment by Charlie Swanson [ 29/Nov/17 ]

Thanks for that clarification, milkie. With regard to your confusion about this sentence:

First, the write concern of "majority" will ensure the write is applied to the oplog of the secondary, but will not necessarily wait for it to become part of the majority-committed snapshot.

Would it make sense if it ended with "on the secondary"? I think that's what we intended.

Comment by Eric Milkie [ 29/Nov/17 ]

First, the write concern of "majority" will ensure the write is applied to the oplog of the secondary, but will not necessarily wait for it to become part of the majority-committed snapshot.

That sentence confuses me. It is talking about both a primary and secondary, but then says "the" majority-committed snapshot. There are different majority-committed snapshots on the primary and the secondary; which one is being referred to here?

Comment by Eric Milkie [ 29/Nov/17 ]

readConcern level majority can indeed block. It blocks when the catalog cache does not match the catalog on disk. This is our stopgap solution to not having a versioned catalog.
For example, what can happen is that an index is dropped. At that point, the index drop code sets a "minimumVisibleSnapshot" on the Collection. Thereafter, any read concern majority read checks this value at the time the Collection is opened (usually when acquiring a read lock on the collection), and if that value is later than the snapshot we are reading from, we block and wait for the majority snapshot to move past the minimum snapshot value. After that, the catalog on disk will match the catalog cache again.
All catalog metadata manipulation commands, including subtle ones like the compact command, set the minimum snapshot value.
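A rough shell-level sketch of that sequence (the collection and index names are hypothetical, and minimumVisibleSnapshot is internal server state, so the comments only describe the behavior being explained):

    // A catalog change such as an index drop sets the collection's
    // minimumVisibleSnapshot.
    const coll = db.getSiblingDB("test").snapshot_coll;
    assert.commandWorked(coll.createIndex({a: 1}));
    assert.commandWorked(coll.dropIndex({a: 1}));

    // A subsequent readConcern "majority" read on this collection may block
    // until the majority-committed snapshot advances past the drop, so that the
    // on-disk catalog matches the catalog cache for the snapshot being read.
    assert.commandWorked(db.getSiblingDB("test").runCommand({
        find: "snapshot_coll",
        readConcern: {level: "majority"},
    }));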

Comment by Spencer Brody (Inactive) [ 28/Nov/17 ]

Ooph, I'm not actually sure what that code does. It looks somehow related to index catalog management; milkie would know better. Either way, I believe that 99% of the time it should be a no-op.

Comment by Charlie Swanson [ 28/Nov/17 ]

spencer sorry if I'm phrasing it incorrectly. The point of the last paragraph is just to say that we have to watch out for issues like the one seen in BF-6485, where a test does a find, then an insert, and expects the next getMore to see the insert. That is not guaranteed, even with causal consistency enabled. Since many of these tests use that pattern (open a change stream, do writes, observe the writes via getMores, sketched below), it seems worth checking for that kind of issue.

As an aside - I am confused by this code if there is no 'waiting' for read concern. Can you elaborate on what that process is doing and why it's not considered waiting? Or perhaps this only happens in rare circumstances?
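For reference, a minimal sketch of the BF-6485-style pattern (the session usage, collection name, and documents are hypothetical):

    // Causally consistent session: the initial find can wait for afterClusterTime.
    const session = db.getMongo().startSession({causalConsistency: true});
    const sessionColl = session.getDatabase("test").cc_coll;

    // Pre-populate so the first batch has something to return.
    assert.writeOK(sessionColl.insert({_id: 1}, {writeConcern: {w: "majority"}}));

    // Establish the cursor and force the initial find to run.
    const cursor = sessionColl.find().batchSize(1);
    assert(cursor.hasNext());

    // A write performed after the cursor is established...
    assert.writeOK(sessionColl.insert({_id: 2}, {writeConcern: {w: "majority"}}));

    // ...is not guaranteed to be visible to the remaining getMores on a secondary,
    // because getMore does not re-apply readConcern or afterClusterTime.
    const remaining = cursor.toArray();  // May or may not contain {_id: 2}.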

Comment by Spencer Brody (Inactive) [ 28/Nov/17 ]

The last paragraph of the description doesn't make sense to me as there is no "waiting" involved in satisfying readConcern:majority. All readConcern:majority does is inform the storage system what version of documents to return.

Additionally, I believe this issue only applies to tests that are doing multiple writes, then using updateLookup in the stream and expecting the looked-up document to reflect the writes that happened after the update that generated the notification. james.wahlin or charlie.swanson, could one of you clean up the ticket description to make that more clear?
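A minimal sketch of that updateLookup pattern (the collection name and documents are hypothetical):

    // Open a change stream that looks up the post-image of updated documents.
    const coll = db.getSiblingDB("test").lookup_coll;
    const csCursor = coll.aggregate([{$changeStream: {fullDocument: "updateLookup"}}]);

    assert.writeOK(coll.insert({_id: 1, x: 0}));
    assert.writeOK(coll.update({_id: 1}, {$set: {x: 1}}));  // Generates the notification.
    assert.writeOK(coll.update({_id: 1}, {$set: {x: 2}}));  // Happens before the lookup runs.

    assert.soon(() => csCursor.hasNext());
    csCursor.next();  // Insert notification.
    assert.soon(() => csCursor.hasNext());
    const updateEvent = csCursor.next();
    // Some tests expect updateEvent.fullDocument to reflect the later write
    // ({x: 2}), but the lookup is a separate query issued when the event is
    // returned; against a secondary there is no guarantee it sees that write yet.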
