[SERVER-35282] Reading from unreplicated collections in a transaction has undefined behavior Created: 30/May/18  Updated: 27/Oct/23  Resolved: 07/Jun/18

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Daniel Gottlieb (Inactive) Assignee: Spencer Brody (Inactive)
Resolution: Gone away Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
depends on SERVER-35365 MapReduce temporary inc collections s... Closed
Related
related to SERVER-33115 Prevent writes to unreplicated collec... Closed
Operating System: ALL
Backport Requested:
v4.0
Sprint: Repl 2018-06-18
Participants:
Linked BF Score: 0

 Description   

Consider a replica set primary with top of oplog = logical clock = commit point = T. A writer that creates an unreplicated collection (in a KVStorageEngine such as WT) takes the following steps:

  1. Acquire locks
  2. Create collection in the _mdb_catalog
  3. Commits the transaction
  4. Runs onCommit handlers, setting the min visible on the collection to T.
  5. Releases locks

A reader starting a new transaction and acquiring a collection takes the following steps:

  1. Open a transaction with read timestamp of T.
  2. Acquire locks (I believe these are quickly interrupted if there's lock contention).
  3. Note the minVisible on the collection = T which is satisfied via the transaction read timestamp of T.

If the reader opens the transaction (R1) before the writer commits (W3) and acquires locks (R2) after the writer releases (W4), the reader will observe the in-memory catalog believing the collection exists, but the storage engine will not (due to the concurrent transactions). In debug builds, this typically manifests as a crash when trying to access any index.

It's unclear how this can manifest with arbitrary queries/sorts/aggregations being performed with the query code consuming this state where the in-memory catalog is inconsistent with the storage engine's recovery unit snapshot.



 Comments   
Comment by Spencer Brody (Inactive) [ 07/Jun/18 ]

While this issue theoretically still exists for any unreplicated collection read in a transaction, we believe that there are no unreplicated collections remaining that it is possible to read in a transaction (since we disallow accessing system collections or collections in the admin, config, and local databases), now that SERVER-35365 is complete. So effectively this issue has gone away, as there is no longer a way to actually trigger it.

Comment by Spencer Brody (Inactive) [ 31/May/18 ]

max.hirschhorn and I just went through all uses of UnreplicatedWritesBlock, and I believe the only collections that could actually be affected by this issue (and aren't already disallowed in transactions by virtue of being system collections or living in the config or local dbs) is mapReduce temp collections. MapReduce writes to collection names that are valid for users to write to and would otherwise be replicated, but all the writes to that collection happen in an UnreplicatedWritesBlock. I'm not sure if there's really any way for the transaction system to know that these collections are special, since neither their name nor anything in the catalog about them are special, it's just the way the system writes to them that is unique. I think the only way to fix this might be to change the collection name that mapReduce writes to to either start with 'system' or be in the local database.

Doing that would also have the advantage of fixing SERVER-27147 while we're at it.

david.storch

Comment by Max Hirschhorn [ 31/May/18 ]

I don't think the current implementation of ReplicationCoordinator::isOplogDisabledFor() is a complete answer to the question of "is this collection replicated?" as the reads and writes inside of a transaction would also be in an entirely separate OperationContext and therefore wouldn't know about UnreplicatedWriteBlock previously being used for that namespace.

Comment by Spencer Brody (Inactive) [ 30/May/18 ]

Yeah, I think Kyle's right. We should forbid all operations, reads or writes, on unreplicated collections inside transactions.

Comment by Kyle Suarez [ 30/May/18 ]

SERVER-33115 added a mechanism to prevent writes to unreplicated collections; perhaps it should be moved to a more general place so that reads are banned, too.

Comment by Eric Milkie [ 30/May/18 ]

I wonder if we should simply prohibit reads to such collections. In the BF, a reader was attempting to read from a temporary unreplicated map/reduce collection; such reads have no real-world use case.

Generated at Thu Feb 08 04:39:22 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.