[SERVER-28579] Data race involving capped collection truncation and PlanExecutor kill notifications
Created: 31/Mar/17  Updated: 06/Dec/17  Resolved: 18/Apr/17

Status: Closed
Project: Core Server
Component/s: Querying, Storage
Affects Version/s: None
Fix Version/s: 3.5.7

Type: Bug
Priority: Major - P3
Reporter: David Storch
Assignee: David Storch
Resolution: Done
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: Query 2017-05-08
Participants:

 Description   

There are several events which can cause all active query plan executors to be marked as killed:

  • Collection drop.
  • Database drop.
  • Index drop.
  • Capped collection truncation.

Generally, these events involve the acquisition of a MODE_X lock on the collection, which means that any active queries must have yielded all locks. After obtaining exclusive access to the collection, we iterate the list of registered PlanExecutors and mark them as killed:

https://github.com/mongodb/mongo/blob/r3.5.4/src/mongo/db/catalog/cursor_manager.cpp#L333-L339

This writes to PlanExecutor::_killReason. Whenever a PlanExecutor is used, it first consults PlanExecutor::_killReason. If the kill reason is set, an error is propagated to the caller. This means that the client will receive the appropriate error if, for example, the query's collection is dropped during its execution.
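To make the kill-notification pattern concrete, here is a minimal C++ sketch under simplifying assumptions. `PlanExecutorSketch`, `CursorManagerSketch`, `markAsKilled()`, `checkNotKilled()` and `invalidateAll()` are hypothetical names, not MongoDB's real API. The only behavior taken from the description is that the kill path writes a kill reason into every registered executor, and that each use of an executor checks that field first.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified model of the pattern described in this ticket.
// These are NOT MongoDB's classes; the names only echo the ticket.
class PlanExecutorSketch {
public:
    // Called by a kill path (e.g. collection drop), which is assumed to hold
    // exclusive access to the collection, so no reader runs concurrently.
    void markAsKilled(std::string reason) { _killReason = std::move(reason); }

    // Every use of the executor consults the kill reason first.
    // Returns an error message if killed, or std::nullopt to keep executing.
    std::optional<std::string> checkNotKilled() const {
        if (_killReason) {
            return "operation was interrupted: " + *_killReason;
        }
        return std::nullopt;
    }

private:
    // Plain, unsynchronized field: safe only if the lock protocol guarantees
    // that writers and readers never overlap.
    std::optional<std::string> _killReason;
};

class CursorManagerSketch {
public:
    void registerExecutor(PlanExecutorSketch* exec) { _executors.push_back(exec); }

    // Mirrors the linked loop: mark every registered executor as killed.
    void invalidateAll(const std::string& reason) {
        for (PlanExecutorSketch* exec : _executors) {
            exec->markAsKilled(reason);
        }
    }

private:
    std::vector<PlanExecutorSketch*> _executors;
};
```

The field is deliberately left unsynchronized, as in the description: correctness rests entirely on the locking protocol ensuring that `markAsKilled()` never runs while a query is inside `checkNotKilled()`.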

The path for capped collection truncation, however, only requires a MODE_IX lock:

https://github.com/mongodb/mongo/blob/r3.5.4/src/mongo/db/catalog/collection.cpp#L923

This means that the thread calling Collection::cappedTruncateAfter() can be writing to PlanExecutor::_killReason at the same time that the PlanExecutor is reading it!
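Why MODE_IX is not enough can be seen from a lock compatibility matrix. The sketch below encodes the standard multi-granularity compatibility table for IS, IX, S and X. The `Mode` enum, `compatible()` and `killCanOverlapReader()` are hypothetical illustrations, not MongoDB's `LockMode` type or lock manager API. For illustration it also assumes that a running query holds its collection lock in an intent-shared mode (IS), as reads typically do.

```cpp
#include <cassert>

// Hypothetical lock modes for illustration; this is not MongoDB's LockMode.
enum class Mode { IS = 0, IX = 1, S = 2, X = 3 };

// Standard multi-granularity compatibility matrix.
// kCompatible[held][requested] is true if both modes can be granted at once.
constexpr bool kCompatible[4][4] = {
    //           IS     IX     S      X
    /* IS */ {true,  true,  true,  false},
    /* IX */ {true,  true,  false, false},
    /* S  */ {true,  false, true,  false},
    /* X  */ {false, false, false, false},
};

constexpr bool compatible(Mode held, Mode requested) {
    return kCompatible[static_cast<int>(held)][static_cast<int>(requested)];
}

// A kill path holding `killerMode` can write an executor's kill reason while a
// query reads it under `readerMode` whenever the two modes are compatible,
// because the lock manager will then let both threads proceed at once.
constexpr bool killCanOverlapReader(Mode readerMode, Mode killerMode) {
    return compatible(readerMode, killerMode);
}

static_assert(killCanOverlapReader(Mode::IS, Mode::IX), "IX truncation can overlap a reader");
static_assert(!killCanOverlapReader(Mode::IS, Mode::X), "an X-mode drop cannot");
```

For example, `compatible(Mode::IS, Mode::IX)` is `true`, so a truncation holding MODE_IX can run alongside a reader, while `compatible(Mode::IS, Mode::X)` is `false`, which is why the drop paths that take MODE_X cannot overlap with one.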



 Comments   
Comment by Githook User [ 18/Apr/17 ]

Author:

David Storch (dstorch, david.storch@10gen.com)

Message: SERVER-28579 require MODE_X collection lock in Collection::cappedTruncateAfter()
Branch: master
https://github.com/mongodb/mongo/commit/f221a2484586fd49a18e2774b0ff5d6413b535e6

Comment by David Storch [ 14/Apr/17 ]

It looks like the only callers are already holding at least a MODE_X lock on the collection being truncated. Therefore, Milkie's assessment that the bug is not impactful sounds correct. It also means that we can strengthen the contract for Collection::cappedTruncateAfter() to require that callers hold the collection lock in at least MODE_X.
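The strengthened contract can be sketched as a precondition check at the top of the truncation routine. This is a hypothetical illustration, not the code from the linked commit. `LockStateSketch`, `cappedTruncateAfterSketch()` and the "keep the first N records" truncation rule are assumptions made to keep the example self-contained, and the check throws an exception only so that a violated precondition is easy to observe in a test.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical names throughout: this is not MongoDB's Collection API.
enum class Mode { None, IS, IX, S, X };

// Stand-in for the caller's lock state on the collection being truncated.
struct LockStateSketch {
    Mode collectionMode = Mode::None;
};

// Sketch of the strengthened contract: refuse to run unless the caller holds
// the collection lock exclusively. With MODE_X held, no query can be reading
// any executor's kill reason, so notifying executors cannot race.
void cappedTruncateAfterSketch(const LockStateSketch& locks,
                               std::vector<int>& records,
                               std::size_t keepCount) {
    if (locks.collectionMode != Mode::X) {
        throw std::logic_error("cappedTruncateAfter requires a MODE_X collection lock");
    }
    // Simplified truncation: drop every record after the first keepCount.
    if (keepCount < records.size()) {
        records.resize(keepCount);
    }
    // A real implementation would also kill registered executors here (omitted).
}
```

Called with a lock state of `Mode::IX`, the sketch rejects the call before touching any data. Called with `Mode::X`, it truncates as requested.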

Comment by Eric Milkie [ 01/Apr/17 ]

Is it correct that we only call cappedTruncateAfter() as part of rollback? If so, there can be no readers of the oplog collection when this happens, so at least the current scope of this bug is not impactful.

Generated at Thu Feb 08 04:18:31 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.