[SERVER-32007] ExpressionDateFromString::evaluate() is not thread safe Created: 16/Nov/17  Updated: 30/Oct/23  Resolved: 17/Nov/17

Status: Closed
Project: Core Server
Component/s: Aggregation Framework
Affects Version/s: 3.6.0-rc4
Fix Version/s: 3.6.0-rc5, 3.7.1

Type: Bug Priority: Critical - P2
Reporter: David Storch Assignee: David Storch
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v3.6
Sprint: Query 2017-12-04
Participants:
Linked BF Score: 0

 Description   

In general, aggregation expressions are owned by a single thread responsible for executing the aggregate command. However, this is not the case for a document validator which uses $expr. A thread running a create or collMod commands sets up the validator. Subsequent writes to the collection can concurrently evaluate the aggregation expression, as part of $expr. Note that this is the only the case for storage engines which support document-level concurrency; on MMAPv1, writes will take an exclusive collection lock and thus will not evaluate the $expr concurrently.

Expression::evaluate() is marked as const and therefore is expected to be thread-safe, but ExpressionDateFromString's evaluate implementation has a subtle data race. This expression holds a subexpression by boost::intrusive_ptr:

https://github.com/mongodb/mongo/blob/3b5249593ad3bdbc8799f6a2f80ba89568f7894a/src/mongo/db/pipeline/expression.h#L879

It passes this intrusive_ptr by value to the makeTimeZone() helper:

https://github.com/mongodb/mongo/blob/3b5249593ad3bdbc8799f6a2f80ba89568f7894a/src/mongo/db/pipeline/expression.cpp#L1376

This causes the intrusive_ptr refcount to be incremented when this function is called, and decremented when the function returns. This increment/decrement pair amounts to mutating the object inside a const method, and it results in a data race. The Expression class inherits from IntrusiveCounterUnsigned in order to interface with instrusive_ptr. IntrusiveCounterUnsigned writes directly to an unsigned integer on increment and decrement:

https://github.com/mongodb/mongo/blob/3b5249593ad3bdbc8799f6a2f80ba89568f7894a/src/mongo/util/intrusive_counter.h#L86

This is in contrast to our RefCountable class, whose refcount is an AtomicUInt32.



 Comments   
Comment by Githook User [ 17/Nov/17 ]

Author:

{'name': 'David Storch', 'username': 'dstorch', 'email': 'david.storch@10gen.com'}

Message: SERVER-32007 Avoid unnecessary intrusive_ptr<Expression> copies in time expression evaluation.

(cherry picked from commit 8e06e18365f4efe772c924fa377eaaa0f1ddd219)
Branch: v3.6
https://github.com/mongodb/mongo/commit/a7104648fa1d9b020ba7e4d82dffb96caa681a6c

Comment by Githook User [ 17/Nov/17 ]

Author:

{'name': 'David Storch', 'username': 'dstorch', 'email': 'david.storch@10gen.com'}

Message: SERVER-32007 Make evaluation of time-related agg expressions thread safe.

(cherry picked from commit 411853747b6c5ee4e6d0f5d8ef063514e54d7ee0)
Branch: v3.6
https://github.com/mongodb/mongo/commit/360798e804b12983519c600a9e0b43d066ba8a4d

Comment by Githook User [ 17/Nov/17 ]

Author:

{'name': 'David Storch', 'username': 'dstorch', 'email': 'david.storch@10gen.com'}

Message: SERVER-32007 Avoid unnecessary intrusive_ptr<Expression> copies in time expression evaluation.
Branch: master
https://github.com/mongodb/mongo/commit/8e06e18365f4efe772c924fa377eaaa0f1ddd219

Comment by Githook User [ 17/Nov/17 ]

Author:

{'name': 'David Storch', 'username': 'dstorch', 'email': 'david.storch@10gen.com'}

Message: SERVER-32007 Make evaluation of time-related agg expressions thread safe.
Branch: master
https://github.com/mongodb/mongo/commit/411853747b6c5ee4e6d0f5d8ef063514e54d7ee0

Comment by David Storch [ 16/Nov/17 ]

This bug has been observed to cause a crash for ExpressionDateFromString in our continuous integration system. It affects all of the 3.6 release candidates, though since $expr and ExpressionDateFromString are new, it does not affect old branches. The crash is likely a result of the data race causing the Expression to be de-allocated when there is still an outstanding reference.

It appears from the code that this affects other expressions, not just ExpressionDateFromString. At the very least, all of the evaluate() methods that call makeTimeZone() are likely to be affected. It's possible that there are other Expression::evaluate() implementations that are subtly non-const due to writes to the intrusive_ptr refcount. Any such expression falls under the purview of this ticket as well.

Finally, we should investigate whether there is a similar problem affecting any of the features which can persist query expressions:

  • Partial indexes.
  • Read-only non-materialized views.
Generated at Thu Feb 08 04:28:52 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.