[SERVER-71390] Telemetry store read lock triggers an assertion when used in multiple threads Created: 15/Nov/22  Updated: 29/Oct/23  Resolved: 29/Nov/22

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 6.2.0-rc2
Fix Version/s: 6.3.0-rc0

Type: Bug
Priority: Major - P3
Reporter: Jess Balint
Assignee: Jess Balint
Resolution: Fixed
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Problem/Incident
causes SERVER-71592 $telemetry stage implementation shoul... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested: v6.2
Sprint: QO 2022-11-28, QO 2022-12-12
Participants:
Linked BF Score: 137

 Description   

[j5:c:prim] | 2022-11-15T17:57:43.125+00:00 F ASSERT 23079 [conn26] "Invariant failure","attr":{"expr":"request->recursiveCount > 0","file":"src/mongo/db/concurrency/lock_manager.cpp","line":539}

mongo::LockerImpl::_lockBegin(mongo::OperationContext*, mongo::ResourceId, mongo::LockMode)
mongo::LockerImpl::lock(mongo::OperationContext*, mongo::ResourceId, mongo::LockMode, mongo::Date_t)
mongo::Lock::ResourceLock::_lock(mongo::LockMode, mongo::Date_t)
mongo::telemetry::getTelemetryStoreForRead(mongo::ServiceContext const*)
mongo::telemetry::(anonymous namespace)::LockedMetrics::get(mongo::OperationContext const*, mongo::BSONObj const&)
mongo::telemetry::recordExecution(mongo::OperationContext const*, mongo::OpDebug const&, bool)
mongo::(anonymous namespace)::FindCmd::Invocation::run(mongo::OperationContext*, mongo::rpc::ReplyBuilderInterface*)



 Comments   
Comment by Githook User [ 28/Nov/22 ]

Author: {'name': 'Jess Balint', 'email': 'jbalint@gmail.com', 'username': 'jbalint'}

Message: SERVER-71390 locking fix
Branch: master
https://github.com/mongodb/mongo/commit/507288cdeed26adbdab4eccd66cb48b73d96559e

Comment by David Storch [ 16/Nov/22 ]

What's the rationale for the telemetry store having its own LockerImpl instance? I assumed that the telemetry store would work like the plan cache – namely, it would be initialized on process startup before we begin collecting telemetry. During steady-state operation, the concurrency control would be implemented using our Partitioned utility, which under the hood has a vector of mutexes.
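
For illustration only, here is a minimal sketch of the partitioned approach described in the comment above: a store split into partitions, each guarded by its own mutex, so that concurrent writers never share a Locker. It is not the server's Partitioned utility and not the fix that was committed. The names PartitionedStore, Metrics, recordExecution (as a member function), execCount, and recordConcurrently are hypothetical and exist only for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <mutex>
#include <string>
#include <thread>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a telemetry entry; not the server's real type.
struct Metrics {
    std::uint64_t execCount = 0;
};

// Hypothetical partitioned store: every partition owns a mutex and a map, so
// threads touching different partitions do not contend with each other.
class PartitionedStore {
public:
    explicit PartitionedStore(std::size_t numPartitions) : _partitions(numPartitions) {
        assert(numPartitions > 0);
    }

    // Locks only the partition that owns `key`, then bumps its counter.
    void recordExecution(const std::string& key) {
        Partition& p = _partitionFor(key);
        std::lock_guard<std::mutex> lk(p.mutex);
        ++p.entries[key].execCount;
    }

    // Returns the recorded count for `key`, or 0 if the key was never seen.
    std::uint64_t execCount(const std::string& key) {
        Partition& p = _partitionFor(key);
        std::lock_guard<std::mutex> lk(p.mutex);
        auto it = p.entries.find(key);
        return it == p.entries.end() ? 0 : it->second.execCount;
    }

private:
    struct Partition {
        std::mutex mutex;
        std::unordered_map<std::string, Metrics> entries;
    };

    // Maps a key to its partition by hashing the key.
    Partition& _partitionFor(const std::string& key) {
        return _partitions[std::hash<std::string>{}(key) % _partitions.size()];
    }

    std::vector<Partition> _partitions;
};

// Starts `numThreads` threads that each record `perThread` executions of
// `key`, waits for all of them, and returns the final count for that key.
std::uint64_t recordConcurrently(PartitionedStore& store,
                                 const std::string& key,
                                 int numThreads,
                                 int perThread) {
    std::vector<std::thread> threads;
    for (int t = 0; t < numThreads; ++t) {
        threads.emplace_back([&store, &key, perThread] {
            for (int i = 0; i < perThread; ++i) {
                store.recordExecution(key);
            }
        });
    }
    for (auto& th : threads) {
        th.join();
    }
    return store.execCount(key);
}
```

In this sketch each call takes exactly one plain mutex for the duration of a single map operation, so no lock state is carried across calls or shared between threads. Whether that maps cleanly onto the telemetry store's actual requirements is an assumption of the example.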
