[SERVER-81108] Sharded $search fails tassert in writeQueryStats Created: 15/Sep/23  Updated: 09/Nov/23  Resolved: 09/Nov/23

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 7.3.0-rc0

Type: Bug Priority: Major - P3
Reporter: Maddie Zechar Assignee: Maddie Zechar
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
is depended on by SERVER-85096 TRACKING: M3 Correctness Tickets Closed
Assigned Teams:
Query Integration
Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: QI 2023-11-13
Participants:

 Description   

Link to the current test: https://github.com/10gen/mongo-enterprise-modules/blob/a5bf26ac4e16152bbab03ee1f24266042b866b93/jstests/search/sharded_search_query_stats.js

The sharded $search test linked above fails when queryStats is enabled but passes when it is disabled. The relevant snippet of the backtrace (innermost frame first) is:

mongo::tassertFailed(mongo::Status const&, mongo::SourceLocation) +0x116
mongo::query_stats::writeQueryStats(mongo::OperationContext*, boost::optional<unsigned long>, std::unique_ptr<mongo::query_stats::KeyGenerator, std::default_delete<mongo::query_stats::KeyGenerator> >, unsigned long, unsigned long, unsigned long)::$_7::operator()() const +0x5A
mongo::query_stats::writeQueryStats(mongo::OperationContext*, boost::optional<unsigned long>, std::unique_ptr<mongo::query_stats::KeyGenerator, std::default_delete<mongo::query_stats::KeyGenerator> >, unsigned long, unsigned long, unsigned long) +0x18B
mongo::ClientCursor::dispose(mongo::OperationContext*, boost::optional<mongo::Date_t>) +0x1FD
mongo::CursorManager::_destroyCursor(mongo::OperationContext*, std::unique_ptr<mongo::ClientCursor, mongo::ClientCursor::Deleter>) +0x85
mongo::CursorManager::deregisterAndDestroyCursor(mongo::Partitioned<absl::lts_20211102::node_hash_map<long long, mongo::ClientCursor*, absl::lts_20211102::hash_internal::Hash<long long>, std::equal_to<long long>, std::allocator<std::pair<long long const, mongo::ClientCursor*> > >, mongo::Partitioner<long long> >::OnePartition&&, mongo::OperationContext*, std::unique_ptr<mongo::ClientCursor, mongo::ClientCursor::Deleter>) +0xA2
mongo::CursorManager::killCursor(mongo::OperationContext*, long long) +0x3C7
mongo::KillCursorsCmd::doKillCursor(mongo::OperationContext*, mongo::NamespaceString const&, long long) +0x201
mongo::KillCursorsCmdBase<mongo::KillCursorsCmd>::Invocation::typedRun(mongo::OperationContext*) +0x125
mongo::TypedCommand<mongo::KillCursorsCmdBase<mongo::KillCursorsCmd> >::InvocationBase::_callTypedRun(mongo::OperationContext*) +0x2B
mongo::TypedCommand<mongo::KillCursorsCmdBase<mongo::KillCursorsCmd> >::InvocationBase::_runImpl(std::integral_constant<bool, false>, mongo::OperationContext*, mongo::rpc::ReplyBuilderInterface*) +0x5C
mongo::TypedCommand<mongo::KillCursorsCmdBase<mongo::KillCursorsCmd> >::InvocationBase::run(mongo::OperationContext*, mongo::rpc::ReplyBuilderInterface*) +0x25
mongo::CommandHelpers::runCommandInvocation(mongo::OperationContext*, mongo::OpMsgRequest const&, mongo::CommandInvocation*, mongo::rpc::ReplyBuilderInterface*)

It appears that killCursors is called, which disposes of the cursor and calls writeQueryStats, but the keyGenerator passed in is nullptr and thus fails the tassert.



 Comments   
Comment by Githook User [ 09/Nov/23 ]

Author:

{'name': 'madelinezec', 'email': 'mez2113@columbia.edu', 'username': 'madelinezec'}

Message: SERVER-81108 Sharded $search fails tassert in writeQueryStats
Branch: master
https://github.com/mongodb/mongo/commit/d0de879138006c27978e5931797ff12f2b3c979e

Comment by Maddie Zechar [ 01/Nov/23 ]

When we enable queryStats on the shards of a sharded cluster, a cursor will be created for each shard. However, we create the queryStatsKey only once and store it behind a unique_ptr on CurOp. The ClientCursor constructor std::moves the queryStatsKey (which leaves the key on CurOp null) and copies over the queryStatsKeyHash, since the hash is cheap to copy.
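The ownership transfer described above can be illustrated with a minimal, self-contained sketch. The types `FakeKey`, `FakeCurOp`, `FakeCursor`, `TwoCursors`, and the function `makeTwoCursors` are hypothetical stand-ins, not the server's actual `CurOp`, `ClientCursor`, or `KeyGenerator`. The sketch assumes both cursors are built from the same CurOp, one after the other: moving the `std::unique_ptr` leaves it null for the second constructor, while the `std::optional` hash is copied into both cursors.

```cpp
#include <cstddef>
#include <memory>
#include <optional>
#include <string>
#include <utility>

// Hypothetical stand-in for the query stats key (not the server's KeyGenerator).
struct FakeKey {
    std::string shape;
};

// Hypothetical stand-in for CurOp: owns the key and also records its hash.
struct FakeCurOp {
    std::unique_ptr<FakeKey> queryStatsKey;
    std::optional<std::size_t> queryStatsKeyHash;
};

// Hypothetical stand-in for ClientCursor: moves the key, copies the hash.
struct FakeCursor {
    std::unique_ptr<FakeKey> queryStatsKey;
    std::optional<std::size_t> queryStatsKeyHash;

    explicit FakeCursor(FakeCurOp& curOp)
        : queryStatsKey(std::move(curOp.queryStatsKey)),  // curOp's pointer becomes null
          queryStatsKeyHash(curOp.queryStatsKeyHash) {}   // plain copy; curOp keeps its hash
};

// Holds the two cursors created from the same (hypothetical) CurOp.
struct TwoCursors {
    FakeCursor first;
    FakeCursor second;
};

TwoCursors makeTwoCursors() {
    FakeCurOp curOp;
    curOp.queryStatsKey = std::make_unique<FakeKey>(FakeKey{"$search"});
    curOp.queryStatsKeyHash = 42;

    FakeCursor first(curOp);   // constructed first: takes ownership of the key
    FakeCursor second(curOp);  // constructed second: null key, but still has the hash
    return TwoCursors{std::move(first), std::move(second)};
}
```

After `auto cursors = makeTwoCursors();`, only `cursors.first` holds a key, yet both cursors report a hash, which is exactly the state that lets a hash-only check pass for the second cursor.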

 

As a result, the two shards race to call the ClientCursor constructor first: the first one will own the key and have a copy of the hash, while the second one will only have a copy of the hash. When mongos kills the cursors, the dispose path checks only whether the hash exists on the cursor before calling writeQueryStats. For the cursor that was constructed second, that check passes, so it moves on to writeQueryStats(), where the null key trips the tassert.
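The failing check can be modeled with another small sketch, under the assumption that a thrown `std::logic_error` stands in for the tassert. All names here (`FakeCursor`, `fakeWriteQueryStats`, `fakeDispose`, `hashOnlyGuard`, `hashAndKeyGuard`) are hypothetical and do not mirror the server's code. `hashOnlyGuard` reproduces the check described above. `hashAndKeyGuard` is shown only as one possible mitigation; this ticket does not say whether the committed fix takes this approach.

```cpp
#include <cstddef>
#include <memory>
#include <optional>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-ins; not the server's ClientCursor or KeyGenerator.
struct FakeKey {
    std::string shape;
};

struct FakeCursor {
    std::unique_ptr<FakeKey> queryStatsKey;        // null if another cursor took ownership
    std::optional<std::size_t> queryStatsKeyHash;  // copied into every cursor
};

// Models writeQueryStats: throws where the server would fail a tassert.
void fakeWriteQueryStats(std::unique_ptr<FakeKey> key) {
    if (!key) {
        throw std::logic_error("tassert: query stats key must not be null");
    }
    // A real implementation would record stats under key->shape here.
}

// The check described in the ticket: only the hash is consulted.
bool hashOnlyGuard(const FakeCursor& cursor) {
    return cursor.queryStatsKeyHash.has_value();
}

// A stricter check that also requires the cursor to own a key.
bool hashAndKeyGuard(const FakeCursor& cursor) {
    return cursor.queryStatsKeyHash.has_value() && cursor.queryStatsKey != nullptr;
}

// Models the dispose path: returns true if stats were written for this cursor.
template <typename Guard>
bool fakeDispose(FakeCursor& cursor, Guard guard) {
    if (!guard(cursor)) {
        return false;  // skip query stats for this cursor
    }
    fakeWriteQueryStats(std::move(cursor.queryStatsKey));
    return true;
}
```

With `hashOnlyGuard`, a cursor holding only a hash reaches `fakeWriteQueryStats` and throws, mirroring the backtrace. With `hashAndKeyGuard` it is skipped instead; the trade-off is that stats for that cursor are not recorded under this scheme.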

Comment by Charlie Swanson [ 01/Nov/23 ]

Chatting with maddie.zechar@mongodb.com about this, we realized that this test is failing in a configuration we don't expect to support: the shards are configured to track query stats, but the mongos is not. It does not reproduce if only mongos is configured.

I think this is still worth looking into and should be considered a bug, but I will move it to a later milestone in the project to reflect this lower priority. 

Generated at Thu Feb 08 06:45:30 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.