[SERVER-83885] Fix hang that can occur in start due to a failing Kafka connection Created: 05/Dec/23  Updated: 06/Dec/23  Resolved: 06/Dec/23

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 7.3.0-rc0

Type: Task Priority: Major - P3
Reporter: Matthew Normyle Assignee: Matthew Normyle
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Assigned Teams:
Atlas Streams
Backwards Compatibility: Fully Compatible
Sprint: Sprint 37
Participants:

 Description   

mongostreams stop request is timing out after a minute.
(splunk) It looks like the process has run into a deadlock. Taking a process dump, then will try to mitigate it.
kb-copy-mongoclient streams-spp-6bc9946bc-8tdkn
kb-bash-mstreams streams-spp-6bc9946bc-8tdkn
MongoDB Enterprise > db.runCommand({"streams_listStreamProcessors": ""})
... this hangs ...

matthew.normyle:

Got the stack traces with:
kb-copygdb streams-spp-6bc9946bc-8tdkn
kb-bash-mstreams streams-spp-6bc9946bc-8tdkn
/tmp/gdb attach 1
Thread running startStreamProcessor (blocked in pthread_join inside rd_kafka_destroy_app):
#0 0x00007f25dcf7174a in pthread_join () from /lib64/libpthread.so.0
#1 0x0000558f2c7ef404 in thrd_join ()
#2 0x0000558f2c6d4b1c in rd_kafka_destroy_app ()
#3 0x0000558f2c6afe3f in RdKafka::ConsumerImpl::~ConsumerImpl() ()
#4 0x0000558f2c62ac27 in streams::KafkaPartitionConsumer::~KafkaPartitionConsumer() ()
#5 0x0000558f2c62adf2 in streams::KafkaPartitionConsumer::~KafkaPartitionConsumer() ()
#6 0x0000558f2c61a3e8 in streams::KafkaConsumerOperator::doStop() ()
#7 0x0000558f2c65cbf1 in streams::OperatorDag::stop() ()
#8 0x0000558f2c604b2a in streams::Executor::stop() ()
#9 0x0000558f2c5e348c in streams::StreamManager::startStreamProcessor(mongo::StartStreamProcessorCommand const&) ()
Thread running stopStreamProcessor (blocked acquiring a mutex):
#2 0x0000558f30e1df70 in mongo::latch_detail::Mutex::lock() ()
#3 0x0000558f2c5e285d in streams::StreamManager::stopStreamProcessor(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) ()
#4 0x0000558f2c5d788c in mongo::TypedCommand<streams::StopStreamProcessorCmd>::InvocationBase::run(mongo::OperationContext*, mongo::rpc::ReplyBuilderInterface*) ()
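The two stacks point to a hang that spreads through a lock. The startStreamProcessor thread is tearing down a failed processor (Executor::stop() → KafkaConsumerOperator::doStop() → rd_kafka_destroy_app) and waits indefinitely in pthread_join. Meanwhile, stopStreamProcessor waits on a mutex that the start path presumably still holds. The ticket does not describe the actual fix. The sketch below assumes one common remedy: release the manager lock before running the potentially blocking teardown. Every name in it (ManagerSketch, SlowExecutor, startFailed, tryStop, stopCanProceed) is hypothetical and is not the real streams::StreamManager API. The stuck Kafka teardown is modeled with a sleep, and the stop path uses a timed lock so it can report being blocked instead of hanging.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <mutex>
#include <thread>

using namespace std::chrono_literals;

// Hypothetical stand-in for streams::Executor. stop() models a Kafka consumer
// teardown that blocks (as rd_kafka_destroy did while joining its thread).
struct SlowExecutor {
    std::chrono::milliseconds teardown;
    std::atomic<bool>* teardownStarted;
    void stop() {
        teardownStarted->store(true);
        std::this_thread::sleep_for(teardown);
    }
};

// Hypothetical, heavily simplified stand-in for streams::StreamManager.
class ManagerSketch {
public:
    explicit ManagerSketch(bool stopOutsideLock) : _stopOutsideLock(stopOutsideLock) {}

    // Models the failure path of startStreamProcessor: the processor failed
    // to start and its executor must be torn down.
    void startFailed(SlowExecutor exec) {
        std::unique_lock<std::timed_mutex> lk(_mutex);
        // ... bookkeeping that genuinely needs the lock would go here ...
        if (_stopOutsideLock) {
            lk.unlock();  // release before the potentially blocking teardown
        }
        exec.stop();  // in the "buggy" variant this runs with _mutex held
    }

    // Models stopStreamProcessor: reports whether it could take the manager
    // mutex within `timeout` instead of waiting forever.
    bool tryStop(std::chrono::milliseconds timeout) {
        std::unique_lock<std::timed_mutex> lk(_mutex, timeout);  // try_lock_for
        return lk.owns_lock();
    }

private:
    const bool _stopOutsideLock;
    std::timed_mutex _mutex;
};

// Runs one scenario: a failing start tears down a slow executor while a
// concurrent stop request tries to take the manager lock. Returns whether
// the stop request got the lock.
bool stopCanProceed(bool stopOutsideLock) {
    ManagerSketch mgr(stopOutsideLock);
    std::atomic<bool> started{false};
    std::thread starter([&] { mgr.startFailed(SlowExecutor{500ms, &started}); });
    while (!started.load()) {
        std::this_thread::yield();  // wait until the teardown is underway
    }
    bool acquired = mgr.tryStop(50ms);
    starter.join();
    return acquired;
}
```

Calling `stopCanProceed(false)` reproduces the shape of the captured stacks: the stop request cannot get the lock while teardown is in progress. `stopCanProceed(true)` lets it proceed. The sketch treats releasing the lock as the remedy, but that only removes the lock contention. The teardown itself can still block, so a real fix might also need to bound or move the librdkafka shutdown.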


Generated at Thu Feb 08 06:53:26 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.