[SERVER-82942] Stop hung when the SP is in an error state due to broken Kafka topic Created: 08/Nov/23  Updated: 19/Jan/24  Resolved: 19/Jan/24

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 7.3.0-rc0

Type: Bug Priority: Major - P3
Reporter: Matthew Normyle Assignee: Matthew Normyle
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Backwards Compatibility: Fully Compatible
Sprint: Sprint 36
Participants:

 Description   

https://mongodb.slack.com/archives/C04AH2TF7E1/p1699404224435909

 
// t1: mstreams runs into an error because the topic was deleted
{"attr":{"error":"Failed to consume with error: Stocks [0]: desired partition is no longer available (Local: Unknown partition)","errorCode":8,"reason":"Failed to consume with error: Stocks [0]: desired partition is no longer available (Local: Unknown partition)","context":{"streamProcessorId":"654ad797ed7403d6ebd69a66","tenantId":"6526b0782f547f6b968f84fa","streamProcessorName":"StockGo3"}},"ctx":"thread2959","_p":"F","time":"2023-11-08T00:37:58.984264364Z","t":{"$date":"2023-11-08T00:37:58.984+00:00"},"id":75899,"kube":{"container_name":"streams-mstream","node_name":"ip-10-128-90-116.ec2.internal","container_image":"664315256653.dkr.ecr.us-east-1.amazonaws.com/mongohouse/mhouse:6884d554357e3e5589bd6bab0962e8e0c29bf5c4","namespace_name":"streams-prod","labels":{"xgen_kube_cluster_name":"kube-1-us-east-1-aws-cloud","xgen_region":"us-east-1","app.kubernetes.io/version":"6884d554357e3e5589bd6bab0962e8e0c29bf5c4","xgen_environment":"prod","xgen_provider":"aws","xgen_platform":"kube","xgen_app":"streams-spp"}},"stream":"stdout","s":"W","msg":"encountered exception, exiting runLoop(): {error}","c":"STREAMS"}

// t2: Agent issues failed heartbeat
{"heartbeatStatus":"HEARTBEAT_STATUS_FAILED","streamProcessorName":"StockGo3","_p":"F","gitVersion":"6884d554357e3e5589bd6bab0962e8e0c29bf5c4","kube":{"container_name":"streams-spp","node_name":"ip-10-128-90-116.ec2.internal","container_image":"664315256653.dkr.ecr.us-east-1.amazonaws.com/mongohouse/mhouse:6884d554357e3e5589bd6bab0962e8e0c29bf5c4","namespace_name":"streams-prod","labels":{"xgen_kube_cluster_name":"kube-1-us-east-1-aws-cloud","xgen_region":"us-east-1","app.kubernetes.io/version":"6884d554357e3e5589bd6bab0962e8e0c29bf5c4","xgen_environment":"prod","xgen_provider":"aws","xgen_platform":"kube","xgen_app":"streams-spp"}},"level":"info","time":"2023-11-08T00:38:01.069811361Z","stream":"stdout","msg":"sending heartbeat for stream"}

// t3: mstreams receives stop request
{"s":"I","msg":"Stopping stream processor","_p":"F","kube":{"container_name":"streams-mstream","node_name":"ip-10-128-90-116.ec2.internal","container_image":"664315256653.dkr.ecr.us-east-1.amazonaws.com/mongohouse/mhouse:6884d554357e3e5589bd6bab0962e8e0c29bf5c4","namespace_name":"streams-prod","labels":{"xgen_kube_cluster_name":"kube-1-us-east-1-aws-cloud","xgen_region":"us-east-1","app.kubernetes.io/version":"6884d554357e3e5589bd6bab0962e8e0c29bf5c4","xgen_environment":"prod","xgen_provider":"aws","xgen_platform":"kube","xgen_app":"streams-spp"}},"time":"2023-11-08T00:38:01.07003009Z","stream":"stdout","attr":{"reason":"Failed to consume with error: Stocks [0]: desired partition is no longer available (Local: Unknown partition)","context":{"streamProcessorName":"StockGo3","streamProcessorId":"654ad797ed7403d6ebd69a66","tenantId":"6526b0782f547f6b968f84fa"}},"t":{"$date":"2023-11-08T00:38:01.069+00:00"},"c":"STREAMS","id":75911,"ctx":"conn6"}

As Sharan said, there isn't a log on the mstreams side indicating that the stop went through.
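
A rough way to confirm this from an exported log file is sketched below in Python. It assumes each line is a single JSON object shaped like the three entries above (a `time` field, a `msg` field, and the processor name either under `attr.context` or at the top level). The helper names `parse_time`, `processor_name`, and `entries_after_stop` are hypothetical and are not part of mstreams or the agent.

```python
import json
from datetime import datetime
from typing import Iterable, List, Optional


def parse_time(value: str) -> datetime:
    """Parse a UTC timestamp such as 2023-11-08T00:38:01.07003009Z."""
    # The logs carry up to nanosecond precision; datetime only holds
    # microseconds, so the fraction is padded or cut to six digits.
    head, _, frac = value.rstrip("Z").partition(".")
    frac = (frac + "000000")[:6]
    return datetime.strptime(f"{head}.{frac}", "%Y-%m-%dT%H:%M:%S.%f")


def processor_name(entry: dict) -> Optional[str]:
    """Return the stream processor name for either log shape seen above."""
    # mstreams entries nest the name under attr.context;
    # agent entries keep it at the top level.
    context = entry.get("attr", {}).get("context", {})
    return context.get("streamProcessorName") or entry.get("streamProcessorName")


def entries_after_stop(lines: Iterable[str], name: str) -> Optional[List[dict]]:
    """Return entries for `name` logged after the first stop request.

    Returns None if no "Stopping stream processor" entry exists at all,
    and an empty list if the stop request is the last entry for `name`.
    """
    entries = [json.loads(line) for line in lines if line.strip()]
    entries = [e for e in entries if processor_name(e) == name]
    entries.sort(key=lambda e: parse_time(e["time"]))
    for index, entry in enumerate(entries):
        if entry.get("msg") == "Stopping stream processor":
            return entries[index + 1:]
    return None
```

Called with the three entries above and the name `StockGo3`, this returns an empty list, which matches the observation that nothing is logged for the processor after the stop request at t3.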


Generated at Thu Feb 08 06:50:48 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.