[SERVER-84421] Unexpected flow control engaged during benchmark Created: 27/Dec/23  Updated: 29/Jan/24

Status: Investigating
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: jum zhang Assignee: Edwin Zhou
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File Screenshot 2024-01-08 at 12.33.31 PM.png     PNG File image-2023-12-27-17-00-25-921.png     PNG File image-2023-12-27-17-02-05-514.png     PNG File image-2023-12-27-17-02-24-056.png     PNG File image-2023-12-27-17-02-40-532.png     PNG File image-2023-12-27-17-04-46-702.png     PNG File image-2023-12-28-19-46-51-844.png     File metrics.2023-12-26T16-10-09Z-00000    
Operating System: ALL
Steps To Reproduce:

Cannot reproduce

Participants:

 Description   

During a benchmark, we found many slow log entries showing operations waiting for flow control tickets. We decompressed the diagnostic metrics data and found that flow control was engaged.

However, we did not find any majority-committed lag at the corresponding point in time.

According to the flow control configuration, flow control should engage only when replication lag is greater than 5s.
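
For reference, the 5s figure matches the product of two server parameters at their documented defaults: flowControlTargetLagSeconds (10) and flowControlThresholdLagPercentage (0.5). The sketch below only restates that arithmetic. It assumes the defaults are in effect (the actual configuration is not shown in this ticket), the function names are hypothetical, and it simplifies the server's real decision logic.

```python
# Sketch of the lag threshold arithmetic, assuming default parameter values.
# The function names are illustrative and are not part of any MongoDB API.

def flow_control_lag_threshold(target_lag_seconds: float = 10.0,
                               threshold_lag_percentage: float = 0.5) -> float:
    """Lag (in seconds) above which flow control is expected to engage."""
    return target_lag_seconds * threshold_lag_percentage


def should_engage(majority_committed_lag_seconds: float,
                  threshold_seconds: float) -> bool:
    """Simplified check: engage only when the lag exceeds the threshold."""
    return majority_committed_lag_seconds > threshold_seconds


threshold = flow_control_lag_threshold()
print(threshold)  # 10.0 * 0.5
```

With these values, a reported lag below 5s would not be expected to trigger flow control, which is why the engagement seen in the metrics looks unexpected.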

 



 Comments   
Comment by Edwin Zhou [ 08/Jan/24 ]

Thank you for uploading the diagnostic data for the primary node, zhangwenjumlovercl@gmail.com.

I took a look at the data and also observed that replication lag reported on the primary did not exceed 5s, so it is unusual that flow control was engaged during your benchmark.

However, we will need diagnostic data from the secondary nodes as well to see how they were behaving leading up to flow control engaging.

Could you also upload diagnostic data from both secondary members 8 and 10 during this time?

Comment by jum zhang [ 28/Dec/23 ]

The metrics files and log file have been uploaded. As I mentioned before, we decompressed the diagnostic metrics file and found that flow control engaged at ISODate("2023-12-26T16:21:32Z"). There are no flow control-related log entries in mongod.log; only slow log entries show that update operations were waiting to acquire flow control tickets.
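
A minimal sketch of how such slow log entries could be pulled out of mongod.log. It assumes the structured JSON log format (MongoDB 4.4+) and that slow operation entries carry a flowControl sub-document with a timeAcquiringMicros field under attr. The sample lines are invented for illustration and are not taken from the uploaded log.

```python
import json


def flow_control_waits(log_lines, min_wait_micros=0):
    """Yield (timestamp, namespace, wait_micros) for log entries whose
    flowControl.timeAcquiringMicros exceeds min_wait_micros."""
    for line in log_lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not structured JSON
        if not isinstance(entry, dict):
            continue
        attr = entry.get("attr")
        if not isinstance(attr, dict):
            continue
        flow_control = attr.get("flowControl")
        if not isinstance(flow_control, dict):
            continue
        wait = flow_control.get("timeAcquiringMicros", 0)
        if wait > min_wait_micros:
            t = entry.get("t")
            timestamp = t.get("$date") if isinstance(t, dict) else None
            yield timestamp, attr.get("ns"), wait


# Invented sample lines (assumed field layout), not from the real log.
SAMPLE_LINES = [
    '{"t":{"$date":"2023-12-26T16:21:33.100+00:00"},"s":"I","c":"WRITE",'
    '"id":51803,"ctx":"conn42","msg":"Slow query","attr":{"type":"update",'
    '"ns":"bench.items","flowControl":{"acquireCount":1,"acquireWaitCount":1,'
    '"timeAcquiringMicros":812345},"durationMillis":815}}',
    '{"t":{"$date":"2023-12-26T16:21:34.000+00:00"},"s":"I","c":"WRITE",'
    '"id":51803,"ctx":"conn43","msg":"Slow query","attr":{"type":"update",'
    '"ns":"bench.items","flowControl":{"acquireCount":1,'
    '"timeAcquiringMicros":3},"durationMillis":120}}',
    'this line is not JSON',
    '{"t":{"$date":"2023-12-26T16:21:35.000+00:00"},"s":"I","c":"NETWORK",'
    '"id":1,"ctx":"listener","msg":"Connection accepted","attr":{}}',
]

for hit in flow_control_waits(SAMPLE_LINES, min_wait_micros=1000):
    print(hit)
```

On a real log, the same function could be fed an open file handle in place of SAMPLE_LINES, and the timestamps compared against the time flow control engaged in the metrics.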

Comment by Edwin Zhou [ 27/Dec/23 ]

Hi zhangwenjumlovercl@gmail.com,

For us to further investigate this issue, would you please archive (tar or zip) the mongod.log files and the $dbpath/diagnostic.data directory (the contents are described here) and upload them to this support uploader location?

Files uploaded to this portal are visible only to MongoDB employees and are routinely deleted after some time.
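
If it helps, the archive step can be sketched with Python's standard tarfile module. The function name and all paths here are placeholders, so substitute the real $dbpath and log locations for your deployment.

```python
import tarfile
from pathlib import Path


def archive_diagnostics(dbpath, log_paths, out_path):
    """Bundle $dbpath/diagnostic.data and the given log files into a .tar.gz.

    Hypothetical helper: the arguments are placeholders for real paths.
    """
    dbpath = Path(dbpath)
    with tarfile.open(out_path, "w:gz") as tar:
        # Adds the directory recursively under a stable top-level name.
        tar.add(dbpath / "diagnostic.data", arcname="diagnostic.data")
        for log in log_paths:
            log = Path(log)
            tar.add(log, arcname=log.name)
    return Path(out_path)


# Example call (placeholder paths, adjust before running):
# archive_diagnostics("/data/db", ["/var/log/mongodb/mongod.log"],
#                     "mongod-diagnostics.tar.gz")
```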

Kind regards,
Edwin

Generated at Thu Feb 08 06:54:59 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.