[SERVER-75353] Mongodb v5.0.3 scaling bottleneck/issue Created: 27/Mar/23  Updated: 24/May/23

Status: Backlog
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: NOVALUE Uttam Assignee: Backlog - Performance Team
Resolution: Unresolved Votes: 0
Labels: perf-effort-medium, perf-urgency-soon, perf-value-essential, performance, scaling
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File mongodb-scaling-cpu-util.png     PNG File mongodb-scaling-ipc.png     PNG File mongodb-thread-scaling.png     File offcpu.80t.svg    
Issue Links:
Related
Assigned Teams:
Product Performance
Operating System: ALL
Steps To Reproduce:

Use the YCSB client. Load the data set first, then run the read-only workload:
 
$ python2.7 ./bin/ycsb load mongodb -jvm-args="-Dlogback.configurationFile=../logback.xml" -s -P workloads/workloadc -threads 20 -p mongodb.url=mongodb://localhost/ycsb

$ python2.7 ./bin/ycsb run mongodb -jvm-args="-Dlogback.configurationFile=../logback.xml" -s -P workloads/workloadc -threads 80 -p mongodb.url=mongodb://localhost/ycsb

No. records: 40M
Type of Request: Read-Only queries
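
To repeat the run step across several client counts, the command line can be generated rather than typed by hand. The following is a minimal sketch, not part of the original report: it assumes each data point was produced by changing only the -threads value of the run command above, and it takes the thread counts from the table in the description. The names THREAD_COUNTS and ycsb_run_command are hypothetical helpers introduced here for illustration.

```python
import shlex

# Client counts taken from the table in the description (assumed sweep points).
THREAD_COUNTS = [1, 2, 4, 8, 10, 20, 30, 40, 80]


def ycsb_run_command(threads):
    """Return the argv for one read-only YCSB run at the given thread count.

    Mirrors the run command from the steps above; only -threads varies.
    """
    return [
        "python2.7", "./bin/ycsb", "run", "mongodb",
        "-jvm-args=-Dlogback.configurationFile=../logback.xml",
        "-s", "-P", "workloads/workloadc",
        "-threads", str(threads),
        "-p", "mongodb.url=mongodb://localhost/ycsb",
    ]


commands = [ycsb_run_command(n) for n in THREAD_COUNTS]

# Print shell-quoted command lines; nothing is executed here.
for argv in commands:
    print(shlex.join(argv))
```

The sketch only prints the commands, so it can be reviewed before anything is run against a server.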

Participants:

 Description   

Hi,

I've been studying how MongoDB scales on high-core-count machines. I've observed that the server doesn't scale linearly once it passes 20 threads. This was tested against MongoDB v5.0.3 with the YCSB client (read-only tests). Here are my observations:

Clients/Threads        1     2     4     8     10    20    30    40    80
Throughput per thread  1.00  0.99  0.98  0.94  0.94  0.80  0.59  0.43  0.19

The first row is self-explanatory. The second row shows per-thread throughput relative to single-thread throughput. See also the attached graph. To understand the issue, I collected off-CPU event data. This data showed that a lock/pause in two functions is the cause of the scaling issue: 1) mongo::ServiceContext::makeOperationContext() and 2) mongo::ServiceContext::__delistOperation(). I've also attached a flame graph showing these bottlenecks.
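
The per-thread figures in the table can be turned into aggregate speedup to see where total throughput stops growing. This is a rough sketch added for illustration, assuming the second row is T(n) / (n * T(1)), where T(n) is the total throughput with n client threads; under that assumption, aggregate speedup is simply n times the table value. The function and variable names are hypothetical.

```python
# Per-thread throughput relative to the single-thread run, copied from the table.
relative_per_thread = {
    1: 1.00, 2: 0.99, 4: 0.98, 8: 0.94, 10: 0.94,
    20: 0.80, 30: 0.59, 40: 0.43, 80: 0.19,
}


def aggregate_speedup(threads, per_thread):
    """Total throughput relative to one thread.

    Assumes per_thread = T(threads) / (threads * T(1)).
    """
    return threads * per_thread


speedups = {n: aggregate_speedup(n, r) for n, r in relative_per_thread.items()}

# Thread count at which the aggregate throughput is highest.
peak_threads = max(speedups, key=speedups.get)

for n, s in speedups.items():
    print(f"{n:>3} threads: {s:5.1f}x aggregate speedup")
print(f"aggregate throughput peaks at {peak_threads} threads")
```

Under that assumption, aggregate throughput tops out at roughly 17.7x with 30 threads and falls to about 15.2x with 80 threads, so adding clients beyond that point reduces total throughput instead of merely failing to increase it.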

Unfortunately, I can't contribute code to the project due to restrictions on my side, but I can submit issues, hence this report. I'm hoping someone from the MongoDB community will take a look at this issue and provide a solution.

I hope this serves as a good starting point for further analysis.



 Comments   
Comment by Ger Hartnett [ 24/May/23 ]

Thank you for reporting this Uttam. We expect to start a project soon to investigate this and other similar issues. 

Comment by NOVALUE Uttam [ 27/Mar/23 ]

Here is the content of logback.xml (reduced output)

$ cat logback.xml 
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
    <appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
        <encoder>
            <pattern>
                %d{HH:mm:ss.SSS}  %-5level %logger{36} - %msg%n
            </pattern>
        </encoder>
    </appender>
    <logger name="org.mongodb" level="WARN">
        <appender-ref ref="STDOUT"/>
    </logger>
</configuration>

Generated at Thu Feb 08 06:29:57 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.