[CDRIVER-27] non-blocking version (with support for libev) Created: 24/Nov/10  Updated: 16/Nov/21  Resolved: 06/Aug/16

Status: Closed
Project: C Driver
Component/s: None
Affects Version/s: None
Fix Version/s: TBD

Type: New Feature Priority: Major - P3
Reporter: Eliot Horowitz (Inactive) Assignee: Backlog - C Driver Team
Resolution: Won't Fix Votes: 9
Labels: internal-woes
Σ Remaining Estimate: Not Specified Remaining Estimate: Not Specified
Σ Time Spent: Not Specified Time Spent: Not Specified
Σ Original Estimate: Not Specified Original Estimate: Not Specified

Attachments: cmds-per-thread-per-second.png (PNG), thread-loadtest-client.c, thread-loadtest-server.c
Issue Links:
Depends
depends on CDRIVER-548 Make async_cmd more general Closed
is depended on by SERVER-2141 Official Node.js Driver Closed
Duplicate
is duplicated by CDRIVER-1989 Asynchronous API Closed
Sub-Tasks:
Key
Summary
Type
Status
Assignee
CDRIVER-52 I write a Async find query mothed to ... Sub-task Closed  

 Comments   
Comment by A. Jesse Jiryu Davis [ 06/Aug/16 ]

I investigated for a customer whether an async C Driver is needed to support high-throughput applications talking to MongoDB over a high-latency, high-throughput network. I ran some scaling tests and found that the C driver scales well up to 4000 threads on a commonly deployed server VM. My test rig is a pair of EC2 m4.xlarge Ubuntu 14.04 machines in the US East zone. Using the latest C driver code on master, I wrote two programs, thread-loadtest-client.c and thread-loadtest-server.c. On both the client and server machines I configured a smattering of scalability options:

# Run on both client and server
sudo su
# Allow deeper packet and SYN backlogs so connection bursts aren't dropped
sysctl net.core.netdev_max_backlog=10000
sysctl net.ipv4.tcp_max_syn_backlog=10000
# Widen the ephemeral port range and recycle/reuse TIME-WAIT sockets quickly
sysctl net.ipv4.ip_local_port_range="15000 61000"
sysctl net.ipv4.tcp_fin_timeout=2
sysctl net.ipv4.tcp_tw_recycle=1
sysctl net.ipv4.tcp_tw_reuse=1
# Lengthen the interface's transmit queue
ifconfig eth0 txqueuelen 10000
# Raise the open-file-descriptor limit; "ubuntu" is the username running the test programs
echo 'ubuntu soft nofile 40000' >> /etc/security/limits.conf
echo 'ubuntu hard nofile 40000' >> /etc/security/limits.conf
exit

I compiled the programs and started the server program on one machine:

./thread-loadtest-server

It chooses a port to listen on and accepts connections, spawning a thread for each. On each connection it receives "ismaster" requests from the test client program; for each request it waits 100 ms and then responds.
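
For readers who don't want to open the attachment, here is a minimal sketch of that thread-per-connection structure. It is my own illustration, not the attached thread-loadtest-server.c: the listening port is an arbitrary assumption, and instead of parsing the wire-protocol "ismaster" request it just echoes bytes back after the 100 ms delay.

/*
 * Minimal sketch of the thread-per-connection server described above.
 * NOT the attached thread-loadtest-server.c: the port is an assumption
 * and the wire-protocol handling is stubbed out as an echo.
 * Build with something like: cc -pthread server_sketch.c
 */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

static void *
handle_connection (void *arg)
{
   int fd = (int) (intptr_t) arg;
   char buf[4096];
   ssize_t n;

   /* Read a request, simulate a high-latency link, reply, repeat. */
   while ((n = recv (fd, buf, sizeof buf, 0)) > 0) {
      usleep (100 * 1000); /* the 100 ms artificial delay */
      if (send (fd, buf, (size_t) n, 0) < 0) {
         break;
      }
   }

   close (fd);
   return NULL;
}

int
main (void)
{
   int listen_fd = socket (AF_INET, SOCK_STREAM, 0);
   int one = 1;
   struct sockaddr_in addr;

   setsockopt (listen_fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof one);

   memset (&addr, 0, sizeof addr);
   addr.sin_family = AF_INET;
   addr.sin_addr.s_addr = htonl (INADDR_ANY);
   addr.sin_port = htons (27999); /* assumed port, not from the test */

   if (bind (listen_fd, (struct sockaddr *) &addr, sizeof addr) < 0 ||
       listen (listen_fd, SOMAXCONN) < 0) {
      perror ("bind/listen");
      return EXIT_FAILURE;
   }

   for (;;) {
      int conn_fd = accept (listen_fd, NULL, NULL);
      pthread_t t;

      if (conn_fd < 0) {
         continue;
      }
      /* One detached thread per accepted connection. */
      pthread_create (&t, NULL, handle_connection,
                      (void *) (intptr_t) conn_fd);
      pthread_detach (t);
   }
}

The point is just the shape of the test: one listening socket, one thread per connection, with latency dominated by the artificial delay rather than by CPU.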

On the client machine I ran:

./thread-loadtest-client 'mongodb://XXX:YYY/?maxPoolSize=9999&serverSelectionTimeoutMS=60000' ZZZ

... where XXX is the internal IP of the server machine, YYY is the listening port, and ZZZ is the number of threads for the test run. The client program starts the threads, waits until they've all connected to the server program, then runs them concurrently for about 10 seconds to measure throughput in commands per second. (A rough sketch of the per-thread loop appears after the results table.) The results:

threads    commands in 10 seconds    cmds per thread per sec
1          99                        9.9
501        49599                     9.9
1001       99099                     9.9
1501       148466                    9.891139241
2001       197320                    9.861069465
2501       244725                    9.785085966
3001       294499                    9.813362213
3501       342962                    9.796115396
4001       371542                    9.286228443
4501       377254                    8.381559653
5001       379613                    7.590741852
5501       378565                    6.881748773
6001       381211                    6.352457924
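
As mentioned above, here is a rough sketch of the kind of per-thread command loop the client program runs against a mongoc_client_pool_t. It is not the attached thread-loadtest-client.c: the URI, thread count, and iteration count are placeholders, and the build line assumes pkg-config knows about libmongoc-1.0.

/*
 * Rough sketch of a per-thread "ismaster" loop with libmongoc's client
 * pool; not the attached thread-loadtest-client.c.  URI, NTHREADS and
 * NREQUESTS are placeholders.
 * Build (assumed): cc client_sketch.c $(pkg-config --cflags --libs libmongoc-1.0) -pthread
 */
#include <mongoc.h>
#include <pthread.h>
#include <stdio.h>

#define NTHREADS 100   /* the real test ran 1 to 6001 threads */
#define NREQUESTS 100

static mongoc_client_pool_t *pool;

static void *
worker (void *arg)
{
   bson_t *cmd = BCON_NEW ("ismaster", BCON_INT32 (1));
   bson_t reply;
   bson_error_t error;
   int i;

   for (i = 0; i < NREQUESTS; i++) {
      /* Check a client out of the pool, run one command, check it back in. */
      mongoc_client_t *client = mongoc_client_pool_pop (pool);

      if (!mongoc_client_command_simple (
             client, "admin", cmd, NULL, &reply, &error)) {
         fprintf (stderr, "ismaster failed: %s\n", error.message);
      }
      bson_destroy (&reply);
      mongoc_client_pool_push (pool, client);
   }

   bson_destroy (cmd);
   return NULL;
}

int
main (void)
{
   pthread_t threads[NTHREADS];
   mongoc_uri_t *uri;
   int i;

   mongoc_init ();
   /* Placeholder URI; the real test passed the server address, port, and
    * pool options on the command line. */
   uri = mongoc_uri_new ("mongodb://127.0.0.1:27017/?maxPoolSize=9999");
   pool = mongoc_client_pool_new (uri);

   for (i = 0; i < NTHREADS; i++) {
      pthread_create (&threads[i], NULL, worker, NULL);
   }
   for (i = 0; i < NTHREADS; i++) {
      pthread_join (threads[i], NULL);
   }

   mongoc_client_pool_destroy (pool);
   mongoc_uri_destroy (uri);
   mongoc_cleanup ();
   return 0;
}

Each thread blocks in mongoc_client_command_simple for the full round trip, which is consistent with the roughly 9.9 commands per thread per second above when the delay is 100 ms.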

Throughput scales linearly through 3500 threads. Around 4000 threads it hits a throughput of 371k commands per 10 seconds. Adding any more than 4000 threads barely increases throughput. CPU wasn't saturated on the client or server, so I think 4000 threads overcame latency and saturated the network, which was my goal.

The program scales decently because each thread spends most of its time waiting for the high-latency response, so there's little unnecessary context-switching. (Starting and joining thousands of threads is time-consuming, but I don't include that overhead in my measurement.)

I conclude that an async rewrite of the C Driver is not necessary to handle a high-latency connection to MongoDB; it's probably not even helpful.

Comment by Alex Reviyou [ 12/Apr/14 ]

Hello, it looks like this issue has existed for a while...
We are trying to use the nginx-gridfs module, which depends on this issue being resolved (https://github.com/mdirolf/nginx-gridfs/issues/30).

Can you please let us know if this issue will ever be resolved? We would like to see whether we can use GridFS + nginx for our production system within several months...

Thanks,
Alex
