[JAVA-1825] 100% CPU with find request during a primary change Created: 20/May/15  Updated: 11/Sep/19  Resolved: 23/Jun/15

Status: Closed
Project: Java Driver
Component/s: None
Affects Version/s: 3.0.1
Fix Version/s: None

Type: Task Priority: Major - P3
Reporter: Julien Buret Assignee: Unassigned
Resolution: Cannot Reproduce Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File test.csv    

 Description   

On server side I have a replica set with one primary and one secondary.

I trigger a step-down (ie I switch the primary and the secondary) and during this operation, the client send a find request using the async java driver.

-> 100% CPU for ever on client side

When profiling, all the CPU is wasted in com.mongodb.connection.AsynchronousSocketChannelStream$BasicCompletionHandle#completed with this kind of stacktrace :

Thread-26@11189 daemon, prio=5, in group 'main', status: 'RUNNING'
at com.mongodb.connection.AsynchronousSocketChannelStream$BasicCompletionHandler.completed(AsynchronousSocketChannelStream.java:219)
at com.mongodb.connection.AsynchronousSocketChannelStream$BasicCompletionHandler.completed(AsynchronousSocketChannelStream.java:204)
at sun.nio.ch.Invoker.invokeUnchecked(Invoker.java:126)
at sun.nio.ch.Invoker$2.run(Invoker.java:218)
at sun.nio.ch.AsynchronousChannelGroupImpl$1.run(AsynchronousChannelGroupImpl.java:112)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

When no request is sent during the step-down, all is ok



 Comments   
Comment by Julien Buret [ 23/May/15 ]

top says 100% CPU, and so does the visualvm monitor, as soon as I stepDown. If i kill the primary, I've got this :

null-com.mongodb.MongoSocketOpenException: Exception opening socket

... and 100% CPU

Anyway if you don't reproduce, you don't reproduce. This issue is not blocking for me, and I will just reboot the instances when compose.io stepdown (which happens rarely)

Comment by Jeffrey Yemin [ 22/May/15 ]

What I mean is that I started the program when there is no primary available to execute the query on. An rs.stepDown() is one way to do that, but just killing the primary is another.

From what I can see from the small log sample above, the driver is doing the right thing here. It discovers the new primary on port 27018, and then discovers that the the server on port 27017 is now a secondary.

As for the CPU usage, I actually don't see evidence of high CPU usage in the CSV file that you provided. The way I understand the data in the CSV file, the Time in column B is a percentage of total time, including wait time. The actual CPU usage is in column D, and is much lower. You can see that very clearly for the main thread starting in row 3: the time is quite high, but it's all ultimately attributed to Thread.sleep() and actual CPU usage is very low. And it's the same for AsynchronousSocketChannelStream$BasicCompletionHandler.completed: the total time is relatively high, as is the percentage, but CPU usage is very low, and this is because ultimately the method is waiting on I/O to complete and is not doing much real work on the CPU.

If you'd like to verify the actual CPU usage of the Java driver, please try a command line tool like top.

Comment by Julien Buret [ 22/May/15 ]

What do you mean by "run with no primary". With a rs.stepDown() I got :

2015-05-22T09:52:25.105 [cluster-ClusterId{value='555edfe84ce9b1174296bfbe', description='null'}-localhost:27018] INFO  org.mongodb.driver.cluster - Monitor thread successfully connected to server with description ServerDescription{address=localhost:27018, type=REPLICA_SET_PRIMARY, state=CONNECTED, ok=true, version=ServerVersion{versionList=[2, 6, 5]}, minWireVersion=0, maxWireVersion=2, maxDocumentSize=16777216, roundTripTimeNanos=1945332, setName='rs0', hosts=[localhost:27018, localhost:27017], passives=[], arbiters=[], primary='localhost:27018', tagSet=TagSet{[]}}
2015-05-22T09:52:25.105 [cluster-ClusterId{value='555edfe84ce9b1174296bfbe', description='null'}-localhost:27018] INFO  org.mongodb.driver.cluster - Discovered replica set primary localhost:27018
2015-05-22T09:52:25.106 [cluster-ClusterId{value='555edfe84ce9b1174296bfbe', description='null'}-localhost:27018] INFO  org.mongodb.driver.cluster - Rediscovering type of existing primary localhost:27017
2015-05-22T09:52:25.138 [cluster-ClusterId{value='555edfe84ce9b1174296bfbe', description='null'}-localhost:27017] INFO  org.mongodb.driver.connection - Opened connection [connectionId{localValue:7, serverValue:41}] to localhost:27017
2015-05-22T09:52:25.139 [cluster-ClusterId{value='555edfe84ce9b1174296bfbe', description='null'}-localhost:27017] INFO  org.mongodb.driver.cluster - Monitor thread successfully connected to server with description ServerDescription{address=localhost:27017, type=REPLICA_SET_SECONDARY, state=CONNECTED, ok=true, version=ServerVersion{versionList=[2, 6, 5]}, minWireVersion=0, maxWireVersion=2, maxDocumentSize=16777216, roundTripTimeNanos=753414, setName='rs0', hosts=[localhost:27018, localhost:27017], passives=[], arbiters=[], primary='localhost:27018', tagSet=TagSet{[]}}
2015-05-22T09:52:29.964 [Thread-13] INFO  org.mongodb.driver.connection - Opened connection [connectionId{localValue:8, serverValue:33}] to localhost:27018

but no timeout exception

Comment by Jeffrey Yemin [ 21/May/15 ]

Can you try adding a shorter server selection timeout to ClusterSettings builder:

.clusterSettings(ClusterSettings.builder()
                                                  .serverSelectionTimeout(1, TimeUnit.SECONDS)
                                                  .applyConnectionString(connectionString)
                                                  .build())

When I do that to your program, and run with no primary, it prints this out 1 second after each async call to find:

null-com.mongodb.MongoTimeoutException: Timed out after 1000 ms while waiting for a server that matches ReadPreferenceServerSelector{readPreference=primaryPreferred}. Client view of cluster state is {type=UNKNOWN, servers=[{address=localhost:27017, type=UNKNOWN, state=CONNECTING, exception={com.mongodb.MongoInternalException: Opening the AsynchronousSocketChannelStream failed}, caused by {com.mongodb.MongoSocketOpenException: Exception opening socket}, caused by {java.net.ConnectException: Connection refused}}]

I'd like to confirm that we're at least seeing roughly the same behavior before looking at your profiling results.

Comment by Julien Buret [ 21/May/15 ]

Yes it seems the request never times out.

  • The jdk is jdk1.8.0_40 on ubuntu or linux aws
  • I use visualvm monitor & sampler
  • The mongo server is hosted on compose.io

The change of minHeartbeatFrequency (I tried 1, 3, 5, 10 and 30s) has no effect, sometimes I got this exception (but the cpu issue remains) :

Exception in thread "Thread-16" java.lang.NullPointerException
	at com.mongodb.connection.AsynchronousSocketChannelStream$BasicCompletionHandler.completed(AsynchronousSocketChannelStream.java:219)
	at com.mongodb.connection.AsynchronousSocketChannelStream$BasicCompletionHandler.completed(AsynchronousSocketChannelStream.java:204)
	at sun.nio.ch.Invoker.invokeUnchecked(Invoker.java:126)
	at sun.nio.ch.Invoker$2.run(Invoker.java:218)
	at sun.nio.ch.AsynchronousChannelGroupImpl$1.run(AsynchronousChannelGroupImpl.java:112)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)

I build a simple test case to reproduce, the result csv export of visualvm is attached :

public static void main(String[] args) throws Exception {
        ConnectionString connectionString = new ConnectionString("connection");
 
        MongoClientSettings settings = MongoClientSettings.builder()
                .credentialList(connectionString.getCredentialList())
                .readPreference(ReadPreference.primaryPreferred())
                .clusterSettings(ClusterSettings.builder()
                        .applyConnectionString(connectionString)
                        .build())
                .serverSettings(ServerSettings.builder()
                        .minHeartbeatFrequency(5, TimeUnit.SECONDS)
                        .build())
                .build();
        MongoClient client = MongoClients.create(settings);
        MongoDatabase db = client.getDatabase("db");
 
        while (true) {
            Thread.sleep(5000L);
            db.getCollection("log").find().first((r, t) -> System.out.println(r + "-" + t));
        }
    }

Comment by Jeffrey Yemin [ 21/May/15 ]

Are you saying that the find request never times out? In my testing I generally see the following exception delivered to the callback after 30 seconds, if the primary doesn't come back within that time:

com.mongodb.MongoTimeoutException: Timed out after 30000 ms while waiting for a server that matches ReadPreferenceServerSelector{readPreference=primary}....

I also don't see anything close to 100% CPU usage when I step down the primary. What platform is your Java program executing on? What version of the JDK? And how are you measuring the CPU usage?

To help get a better understanding of what's going on in your environment, please try setting higher values for ServerSettings.minHeartbeatFrequency, which defaults to 500 milliseconds, and report what CPU usage you're seeing for different values of that property For example, this will set it to 5 seconds:

        ConnectionString connectionString = new ConnectionString("mongodb://localhost,localhost:27018");
        MongoClientSettings settings = MongoClientSettings.builder()
                                                          .clusterSettings(ClusterSettings.builder()
                                                                                          .applyConnectionString(connectionString)
                                                                                          .build())
                                                          .serverSettings(ServerSettings.builder()
                                                                                        .minHeartbeatFrequency(5, TimeUnit.SECONDS)
                                                                                        .build())
                                                          .build();

Generated at Thu Feb 08 08:55:36 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.