[JAVA-137] SimplePool._get() "busy" wait suffers under heavy load. Created: 23/Jul/10  Updated: 24/Jul/10  Resolved: 24/Jul/10

Status: Closed
Project: Java Driver
Component/s: None
Affects Version/s: 1.4
Fix Version/s: None

Type: Bug Priority: Minor - P4
Reporter: Steve Reed Assignee: Eliot Horowitz (Inactive)
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

mongo java driver 1.4
mongo server 1.4.2

default settings for max connections, etc.



 Description   

While investigating a service that was using 100% of the available CPU yet getting very little done, I saw that most threads had the following frames at the top of their Java stack traces:

at java.lang.Thread.sleep(Thread.java)
at com.mongodb.util.ThreadUtil.sleep(ThreadUtil.java:37)
at com.mongodb.util.SimplePool._get(SimplePool.java:162)
at com.mongodb.util.SimplePool.get(SimplePool.java:106)
at com.mongodb.util.SimplePool.get(SimplePool.java:95)
at com.mongodb.ByteEncoder.get(ByteEncoder.java:66)
at com.mongodb.DBMessage.<init>(DBMessage.java:52)
at com.mongodb.DBApiLayer$MyCollection.find(DBApiLayer.java:282)
at com.mongodb.DBCursor._check(DBCursor.java:253)
at com.mongodb.DBCursor._hasNext(DBCursor.java:374)
at com.mongodb.DBCursor.hasNext(DBCursor.java:399)

Our software was running with roughly twice as many threads as connections and was doing a fair mix of reads and writes against MongoDB. Our reads typically iterate a cursor over many objects. After every write we call resetError() and getLastError() to verify it.

I read through the 1.4 Java driver code and saw that for every DBMessage created (in our case, many thousands per second), the thread waits for a ByteEncoder in what is effectively a busy wait, despite the 15 millisecond Thread.sleep() between attempts. This ended up accounting for a significant percentage of our overall CPU load.
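To make the pattern concrete, here is a minimal, hypothetical sketch of a sleep-polling pool. It is not the driver's SimplePool source; the names PollingPool, tryGet, done and pollMillis are invented for illustration, and StringBuilder stands in for ByteEncoder in the usage example. The point is that a waiting thread is never told when an item comes back: it wakes on a timer, re-acquires the lock, and goes back to sleep if it lost the race, so with many threads and thousands of acquisitions per second the wakeups and lock contention add up.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

// Hypothetical sleep-polling pool (illustration only, NOT the driver's SimplePool).
final class PollingPool<T> {
    private final Deque<T> available = new ArrayDeque<>();
    private final Supplier<T> factory;
    private final int maxSize;
    private final long pollMillis;
    private int created = 0;

    PollingPool(Supplier<T> factory, int maxSize, long pollMillis) {
        this.factory = factory;
        this.maxSize = maxSize;
        this.pollMillis = pollMillis;
    }

    // Returns an idle item, creates one if under the cap, or null if the pool is exhausted.
    private synchronized T tryGet() {
        T item = available.pollFirst();
        if (item != null) {
            return item;
        }
        if (created < maxSize) {
            created++;
            return factory.get();
        }
        return null;
    }

    T get() throws InterruptedException {
        while (true) {
            T item = tryGet();
            if (item != null) {
                return item;
            }
            // Nothing free: sleep a fixed interval and retry. The thread is not woken
            // when an item is returned; it simply re-checks on its own schedule.
            Thread.sleep(pollMillis);
        }
    }

    // Returns an item to the pool; no waiting thread is signalled.
    synchronized void done(T item) {
        available.addFirst(item);
    }
}
```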

Have you considered using wait() and notify(), or a different ByteEncoder creation/allocation strategy (such as a ThreadLocal)?
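The two suggestions could look roughly like the following sketch. Again, BlockingPool, PerThreadBuffer and done are hypothetical names, StringBuilder is only a stand-in for ByteEncoder, and this is not how any driver version actually solved the problem. In BlockingPool, a thread that finds the pool exhausted calls wait(), which releases the monitor and parks the thread until done() calls notify(). PerThreadBuffer shows the ThreadLocal alternative, where each thread keeps its own reusable buffer and no shared pool or lock is involved.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

// Hypothetical blocking pool: waiters park on the monitor until done() notifies them.
final class BlockingPool<T> {
    private final Deque<T> available = new ArrayDeque<>();
    private final Supplier<T> factory;
    private final int maxSize;
    private int created = 0;

    BlockingPool(Supplier<T> factory, int maxSize) {
        this.factory = factory;
        this.maxSize = maxSize;
    }

    synchronized T get() throws InterruptedException {
        // The loop guards against spurious wakeups and against another thread
        // taking the returned item before this one re-acquires the monitor.
        while (available.isEmpty() && created >= maxSize) {
            wait(); // releases the monitor; the thread is parked, not polling
        }
        T item = available.pollFirst();
        if (item == null) {
            // Pool was empty but still under its cap, so create a new item.
            created++;
            item = factory.get();
        }
        return item;
    }

    synchronized void done(T item) {
        available.addFirst(item);
        notify(); // one returned item can satisfy at most one waiter
    }
}

// Hypothetical ThreadLocal alternative: one reusable buffer per thread, no pool at all.
final class PerThreadBuffer {
    private static final ThreadLocal<StringBuilder> BUFFER =
            ThreadLocal.withInitial(() -> new StringBuilder(1024));

    static StringBuilder get() {
        StringBuilder sb = BUFFER.get();
        sb.setLength(0); // clear leftover contents before reuse
        return sb;
    }
}
```

The trade-off: the blocking pool keeps the cap on the number of buffers but still funnels every thread through one lock, while the ThreadLocal version removes the contention entirely at the cost of one buffer per thread, which can matter when a service runs many threads.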



 Comments   
Comment by Eliot Horowitz (Inactive) [ 24/Jul/10 ]

You should try 2.0.

Generated at Thu Feb 08 08:51:35 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.