[JAVA-307] The low speed of range query with two constraints Created: 25/Mar/11  Updated: 11/Sep/19  Resolved: 04/Sep/12

Status: Closed
Project: Java Driver
Component/s: API
Affects Version/s: 2.5
Fix Version/s: None

Type: Task Priority: Critical - P2
Reporter: shawn yang Assignee: Unassigned
Resolution: Done Votes: 1
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

network card: 1000Mbit/s, memory 8G, Mongodb 1.8, Mongodb driver 2.5



 Description   

I use a single Java client to run a range query with two constraints, and I have created an index on the query key. The read speed only reaches about 6 MB/s (reading back 6 MB takes nearly 1.1 seconds), even though my network card is 1000 Mbit/s and all of the queried data fits in memory (data size < memory size)?

stats info:

"objects" : 2936299,
"avgObjSize" : 357.40783210429186,
"dataSize" : 1049456260,
"storageSize" : 1181462784,
"indexSize" : 370705344,
"fileSize" : 4226809856,
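As a rough sanity check on the figures above (a sketch only; 1 Gbit/s is taken as 125 MB/s, and "6M in 1.1 second" is read as 6 MB in 1.1 seconds):

```java
// Sanity check on the reported numbers: observed throughput is far below
// the link's line rate, and the data set easily fits in 8 GB of RAM.
public class ThroughputCheck {
    public static void main(String[] args) {
        double lineRateMBps = 1000.0 / 8.0;                        // 1 Gbit/s = 125 MB/s
        double observedMBps = 6.0 / 1.1;                           // ~5.5 MB/s observed
        double dataSizeGB = 1049456260 / (1024.0 * 1024 * 1024);   // dataSize from stats, ~0.98 GB

        System.out.printf("line rate: %.1f MB/s%n", lineRateMBps);
        System.out.printf("observed:  %.1f MB/s%n", observedMBps);
        System.out.printf("data size: %.2f GB (fits in 8 GB RAM)%n", dataSizeGB);
    }
}
```

So the link could, in principle, carry this result set more than 20 times faster than observed, which is why the reporter suspects the bottleneck is elsewhere.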

select report_path from table where visit_time > start_visit_time and visit_time < end_visit_time
--------------------------------------------------------------------
BasicDBObject range = new BasicDBObject();
range.put("$gt", start_visit_time);
range.put("$lt", end_visit_time);

BasicDBObject query = new BasicDBObject();
query.put("visit_time", range);

BasicDBObject keys = new BasicDBObject();
keys.put("report_path", 1);
keys.put("_id", 0);

DBCursor cur = ct.find(query, keys);
while (cur.hasNext()) {
    String report_path = cur.next().toString();
}

 Comments   
Comment by Jeffrey Yemin [ 04/Sep/12 ]

Apologies for letting this sit so long without a response. Please re-open if you want to pursue it further.

Comment by shawn yang [ 25/Mar/11 ]

Updated the test case to add the detailed time consumed by each part (returning the DBCursor vs. reading back the data):

BasicDBObject range = new BasicDBObject();
range.put("$gt", start_visit_time);
range.put("$lt", end_visit_time);

BasicDBObject query = new BasicDBObject();
query.put("visit_time", range);

BasicDBObject keys = new BasicDBObject();
keys.put("report_path", 1);
keys.put("_id", 0);

long st1 = System.currentTimeMillis();

DBCursor cur = ct.find(query, keys).batchSize(1000);
cur.count();
long ed1 = System.currentTimeMillis();
System.out.println(ed1 - st1); // consume: 319 ms, why does the count operation take so long?

long st = System.currentTimeMillis();

while (cur.hasNext()) {
    cur.next();
}

long ed = System.currentTimeMillis();
System.out.println(ed - st); // consume: 958 ms

Comment by shawn yang [ 25/Mar/11 ]

1) Read back: the record count is 217611; the total data size is 6734899 bytes
2) Removing toString, it consumes 964 ms (a little less than 1.1 seconds)
3) I have been using batch size 2048 (I also tested batch size 1000; the result is nearly the same)
4) I wanted to know the size of the read-back data, so I added the following code in the while loop:

while (cur.hasNext()) {
    String report_path = cur.next().toString();
    total += report_path.length(); // I have since removed this statement
}

---------------------------------------------------

When I remove toString, it consumes 964 ms, a little less.
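Dividing the figures in 1) gives the implied payload per returned document (a rough calculation using only the numbers reported above), which suggests per-document cursor and decoding overhead, rather than bytes on the wire, dominates the read-back time:

```java
// Implied per-record payload: 6,734,899 bytes across 217,611 records
// (both figures taken from this comment).
public class PayloadCheck {
    public static void main(String[] args) {
        long totalBytes = 6734899L;
        long records = 217611L;
        double bytesPerRecord = (double) totalBytes / records;
        System.out.printf("~%.1f bytes per report_path%n", bytesPerRecord); // ~30.9 bytes
    }
}
```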

Comment by Antoine Girbal [ 25/Mar/11 ]

How many records are you reading back?
I'm guessing 6MB / 357 ≈ 16,800.
This means the driver will do many getMore operations on the cursor, probably 168 of them.
The original operation that takes 294 ms only gives you a cursor back; it does not show the whole retrieval time.
A few things to try and see if faster:

  • remove the toString() in java
  • increase the batch size in query (say set it to 1000)
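The effect of batch size on getMore round trips can be sketched with ceiling division, taking the 217,611-record result set reported elsewhere in this thread and assuming, for illustration, fixed batch sizes (the real driver's initial batch and per-batch byte limit are ignored here):

```java
// Rough estimate of getMore round trips needed to drain the cursor
// for a given batch size (illustrative fixed-size batches only).
public class RoundTrips {
    static long roundTrips(long docs, long batchSize) {
        return (docs + batchSize - 1) / batchSize; // ceiling division
    }

    public static void main(String[] args) {
        System.out.println(roundTrips(217611, 100));  // 2177 round trips at ~100 docs/batch
        System.out.println(roundTrips(217611, 1000)); // 218 with batchSize(1000)
        System.out.println(roundTrips(217611, 2048)); // 107 with batchSize(2048)
    }
}
```

Each round trip costs a network latency, so cutting thousands of getMores down to a few hundred can matter even on loopback.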

What do you mean by:
"If i don't calculate the total size of read back data, the cpu percent only is 9.7%, so i think if it is the java driver's problem?"

Comment by shawn yang [ 25/Mar/11 ]

I enabled MongoDB profiling and got the following info:

> db.system.profile.find()
{ "ts" : ISODate("2011-03-25T07:07:28.147Z"), "info" : "query ycsb.$cmd ntoreturn:1 command: { count: \"taobao\", query: { visit_time: { $gt: \"2011-03-07 16:53:34\", $lt: \"2011-03-09 16:53:34\" } }, fields: { visit_time: 1, _id: 0 } } reslen:64 294ms", "millis" : 294 }

It seems the server processing time is only 294 ms. Using mongostat to check %lock, its value is 0. The client and server are on the same machine, using the local loopback interface.

From the test case:

DBCursor cur = ct.find(query, keys); // this call takes nearly 322 ms
// the while loop to read back the data takes more time
while (cur.hasNext()) {
    String report_path = cur.next().toString();
}

----------------------
Note: millis Time, in milliseconds, to perform the operation. This time does not include time to acquire the lock or network time, just the time for the server to process.

Comment by shawn yang [ 25/Mar/11 ]

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
29622 shawn 15 0 2163m 116m 115m R 9.7 1.5 0:04.28 mongod
30418 shawn 17 0 1349m 22m 8080 S 6.7 0.3 0:00.20 java
-----------------------------------------

If I don't calculate the total size of the read-back data, the CPU usage is only 9.7%, so I wonder whether it is a problem in the Java driver?

Comment by shawn yang [ 25/Mar/11 ]

1) Sure, I have used iostat to check it; the disk is idle

2) CPU usage only reaches ~22%

So the bottleneck isn't disk I/O, CPU, or memory.

Comment by Eliot Horowitz (Inactive) [ 25/Mar/11 ]

Can you verify the disk is idle while this runs?
Is Java maxing out a core?

Generated at Thu Feb 08 08:51:58 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.