[JAVA-207] performance impact with multiple concurrent queries Created: 03/Nov/10  Updated: 17/Mar/11  Resolved: 08/Nov/10

Status: Closed
Project: Java Driver
Component/s: Performance
Affects Version/s: 2.3
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Georg Assignee: Eliot Horowitz (Inactive)
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Ubuntu 10.04 x64, mongodb-unstable 20101028, mongo-java-driver 2.3, quadcore, 3.2G RAM


Attachments: Java Source File PerfTest.java     Java Source File PerfTestRunner.java    
Issue Links:
Depends
depends on JAVA-221 Try to remove synchronized in DBApiLa... Closed

 Description   

when running my testcase (attachment)
i get these values back:

========================================================================================
Testname: 1 Thread
	# of Threads:   1
	# of Loops:     400
	# of values:     400
	Average Returned Values:  100
	Average Time:   53ms
========================================================================================
========================================================================================
Testname: 5 Thread
	# of Threads:   5
	# of Loops:     800
	# of values:     4000
	Average Returned Values:  100
	Average Time:   196ms
========================================================================================
========================================================================================
Testname: 10 Thread
	# of Threads:   10
	# of Loops:     40
	# of values:     400
	Average Returned Values:  100
	Average Time:   458ms
========================================================================================
========================================================================================
Testname: 20 Thread
	# of Threads:   20
	# of Loops:     20
	# of values:     400
	Average Returned Values:  100
	Average Time:   1020ms
========================================================================================
 

so 20 threads use approx 988ms per query,
one thread uses approx 53ms, so 20 concurrent requests seem to increase the query time by factor 20 ...
is this to be expected? is this a known issue? can we tweak it somewhere?



 Comments   
Comment by Antoine Girbal [ 30/Nov/10 ]

ok here are final figures after implementing JAVA-221

Before patch
========================================================================================
Testname: 1 Thread

  1. of Threads: 4
  2. of Loops: 2000
  3. of values: 8000
    Average Returned Values: 99
    Avg Time:2ms
    Total Time: 4343ms
    ========================================================================================
    ========================================================================================
    Testname: 1 Thread
  4. of Threads: 1
  5. of Loops: 8000
  6. of values: 8000
    Average Returned Values: 100
    Avg Time:1ms
    Total Time: 10021ms
    ========================================================================================
    ========================================================================================
    Testname: 1 Thread
  7. of Threads: 8
  8. of Loops: 1000
  9. of values: 8000
    Average Returned Values: 99
    Avg Time:4ms
    Total Time: 4217ms
    ========================================================================================

After patch
========================================================================================
Testname: 1 Thread

  1. of Threads: 4
  2. of Loops: 2000
  3. of values: 8000
    Average Returned Values: 99
    Avg Time:1ms
    Total Time: 3432ms
    ========================================================================================
    ========================================================================================
    Testname: 1 Thread
  4. of Threads: 1
  5. of Loops: 8000
  6. of values: 8000
    Average Returned Values: 99
    Avg Time:1ms
    Total Time: 9325ms
    ========================================================================================
    ========================================================================================
    Testname: 1 Thread
  7. of Threads: 8
  8. of Loops: 1000
  9. of values: 8000
    Average Returned Values: 99
    Avg Time:3ms
    Total Time: 3747ms
    ========================================================================================

So basically about 10-20% gain in speed.

Comment by Georg [ 30/Nov/10 ]

yes, reduce the requested size changes the duration a lot, but the increase is still around factor 4 in duration of each query (with finalize).

Comment by uwe schaefer [ 30/Nov/10 ]

"- I am getting random errors like "BSONDecoder doesn't understand type : 95 name: id" on tests, need to look why.. "

had that problem with the snapshot as well. on a 2.3 release driver with finalize removed, this disappeared.

Comment by Antoine Girbal [ 30/Nov/10 ]

Note that, as Eliot mentioned, reducing size of objects does change outcome a lot.
With a driver that has no finalize, and removing the Y field from test, the 4-thread app is now over twice as fast as 1-thread app.

With finalize():
========================================================================================
Testname: 4 Thread

  1. of Threads: 4
  2. of Loops: 1000
  3. of values: 4000
    Average Returned Values: 98
    Avg Time:5ms
    Total Time: 6153ms
    ========================================================================================
    ========================================================================================
    Testname: 1 Thread
  4. of Threads: 1
  5. of Loops: 4000
  6. of values: 4000
    Average Returned Values: 100
    Avg Time:1ms
    Total Time: 6793ms
    ========================================================================================

without finalize():
========================================================================================
Testname: 4 Thread

  1. of Threads: 4
  2. of Loops: 1000
  3. of values: 4000
    Average Returned Values: 99
    Avg Time:2ms
    Total Time: 2214ms
    ========================================================================================
    ========================================================================================
    Testname: 1 Thread
  4. of Threads: 1
  5. of Loops: 4000
  6. of values: 4000
    Average Returned Values: 100
    Avg Time:1ms
    Total Time: 4515ms
    ========================================================================================
Comment by Antoine Girbal [ 30/Nov/10 ]

looking into this issue, a few comments:

  • first I ran the test with current driver, times were 30ms for 1 thread and 100ms for 4 threads.
  • the first 4 times of the results when using a 4 thread threadpool are really high (most likely due to thread creation).
    Average time should skip first 4 or use more items, to be more fair.
  • after removing sync on finalization, times drop to about 8ms for 1 and 30ms for 4 threads.
    With 4 threads CPU usage is about 50% for each thread, and 10-20% for mongod.
    Comparatively if running with 1 thread, it uses about 90% cpu.
    So still there seem to be a bottleneck to fully use all cores, could be syncing or other.
  • overall time when using 4 threads is a bit shorter than when using one.
    1 thread ~27s
    4 threads ~ 33s
  • I am getting random errors like "BSONDecoder doesn't understand type : 95 name: id" on tests, need to look why..

========================================================================================
Testname: 4 Thread

  1. of Threads: 4
  2. of Loops: 1000
  3. of values: 4000
    Average Returned Values: 96
    Avg Time:27ms
    Total Time: 27575ms
    ========================================================================================

========================================================================================
Testname: 1 Thread

  1. of Threads: 1
  2. of Loops: 4000
  3. of values: 4000
    Average Returned Values: 100
    Avg Time:9ms
    Total Time: 33067ms
    ========================================================================================
Comment by uwe schaefer [ 27/Nov/10 ]

I had a closer look at the code provided. in fact, this seems like a finalization/GC issue. if you use the 2.3 release driver and remove the finalize() methods from DBApiLayer, the cores avail are used to 100% and the numbers created are significantly better.

i think this is due to implicit sync on finalization. Could we somehow get away without finalization there?

Comment by Eliot Horowitz (Inactive) [ 09/Nov/10 ]

the tcp stack is the limiting factor in the big case
so more jvms gets you more kernel access

Comment by Georg [ 09/Nov/10 ]

then i still don't get the differences in average query time with 4 JVMs (each 1 query thread) and 1 JVM and 4 threads all on the same machine,
so can you proabpbly tell me where the difference come from?

i also tried to increase the memory size for the test case, but this didn't really help

Comment by Eliot Horowitz (Inactive) [ 09/Nov/10 ]

Try running in a profiler.
You'll see very quickly there is no locking issue.

You also need a db.getLastError() at the end of setUp()

Comment by Georg [ 09/Nov/10 ]

i'm sorry, but i disagree, for me it looks like, it's mostly monothreaded,
and my requests appear to be blocked somewhere, perhaps due to looking, which did not happen, when running, in different JVM, but of course, i'll try this, here we go with the times.
well, as the number of data (decreased significantly, which is transfered over local loopback (lo0), though, there almost no speedlimits ...

but the factors are almost the same, so 4thread is 175% plus on average query time, to 1 thread, ond twenty thread is almost 9 times the query time.

so i will try as mentioned here, to provide a patch http://groups.google.com/group/mongodb-user/browse_thread/thread/759d96e00d578ef1/cacb245168e511e9?lnk=gst&q=java+driver+concurrency#cacb245168e511e9

========================================================================================
Testname: 1 Thread

  1. of Threads: 1
  2. of Loops: 400
  3. of values: 400
    Average Returned Values: 100
    Average Time: 4ms
    total Runtime: 1871ms
    ========================================================================================
    ========================================================================================
    Testname: 4 Thread
  4. of Threads: 4
  5. of Loops: 100
  6. of values: 400
    Average Returned Values: 98
    Average Time: 7ms
    total Runtime: 797ms
    ========================================================================================
    ========================================================================================
    Testname: 10 Thread
  7. of Threads: 10
  8. of Loops: 40
  9. of values: 400
    Average Returned Values: 91
    Average Time: 20ms
    total Runtime: 815ms
    ========================================================================================
    ========================================================================================
    Testname: 20 Thread
  10. of Threads: 20
  11. of Loops: 20
  12. of values: 400
    Average Returned Values: 100
    Average Time: 34ms
    total Runtime: 759ms
    ========================================================================================
Comment by Eliot Horowitz (Inactive) [ 08/Nov/10 ]

You're network bound.

Try commenting out
object.append("y", getYHierachy());

Comment by Eliot Horowitz (Inactive) [ 08/Nov/10 ]

Will look at updated code.

Please only use either jira or user list per issue, not both.

Comment by Georg [ 08/Nov/10 ]

question on mailinglist: http://groups.google.com/group/mongodb-user/browse_thread/thread/33227d73c52b1a0f

Comment by Georg [ 08/Nov/10 ]

so, to make sure, the time increase is due to concurrency issues with this driver, i run the tests in parallel, with one worker thread, in 4 different jvms:

4jvm

thales.:~/Desktop$ java -jar com.softculture.core.test-1.0-SNAPSHOT-agent.jar
========================================================================================
Testname: 1 Thread

  1. of Threads: 1
  2. of Loops: 4000
  3. of values: 4000
    Average Returned Values: 100
    Average Time: 74ms
    total Runtime: 296626ms
    ========================================================================================
    thales.:~/Desktop$

thales.:~/Desktop$ java -jar com.softculture.core.test-1.0-SNAPSHOT-agent.jar
========================================================================================
Testname: 1 Thread

  1. of Threads: 1
  2. of Loops: 4000
  3. of values: 4000
    Average Returned Values: 100
    Average Time: 73ms
    total Runtime: 293402ms
    ========================================================================================
    thales.:~/Desktop$

thales.:~/Desktop$ java -jar com.softculture.core.test-1.0-SNAPSHOT-agent.jar
========================================================================================
Testname: 1 Thread

  1. of Threads: 1
  2. of Loops: 4000
  3. of values: 4000
    Average Returned Values: 100
    Average Time: 73ms
    total Runtime: 294837ms
    ========================================================================================
    thales.:~/Desktop$

thales.:~/Desktop$ java -jar com.softculture.core.test-1.0-SNAPSHOT-agent.jar
========================================================================================
Testname: 1 Thread

  1. of Threads: 1
  2. of Loops: 4000
  3. of values: 4000
    Average Returned Values: 100
    Average Time: 73ms
    total Runtime: 295533ms
    ========================================================================================
    thales.:~/Desktop$

so here is also a increase of 20ms, (around 40-50% to 51ms with only one jvm), but this is way better, then the 150ms with 4threads. (and can be due to other limitations)

Comment by Georg [ 04/Nov/10 ]

after adding some total time measuring statements my test outputs this:

========================================================================================
Testname: 1 Thread

  1. of Threads: 1
  2. of Loops: 400
  3. of values: 400
    Average Returned Values: 100
    Average Time: 52ms
    total Runtime: 21063ms
    ========================================================================================
    ========================================================================================
    Testname: 4 Thread
  4. of Threads: 4
  5. of Loops: 100
  6. of values: 400
    Average Returned Values: 100
    Average Time: 168ms
    total Runtime: 16871ms
    ========================================================================================
    ========================================================================================
    Testname: 10 Thread
  7. of Threads: 10
  8. of Loops: 40
  9. of values: 400
    Average Returned Values: 100
    Average Time: 544ms
    total Runtime: 22081ms
    ========================================================================================
    ========================================================================================
    Testname: 20 Thread
  10. of Threads: 20
  11. of Loops: 20
  12. of values: 400
    Average Returned Values: 100
    Average Time: 1025ms
    total Runtime: 20754ms
    ========================================================================================

all these tests have the same number of queries overall (400). the overall time hardly decreases with the number of threads, even the other way around for 10 threads where it is 22s instead of 21s for one thread.

for each individual query the query time increases with the number of threads (from 52 to 544 ms with 10 threads).

monitoring the core usage didn't show more then 2 cores in use whatever the number of concurrent threads had been.

is it linked to the fact, that i run always the same query against the same collection?
is there a way to make fast (like settings on the daemon, hardware recommendations?) ?

Comment by Georg [ 04/Nov/10 ]

well ...

okay, point taken, so i did to changes to my test:

i created this test method:

    public void test4Thread() throws Exception
    {
        runWithNumberOfThreadsAndLoops("4 Thread", 4, 100);
    }

which in this case runs with 4 threads

i run these tests against a local mongodb and one remote one (also with 4 cores)
local session:

========================================================================================
Testname: 1 Thread
	# of Threads:   1
	# of Loops:     400
	# of values:     400
	Average Returned Values:  100
	Average Time:   51ms
========================================================================================
========================================================================================
Testname: 4 Thread
	# of Threads:   4
	# of Loops:     100
	# of values:     400
	Average Returned Values:  100
	Average Time:   161ms
========================================================================================

remote session:

========================================================================================
Testname: 1 Thread
	# of Threads:   1
	# of Loops:     400
	# of values:     400
	Average Returned Values:  100
	Average Time:   83ms
========================================================================================
========================================================================================
Testname: 4 Thread
	# of Threads:   4
	# of Loops:     100
	# of values:     400
	Average Returned Values:  100
	Average Time:   302ms
========================================================================================

while my cpu utilization is never going across 200% (== 2cores)

now, to make sure JUnit isn't fidling with it i created a main class, und start the tests from there:

========================================================================================
Testname: 1 Thread
	# of Threads:   1
	# of Loops:     400
	# of values:     400
	Average Returned Values:  100
	Average Time:   53ms
========================================================================================
========================================================================================
Testname: 4 Thread
	# of Threads:   4
	# of Loops:     100
	# of values:     400
	Average Returned Values:  100
	Average Time:   168ms
========================================================================================

Comment by Eliot Horowitz (Inactive) [ 04/Nov/10 ]

2 issues:

  • 4 cores with 20 threads
  • need to set the connection pool to 20
Generated at Thu Feb 08 08:51:44 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.