|
ok here are final figures after implementing JAVA-221
Before patch
========================================================================================
Testname: 1 Thread
- of Threads: 4
- of Loops: 2000
- of values: 8000
Average Returned Values: 99
Avg Time:2ms
Total Time: 4343ms
========================================================================================
========================================================================================
Testname: 1 Thread
- of Threads: 1
- of Loops: 8000
- of values: 8000
Average Returned Values: 100
Avg Time:1ms
Total Time: 10021ms
========================================================================================
========================================================================================
Testname: 1 Thread
- of Threads: 8
- of Loops: 1000
- of values: 8000
Average Returned Values: 99
Avg Time:4ms
Total Time: 4217ms
========================================================================================
After patch
========================================================================================
Testname: 1 Thread
- of Threads: 4
- of Loops: 2000
- of values: 8000
Average Returned Values: 99
Avg Time:1ms
Total Time: 3432ms
========================================================================================
========================================================================================
Testname: 1 Thread
- of Threads: 1
- of Loops: 8000
- of values: 8000
Average Returned Values: 99
Avg Time:1ms
Total Time: 9325ms
========================================================================================
========================================================================================
Testname: 1 Thread
- of Threads: 8
- of Loops: 1000
- of values: 8000
Average Returned Values: 99
Avg Time:3ms
Total Time: 3747ms
========================================================================================
So basically about 10-20% gain in speed.
|
|
yes, reduce the requested size changes the duration a lot, but the increase is still around factor 4 in duration of each query (with finalize).
|
|
"- I am getting random errors like "BSONDecoder doesn't understand type : 95 name: id" on tests, need to look why.. "
had that problem with the snapshot as well. on a 2.3 release driver with finalize removed, this disappeared.
|
|
Note that, as Eliot mentioned, reducing size of objects does change outcome a lot.
With a driver that has no finalize, and removing the Y field from test, the 4-thread app is now over twice as fast as 1-thread app.
With finalize():
========================================================================================
Testname: 4 Thread
- of Threads: 4
- of Loops: 1000
- of values: 4000
Average Returned Values: 98
Avg Time:5ms
Total Time: 6153ms
========================================================================================
========================================================================================
Testname: 1 Thread
- of Threads: 1
- of Loops: 4000
- of values: 4000
Average Returned Values: 100
Avg Time:1ms
Total Time: 6793ms
========================================================================================
without finalize():
========================================================================================
Testname: 4 Thread
- of Threads: 4
- of Loops: 1000
- of values: 4000
Average Returned Values: 99
Avg Time:2ms
Total Time: 2214ms
========================================================================================
========================================================================================
Testname: 1 Thread
- of Threads: 1
- of Loops: 4000
- of values: 4000
Average Returned Values: 100
Avg Time:1ms
Total Time: 4515ms
========================================================================================
|
|
looking into this issue, a few comments:
- first I ran the test with current driver, times were 30ms for 1 thread and 100ms for 4 threads.
- the first 4 times of the results when using a 4 thread threadpool are really high (most likely due to thread creation).
Average time should skip first 4 or use more items, to be more fair.
- after removing sync on finalization, times drop to about 8ms for 1 and 30ms for 4 threads.
With 4 threads CPU usage is about 50% for each thread, and 10-20% for mongod.
Comparatively if running with 1 thread, it uses about 90% cpu.
So still there seem to be a bottleneck to fully use all cores, could be syncing or other.
- overall time when using 4 threads is a bit shorter than when using one.
1 thread ~27s
4 threads ~ 33s
- I am getting random errors like "BSONDecoder doesn't understand type : 95 name: id" on tests, need to look why..
========================================================================================
Testname: 4 Thread
- of Threads: 4
- of Loops: 1000
- of values: 4000
Average Returned Values: 96
Avg Time:27ms
Total Time: 27575ms
========================================================================================
========================================================================================
Testname: 1 Thread
- of Threads: 1
- of Loops: 4000
- of values: 4000
Average Returned Values: 100
Avg Time:9ms
Total Time: 33067ms
========================================================================================
|
|
I had a closer look at the code provided. in fact, this seems like a finalization/GC issue. if you use the 2.3 release driver and remove the finalize() methods from DBApiLayer, the cores avail are used to 100% and the numbers created are significantly better.
i think this is due to implicit sync on finalization. Could we somehow get away without finalization there?
|
|
the tcp stack is the limiting factor in the big case
so more jvms gets you more kernel access
|
|
then i still don't get the differences in average query time with 4 JVMs (each 1 query thread) and 1 JVM and 4 threads all on the same machine,
so can you proabpbly tell me where the difference come from?
i also tried to increase the memory size for the test case, but this didn't really help
|
|
Try running in a profiler.
You'll see very quickly there is no locking issue.
You also need a db.getLastError() at the end of setUp()
|
|
i'm sorry, but i disagree, for me it looks like, it's mostly monothreaded,
and my requests appear to be blocked somewhere, perhaps due to looking, which did not happen, when running, in different JVM, but of course, i'll try this, here we go with the times.
well, as the number of data (decreased significantly, which is transfered over local loopback (lo0), though, there almost no speedlimits ...
but the factors are almost the same, so 4thread is 175% plus on average query time, to 1 thread, ond twenty thread is almost 9 times the query time.
so i will try as mentioned here, to provide a patch http://groups.google.com/group/mongodb-user/browse_thread/thread/759d96e00d578ef1/cacb245168e511e9?lnk=gst&q=java+driver+concurrency#cacb245168e511e9
========================================================================================
Testname: 1 Thread
- of Threads: 1
- of Loops: 400
- of values: 400
Average Returned Values: 100
Average Time: 4ms
total Runtime: 1871ms
========================================================================================
========================================================================================
Testname: 4 Thread
- of Threads: 4
- of Loops: 100
- of values: 400
Average Returned Values: 98
Average Time: 7ms
total Runtime: 797ms
========================================================================================
========================================================================================
Testname: 10 Thread
- of Threads: 10
- of Loops: 40
- of values: 400
Average Returned Values: 91
Average Time: 20ms
total Runtime: 815ms
========================================================================================
========================================================================================
Testname: 20 Thread
- of Threads: 20
- of Loops: 20
- of values: 400
Average Returned Values: 100
Average Time: 34ms
total Runtime: 759ms
========================================================================================
|
|
You're network bound.
Try commenting out
object.append("y", getYHierachy());
|
|
Will look at updated code.
Please only use either jira or user list per issue, not both.
|
|
question on mailinglist: http://groups.google.com/group/mongodb-user/browse_thread/thread/33227d73c52b1a0f
|
|
so, to make sure, the time increase is due to concurrency issues with this driver, i run the tests in parallel, with one worker thread, in 4 different jvms:
4jvm
thales.:~/Desktop$ java -jar com.softculture.core.test-1.0-SNAPSHOT-agent.jar
========================================================================================
Testname: 1 Thread
- of Threads: 1
- of Loops: 4000
- of values: 4000
Average Returned Values: 100
Average Time: 74ms
total Runtime: 296626ms
========================================================================================
thales.:~/Desktop$
thales.:~/Desktop$ java -jar com.softculture.core.test-1.0-SNAPSHOT-agent.jar
========================================================================================
Testname: 1 Thread
- of Threads: 1
- of Loops: 4000
- of values: 4000
Average Returned Values: 100
Average Time: 73ms
total Runtime: 293402ms
========================================================================================
thales.:~/Desktop$
thales.:~/Desktop$ java -jar com.softculture.core.test-1.0-SNAPSHOT-agent.jar
========================================================================================
Testname: 1 Thread
- of Threads: 1
- of Loops: 4000
- of values: 4000
Average Returned Values: 100
Average Time: 73ms
total Runtime: 294837ms
========================================================================================
thales.:~/Desktop$
thales.:~/Desktop$ java -jar com.softculture.core.test-1.0-SNAPSHOT-agent.jar
========================================================================================
Testname: 1 Thread
- of Threads: 1
- of Loops: 4000
- of values: 4000
Average Returned Values: 100
Average Time: 73ms
total Runtime: 295533ms
========================================================================================
thales.:~/Desktop$
so here is also a increase of 20ms, (around 40-50% to 51ms with only one jvm), but this is way better, then the 150ms with 4threads. (and can be due to other limitations)
|
|
after adding some total time measuring statements my test outputs this:
========================================================================================
Testname: 1 Thread
- of Threads: 1
- of Loops: 400
- of values: 400
Average Returned Values: 100
Average Time: 52ms
total Runtime: 21063ms
========================================================================================
========================================================================================
Testname: 4 Thread
- of Threads: 4
- of Loops: 100
- of values: 400
Average Returned Values: 100
Average Time: 168ms
total Runtime: 16871ms
========================================================================================
========================================================================================
Testname: 10 Thread
- of Threads: 10
- of Loops: 40
- of values: 400
Average Returned Values: 100
Average Time: 544ms
total Runtime: 22081ms
========================================================================================
========================================================================================
Testname: 20 Thread
- of Threads: 20
- of Loops: 20
- of values: 400
Average Returned Values: 100
Average Time: 1025ms
total Runtime: 20754ms
========================================================================================
all these tests have the same number of queries overall (400). the overall time hardly decreases with the number of threads, even the other way around for 10 threads where it is 22s instead of 21s for one thread.
for each individual query the query time increases with the number of threads (from 52 to 544 ms with 10 threads).
monitoring the core usage didn't show more then 2 cores in use whatever the number of concurrent threads had been.
is it linked to the fact, that i run always the same query against the same collection?
is there a way to make fast (like settings on the daemon, hardware recommendations?) ?
|
|
well ...
okay, point taken, so i did to changes to my test:
i created this test method:
public void test4Thread() throws Exception
|
{
|
runWithNumberOfThreadsAndLoops("4 Thread", 4, 100);
|
}
|
which in this case runs with 4 threads
i run these tests against a local mongodb and one remote one (also with 4 cores)
local session:
========================================================================================
|
Testname: 1 Thread
|
# of Threads: 1
|
# of Loops: 400
|
# of values: 400
|
Average Returned Values: 100
|
Average Time: 51ms
|
========================================================================================
|
========================================================================================
|
Testname: 4 Thread
|
# of Threads: 4
|
# of Loops: 100
|
# of values: 400
|
Average Returned Values: 100
|
Average Time: 161ms
|
========================================================================================
|
remote session:
========================================================================================
|
Testname: 1 Thread
|
# of Threads: 1
|
# of Loops: 400
|
# of values: 400
|
Average Returned Values: 100
|
Average Time: 83ms
|
========================================================================================
|
========================================================================================
|
Testname: 4 Thread
|
# of Threads: 4
|
# of Loops: 100
|
# of values: 400
|
Average Returned Values: 100
|
Average Time: 302ms
|
========================================================================================
|
while my cpu utilization is never going across 200% (== 2cores)
now, to make sure JUnit isn't fidling with it i created a main class, und start the tests from there:
========================================================================================
|
Testname: 1 Thread
|
# of Threads: 1
|
# of Loops: 400
|
# of values: 400
|
Average Returned Values: 100
|
Average Time: 53ms
|
========================================================================================
|
========================================================================================
|
Testname: 4 Thread
|
# of Threads: 4
|
# of Loops: 100
|
# of values: 400
|
Average Returned Values: 100
|
Average Time: 168ms
|
========================================================================================
|
|
|
2 issues:
- 4 cores with 20 threads
- need to set the connection pool to 20
|
Generated at Thu Feb 08 08:51:44 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.