[SERVER-6391] Yielding with one (or more) active writer and heavy read load results in severe performance degradation Created: 11/Jul/12  Updated: 11/Jul/16  Resolved: 25/Jul/12

Status: Closed
Project: Core Server
Component/s: Concurrency
Affects Version/s: 2.0.2
Fix Version/s: 2.0.7, 2.2.0-rc1

Type: Bug Priority: Major - P3
Reporter: Ben Becker Assignee: Ben Becker
Resolution: Done Votes: 0
Labels: concurrency
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Ubuntu 12.04, Linux kernel v3.2.0-25
Release build of r2.0.2 tag
Virtualized system with:
4GB of memory (host has 16GB)
8 cores (same as host)
64GB disk via Parallels on SSD


Attachments: File SERVER-6391.js    
Issue Links:
Depends
Related
Operating System: Linux
Participants:

 Description   

Tests with the following command result in execution times of 20 seconds to 10 minutes:

> benchRun({ops:[{ns:'test.bench', op:'find', query:{a:1}}, {ns:'test.bench', op:'update', query:{a:1}, update:{$set:{z:1}}}], db:'test', parallel:4096});

The collection has 1000 identical documents:

{
	"_id" : ObjectId("4ffbd85b3e968c0d937006f4"),
	"a" : 1,
	"asdfasdf" : 1,
	"dsfsfda" : "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa",
	"z" : 1
}

With ~3944 concurrent readers and 10-15 concurrent writers:

  • If recommendYieldMicros() lies and says there are 3944 readers and 0 writers, test completes consistently in 23 seconds.
  • If recommendYieldMicros() lies and says there are 1 readers and 0 writers, test completes consistently in 20 seconds.
  • If recommendYieldMicros() lies and says there are 1 readers and 1 writers, test completes in 8 minutes and 14 seconds, stall/burst pattern*.
  • If recommendYieldMicros() tells the truth, test completes in 10 minutes and 54 seconds, stall/burst pattern*.

*NOTE: stall/burst in this sense means minute-long intervals where only ~300K-500K of data is transmitted, followed by a burst of 1-5Mbps for up to a few seconds. 'Consistently' in this case meaning between 10-50Mbps (avg. ~35Mbps). Daemon and client are on the same VM.

Having zero writers means yieldSuggest() always suggests not yielding. Under heavy load, this seems to indicate that the cost of actually yielding is tremendously expensive.

It should be noted that recommendedYieldMicros() is expensive when lots of connections are running, taking an average of 1/3 of each threads execution time with 4k connections. That said, the impact of removing this call is relatively minimal in terms of test completion time:

With the calls to recommendedYieldMicros() acquiring the clientsMutex lock and hard-coding writers to 0, the aforementioned test completes in 24 seconds, vs. 20 seconds when the lock/count is avoided.
With the calls to recommendedYieldMicros() acquiring the clientsMutex lock and hard-coding writers to 1, the aforementioned test completes in 11 minutes 10 seconds, vs. 8 minutes 14 seconds when the lock/count is avoided.



 Comments   
Comment by auto [ 27/Jul/12 ]

Author:

{u'date': u'2012-07-15T19:22:03-07:00', u'email': u'eliot@10gen.com', u'name': u'Eliot Horowitz'}

Message: SERVER-6391 - don't sleep if we're a reader,
and call ElapsedTracker::resetLastTime when we're done sleeping

Conflicts:

db/clientcursor.cpp
Branch: v2.0
https://github.com/mongodb/mongo/commit/de0cfcd8de67c52b8206a27b7a5a939e70520051

Comment by auto [ 27/Jul/12 ]

Author:

{u'date': u'2012-07-15T19:22:49-07:00', u'email': u'eliot@10gen.com', u'name': u'Eliot Horowitz'}

Message: SERVER-6391 - lower sleep times when yielding
Branch: v2.0
https://github.com/mongodb/mongo/commit/8e42151eccd3766bd0ad0320340d930ad0cc54ed

Comment by auto [ 27/Jul/12 ]

Author:

{u'date': u'2012-07-15T19:21:50-07:00', u'email': u'eliot@10gen.com', u'name': u'Eliot Horowitz'}

Message: SERVER-6391 add resetLastTime to ElapsedTracker
Branch: v2.0
https://github.com/mongodb/mongo/commit/6701b6ea856ec2b34fe85429f5e920c86cb8391d

Comment by auto [ 25/Jul/12 ]

Author:

{u'date': u'2012-07-24T23:27:25-07:00', u'email': u'ben.becker@10gen.com', u'name': u'Ben Becker'}

Message: SERVER-6391: limit Client iterations when calculating yield time
Branch: master
https://github.com/mongodb/mongo/commit/73b97070259a0e30d5ed35880b11b32078620e42

Comment by Ben Becker [ 16/Jul/12 ]

yes, it just means that recommendedYieldMicros() only acquires clientsMutex and counts readers/writers once before every ~(numConnections/128) * 50 yields, instead of once before every yield.

Essentially a hacky cache similar to capping the number of iterations, but (possibly) further reducing contention on clientsMutex (note these tests don't compare the 'capped iterations' implementation, but happy to run that test).

Comment by Andy Schwerin [ 16/Jul/12 ]

Can you remind me what "cacheSuggest" is?

Comment by auto [ 16/Jul/12 ]

Author:

{u'date': u'2012-07-15T19:22:49-07:00', u'email': u'eliot@10gen.com', u'name': u'Eliot Horowitz'}

Message: SERVER-6391 - lower sleep times when yielding
Branch: master
https://github.com/mongodb/mongo/commit/3b8ad0c803ea0e2d3bf577e68362ddb829cff5d1

Comment by auto [ 16/Jul/12 ]

Author:

{u'date': u'2012-07-15T19:22:03-07:00', u'email': u'eliot@10gen.com', u'name': u'Eliot Horowitz'}

Message: SERVER-6391 - don't sleep if we're a reader,
and call ElapsedTracker::resetLastTime when we're done sleeping
Branch: master
https://github.com/mongodb/mongo/commit/d6a389787548657663282910e02edd59818ea162

Comment by auto [ 16/Jul/12 ]

Author:

{u'date': u'2012-07-15T19:21:50-07:00', u'email': u'eliot@10gen.com', u'name': u'Eliot Horowitz'}

Message: SERVER-6391 add resetLastTime to ElapsedTracker
Branch: master
https://github.com/mongodb/mongo/commit/47d00c2157f1186527f6915b20c0fa40c208a36d

Comment by Ben Becker [ 11/Jul/12 ]

Attaching simple test script.

Requires the following modification to bench.cpp (based on r2.0.2):

index 9ada7d6..fc89684 100644
--- a/scripting/bench.cpp
+++ b/scripting/bench.cpp
@@ -138,6 +138,11 @@ namespace mongo {
                 if ( op == "findOne" ) {
                     conn->findOne( ns , fixQuery( e["query"].Obj() ) );
                 }
+                else if ( op == "find" ) {
+                    auto_ptr<DBClientCursor> cursor = conn->query( ns , fixQuery( e["query"].Obj() ) );
+                    while (cursor->more())
+                        cursor->next();
+                }
                 else if ( op == "remove" ) {
                     conn->remove( ns , fixQuery( e["query"].Obj() ) );
                 }

Additional profiling changes are available at https://github.com/vrtx/mongo/commit/5b2c752477d113132c28dab129e54afd465af1f9.

Generated at Thu Feb 08 03:11:32 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.