[SERVER-11586] benchRun updates are far slower than expected Created: 05/Nov/13  Updated: 11/Jul/16  Resolved: 29/Jan/14

Status: Closed
Project: Core Server
Component/s: Performance
Affects Version/s: None
Fix Version/s: 2.6.0

Type: Bug Priority: Minor - P4
Reporter: Andrew Emil (Inactive) Assignee: Davide Italiano
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-15149 benchRun write concern, write command... Closed
Operating System: ALL
Participants:

 Description   

Running a 4 second benchRun with a single update operation can take much more than 4 seconds. This does not appear to be the case for inserts or queries. It also seems to take longer if you keep doing benchRun updates in a row.

> db.test.drop()
true
> testTime
function (benchDoc) {
print(new Date())
res = benchRun(benchDoc)
print(new Date())
printjson(res)
 
}
> insertDoc
{
	"parallel" : 2,
	"seconds" : 4,
	"ops" : [
		{
			"ns" : "test.test",
			"op" : "insert",
			"doc" : {
				"a" : {
					"#RAND_INT" : [
						0,
						100
					]
				}
			}
		}
	]
}
> testTime(insertDoc)
Tue Nov 05 2013 11:35:35 GMT-0800 (PST)
Tue Nov 05 2013 11:35:39 GMT-0800 (PST)
{
	"note" : "values per second",
	"errCount" : NumberLong(0),
	"trapped" : "error: not implemented",
	"insertLatencyAverageMicros" : 12.742954533536105,
	"insert" : 23836.25,
	"query" : 0,
	"update" : 0,
	"delete" : 0,
	"getmore" : 0,
	"command" : 238.75
}
> queryDoc
{
	"parallel" : 2,
	"seconds" : 4,
	"ops" : [
		{
			"ns" : "test.test",
			"op" : "find",
			"query" : {
				"a" : 1
			}
		}
	]
}
> testTime(queryDoc)
Tue Nov 05 2013 11:35:49 GMT-0800 (PST)
Tue Nov 05 2013 11:35:53 GMT-0800 (PST)
{
	"note" : "values per second",
	"errCount" : NumberLong(0),
	"trapped" : "error: not implemented",
	"queryLatencyAverageMicros" : 41554.18134715026,
	"insert" : 0,
	"query" : 48.25,
	"update" : 0,
	"delete" : 0,
	"getmore" : 48.25,
	"command" : 0.75
}
> updateDoc
{
	"parallel" : 2,
	"seconds" : 4,
	"ops" : [
		{
			"ns" : "test.test",
			"op" : "update",
			"query" : {
				"a" : 2
			},
			"update" : {
				"a" : 3
			}
		}
	]
}
> testTime(updateDoc)
Tue Nov 05 2013 11:36:03 GMT-0800 (PST)
Tue Nov 05 2013 11:36:11 GMT-0800 (PST)
{
	"note" : "values per second",
	"errCount" : NumberLong(0),
	"trapped" : "error: not implemented",
	"updateLatencyAverageMicros" : 11.86,
	"insert" : 0,
	"query" : 0,
	"update" : 100,
	"delete" : 0,
	"getmore" : 0,
	"command" : 1.75
}

Example of it getting longer when you execute it a few times in a row.

> testTime(updateDoc)
Tue Nov 05 2013 11:31:42 GMT-0800 (PST)
Tue Nov 05 2013 11:31:46 GMT-0800 (PST)
{
	"note" : "values per second",
	"errCount" : NumberLong(0),
	"trapped" : "error: not implemented",
	"updateLatencyAverageMicros" : 10.525,
	"insert" : 0,
	"query" : 0,
	"update" : 100,
	"delete" : 0,
	"getmore" : 0,
	"command" : 1.75
}
> testTime(updateDoc)
Tue Nov 05 2013 11:31:51 GMT-0800 (PST)
Tue Nov 05 2013 11:32:02 GMT-0800 (PST)
{
	"note" : "values per second",
	"errCount" : NumberLong(0),
	"trapped" : "error: not implemented",
	"updateLatencyAverageMicros" : 11.86,
	"insert" : 0,
	"query" : 0,
	"update" : 50,
	"delete" : 0,
	"getmore" : 0,
	"command" : 1.25
}
> testTime(updateDoc)
Tue Nov 05 2013 11:32:05 GMT-0800 (PST)
Tue Nov 05 2013 11:32:20 GMT-0800 (PST)
{
	"note" : "values per second",
	"errCount" : NumberLong(0),
	"trapped" : "error: not implemented",
	"updateLatencyAverageMicros" : 12.33,
	"insert" : 0,
	"query" : 0,
	"update" : 50,
	"delete" : 0,
	"getmore" : 0,
	"command" : 1.25
}
> testTime(updateDoc)
Tue Nov 05 2013 11:32:24 GMT-0800 (PST)
Tue Nov 05 2013 11:32:43 GMT-0800 (PST)
{
	"note" : "values per second",
	"errCount" : NumberLong(0),
	"trapped" : "error: not implemented",
	"updateLatencyAverageMicros" : 11.94,
	"insert" : 0,
	"query" : 0.75,
	"update" : 50,
	"delete" : 0,
	"getmore" : 0,
	"command" : 1.25
}
> testTime(updateDoc)
Tue Nov 05 2013 11:32:46 GMT-0800 (PST)
Tue Nov 05 2013 11:33:08 GMT-0800 (PST)
{
	"note" : "values per second",
	"errCount" : NumberLong(0),
	"trapped" : "error: not implemented",
	"updateLatencyAverageMicros" : 11.785,
	"insert" : 0,
	"query" : 0,
	"update" : 50,
	"delete" : 0,
	"getmore" : 0,
	"command" : 1.25
}
> testTime(updateDoc)
Tue Nov 05 2013 11:33:12 GMT-0800 (PST)
Tue Nov 05 2013 11:33:36 GMT-0800 (PST)
{
	"note" : "values per second",
	"errCount" : NumberLong(0),
	"trapped" : "error: not implemented",
	"updateLatencyAverageMicros" : 12.255,
	"insert" : 0,
	"query" : 0.75,
	"update" : 50,
	"delete" : 0,
	"getmore" : 0,
	"command" : 1.25
}
> testTime(updateDoc)
Tue Nov 05 2013 11:33:40 GMT-0800 (PST)
Tue Nov 05 2013 11:34:04 GMT-0800 (PST)
{
	"note" : "values per second",
	"errCount" : NumberLong(0),
	"trapped" : "error: not implemented",
	"updateLatencyAverageMicros" : 12.05,
	"insert" : 0,
	"query" : 0,
	"update" : 50,
	"delete" : 0,
	"getmore" : 0,
	"command" : 1.25
}

Seems to flatten out at 24 in this case, though that was not the behavior I had seen before.

Has been tested on OSX and linux(fedora 19)

Just for context: this was discovered when running a script that runs a series of updates in a row, I was confused about why the script was taking so long time finish.



 Comments   
Comment by Githook User [ 12/Feb/14 ]

Author:

{u'username': u'andy10gen', u'name': u'Andy Schwerin', u'email': u'schwerin@10gen.com'}

Message: SERVER-11586 Fix benchRun.
Branch: master
https://github.com/mongodb/mongo/commit/e5f2bc8346a6c7e9d103f16065d4c24b5ade4a87

Comment by Githook User [ 29/Jan/14 ]

Author:

{u'username': u'dcci', u'name': u'Davide Italiano', u'email': u'davide.italiano@10gen.com'}

Message: SERVER-11586 Default to the old behaviour (write ops) until jstests
are not fixed.
Branch: master
https://github.com/mongodb/mongo/commit/e31a4e112775b9b4d1b52871b9b5a0dc799cea67

Comment by Githook User [ 29/Jan/14 ]

Author:

{u'username': u'dcci', u'name': u'Davide Italiano', u'email': u'davide.italiano@10gen.com'}

Message: SERVER-11586 Implement support for WriteCommands in benchRun and
make it the default. WriteCommands are acknowledged, they don't
require getLastError() calls and make the sync timing issues going
away.
Branch: master
https://github.com/mongodb/mongo/commit/a4f66393f2713b3034ff39e06958e85e5870b7dd

Comment by Andrew Morrow (Inactive) [ 18/Dec/13 ]

The issue appears to be that benchrun is running the workload longer than specified in the 'seconds' field. Some stracing appears to show that after the nanosleep implementing the wait for 'seconds', it gets stuck for some time waiting on a futex:

Good run:

     0.000039 nanosleep({4, 0}, 0x7fff243efda0) = 0
     4.000257 futex(0x19d15b4, FUTEX_WAIT_PRIVATE, 3, NULL) = 0
     0.000316 futex(0x19d1588, FUTEX_WAKE_PRIVATE, 1) = 0

Bad run:

     0.000164 nanosleep({4, 0}, 0x7fff243efda0) = 0
     4.000319 futex(0x19ba884, FUTEX_WAIT_PRIVATE, 1, NULL) = 0
    19.452350 futex(0x19ba858, FUTEX_WAKE_PRIVATE, 1) = 0

Why did it spend an additional 20 seconds waiting on a futex (and apparently generating load the whole time).

Generated at Thu Feb 08 03:26:13 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.