[SERVER-4783] V8 sharding tests taking 10x as long as spidermonkey Created: 26/Jan/12  Updated: 11/Jul/16  Resolved: 25/Feb/12

Status: Closed
Project: Core Server
Component/s: Build, Sharding, Testing Infrastructure
Affects Version/s: 2.1.0
Fix Version/s: 2.1.1

Type: Bug Priority: Major - P3
Reporter: Eric Milkie Assignee: Antoine Girbal
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-4921 replica_set_shard_version.js failing ... Closed
Operating System: ALL
Participants:

 Description   

When all the tests pass in the sharding suite for the Linux V8 builder, the suite takes 5 1/2 hours:
http://buildbot.mongodb.org/builders/Linux%2064-bit%20v8/builds/2950/steps/test_3

On the Linux 64-bit builder (using SpiderMonkey), the suite takes ~1/2 hour.
http://buildbot.mongodb.org/builders/Linux%2064-bit/builds/4094/steps/test_3



 Comments   
Comment by Antoine Girbal [ 25/Feb/12 ]

The sharding tests now take 28 min on V8 and 37 min on SM.
This is after switching the builder's filesystem to xfs.

Comment by Antoine Girbal [ 13/Feb/12 ]

Same thing on the legacy builder: it does not choose to preallocate.

 m27018| Mon Feb 13 00:14:07 [initandlisten] MongoDB starting : pid=2602 port=27018 dbpath=/data/db/jstests_dbNoCreate 64-bit host=domU-12-31-39-16-30-A2
 m27018| Mon Feb 13 00:14:07 [initandlisten] 
 m27018| Mon Feb 13 00:14:07 [initandlisten] ** NOTE: This is a development version (2.1.1-pre-) of MongoDB.
 m27018| Mon Feb 13 00:14:07 [initandlisten] **       Not recommended for production.
 m27018| Mon Feb 13 00:14:07 [initandlisten] 
 m27018| Mon Feb 13 00:14:07 [initandlisten] db version v2.1.1-pre-, pdfile version 4.5
 m27018| Mon Feb 13 00:14:07 [initandlisten] git version: d8433150b2b30d49d4a32ba4e47b3ec10d60851d
 m27018| Mon Feb 13 00:14:07 [initandlisten] build info: Linux domU-12-31-39-16-30-A2 2.6.21.7-2.fc8xen #1 SMP Fri Feb 15 12:34:28 EST 2008 x86_64 BOOST_LIB_VERSION=1_45
 m27018| Mon Feb 13 00:14:07 [initandlisten] options: { dbpath: "/data/db/jstests_dbNoCreate", port: 27018 }
 m27018| Mon Feb 13 00:14:07 [initandlisten] journal dir=/data/db/jstests_dbNoCreate/journal
 m27018| Mon Feb 13 00:14:07 [initandlisten] recover : no journal files present, no recovery needed
 m27018| Mon Feb 13 00:14:07 [initandlisten] waiting for connections on port 27018
 m27018| Mon Feb 13 00:14:07 [websvr] admin web console waiting for connections on port 28018

There is no special option passed to mongod at startup; it just runs its filesystem test, and preallocation is somehow slower here.
I don't see how the JS engine could influence that.
I'm not saying it's the only issue here, but it definitely takes a lot of time.
There are 108 matches for "preallocating" in the v8 logs, at more than 20s per allocation, so roughly 40 minutes are spent there.
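The back-of-the-envelope estimate above can be reproduced against any builder log. A minimal sketch, using a stand-in excerpt (the real v8 log has 108 such lines; file path is hypothetical):

```shell
# Stand-in excerpt of a v8 builder log; the real one has 108 such lines.
cat > /tmp/v8-build.log <<'EOF'
Mon Feb 13 00:14:07 [FileAllocator] preallocating a journal file /data/db/x/journal/prealloc.0
Mon Feb 13 00:14:29 [FileAllocator] preallocating a journal file /data/db/x/journal/prealloc.1
Mon Feb 13 00:14:51 [FileAllocator] preallocating a journal file /data/db/x/journal/prealloc.2
EOF

# Count prealloc events and estimate the wall-clock cost at ~20s apiece.
count=$(grep -c 'preallocating a journal file' /tmp/v8-build.log)
echo "events=$count est_seconds=$(( count * 20 ))"
```

With 108 events at 20+ seconds each, the same arithmetic gives the ~40 minutes mentioned above.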

Comment by Antoine Girbal [ 11/Feb/12 ]

This is expected: the v8 buildbot preallocates files because it runs on ext4 rather than xfs.

Comment by Antoine Girbal [ 10/Feb/12 ]

Michael tested on the buildbots, and indeed mongod preallocates on ext4 but not on xfs.

thoughts:

  • We should have the exact same setup on all buildbots so that we can compare performance. It would be good to use xfs, since it cuts prealloc time down a lot.
  • We should also have at least one bot running tests on ext4 or ext3; those runs take longer, and tests could fail there due to timing.
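Since the ext4/xfs difference is the deciding factor here, the first thing to verify on each bot is which filesystem actually backs the test data directory. A quick check (the `/data/db` path is the buildbots' dbpath; substitute your own, falling back to `/` if it does not exist):

```shell
# Show the filesystem type backing the test data directory.
# xfs vs ext4 is what decides whether mongod bothers to preallocate here.
df -T /data/db 2>/dev/null || df -T /
```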
Comment by Antoine Girbal [ 09/Feb/12 ]

Sorry, I looked at this too quickly.
There is no difference in the way the mongod processes are spawned.
It seems that on the SM buildbot preallocation would not be faster, so mongod does not do it (confusingly, it prints no log line to say so).
I am guessing this is due to a difference in disk/filesystem between the two bots; the SM bot uses xfs.
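mongod decides this via a quick filesystem speed check at startup. A rough probe in the same spirit (not mongod's actual implementation; file path and 16MB size are arbitrary) is to time how long the filesystem takes to materialize and sync a file:

```shell
# Rough probe of file-allocation speed, similar in spirit (not in
# implementation) to mongod's startup filesystem test.
t0=$(date +%s%N)
dd if=/dev/zero of=/tmp/prealloc_probe bs=1M count=16 conv=fsync 2>/dev/null
t1=$(date +%s%N)
echo "wrote 16MB in $(( (t1 - t0) / 1000000 )) ms"
rm -f /tmp/prealloc_probe
```

Running this on both bots would make the ext4-vs-xfs gap directly visible.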

Comment by Antoine Girbal [ 08/Feb/12 ]

OK, I'm guessing there is a difference between the buildbot setups that we should fix.
SM: [initandlisten] options:

{ dbpath: "/data/db/29000", nohttpinterface: true, noprealloc: true, oplogSize: 40, port: 29000, smallfiles: true }

v8: [initandlisten] options:

{ dbpath: "/data/db/sconsTests/", port: 27999 }
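For reference, the two logged option sets correspond to roughly these command lines (a sketch reconstructed from the options dumps above; this is a config fragment, not something to run as-is):

```shell
# SM builder: preallocation explicitly disabled, small files, tiny oplog.
mongod --dbpath /data/db/29000 --port 29000 \
       --nohttpinterface --noprealloc --smallfiles --oplogSize 40

# v8 builder: bare defaults, so the journal-preallocation heuristic can fire.
mongod --dbpath /data/db/sconsTests/ --port 27999
```

The `--noprealloc` / `--smallfiles` flags on the SM side alone would mask much of the allocation cost, independent of the filesystem difference.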
Comment by Antoine Girbal [ 02/Feb/12 ]

Not sure what I am missing here.
I am running the following line on bs-linux64:
python buildscripts/smoke.py ./jstests/mr1.js > log

I assume it should use journaling, since no option disables it.
The output states:

cwd [/mnt/home/buildbot/slave/Linux_64bit/mongo]
num procs:59
removing: /data/db/sconsTests//mongod.lock
removing: /data/db/sconsTests//test.ns
removing: /data/db/sconsTests//test.0
removing: /data/db/sconsTests//test.1
Wed Feb  1 21:01:35 [initandlisten] MongoDB starting : pid=11295 port=32000 dbpath=/data/db/sconsTests/ 64-bit host=ip-10-110-9-236
Wed Feb  1 21:01:35 [initandlisten] 
Wed Feb  1 21:01:35 [initandlisten] ** NOTE: This is a development version (2.1.0-pre-) of MongoDB.
Wed Feb  1 21:01:35 [initandlisten] **       Not recommended for production.
Wed Feb  1 21:01:35 [initandlisten] 
Wed Feb  1 21:01:35 [initandlisten] db version v2.1.0-pre-, pdfile version 4.5
Wed Feb  1 21:01:35 [initandlisten] git version: e45cb6180280494f97eb5a0a2be32de56bc1644a
Wed Feb  1 21:01:35 [initandlisten] build info: Linux ip-10-110-9-236 2.6.21.7-2.ec2.v1.2.fc8xen #1 SMP Fri Nov 20 17:48:28 EST 2009 x86_64 BOOST_LIB_VERSION=1_41
Wed Feb  1 21:01:35 [initandlisten] options: { dbpath: "/data/db/sconsTests/", port: 32000 }
Wed Feb  1 21:01:35 [initandlisten] journal dir=/data/db/sconsTests/journal
Wed Feb  1 21:01:35 [initandlisten] recover : no journal files present, no recovery needed
Wed Feb  1 21:01:35 [initandlisten] waiting for connections on port 32000
Wed Feb  1 21:01:35 [websvr] admin web console waiting for connections on port 33000
Wed Feb  1 21:01:36 [initandlisten] connection accepted from 127.0.0.1:44434 #1 (1 connection now open)
Wed Feb  1 21:01:36 [conn1] end connection 127.0.0.1:44434 (0 connections now open)
MongoDB shell version: 2.1.0-pre-
connecting to: 127.0.0.1:32000/test
Wed Feb  1 21:01:36 [initandlisten] connection accepted from 127.0.0.1:44435 #2 (1 connection now open)
null
Wed Feb  1 21:01:36 [conn2] CMD: drop test.mr1
Wed Feb  1 21:01:36 [FileAllocator] allocating new datafile /data/db/sconsTests/test.ns, filling with zeroes...
Wed Feb  1 21:01:36 [FileAllocator] creating directory /data/db/sconsTests/_tmp
...

Basically it seems that journaling is on, but it never allocates journal files.
By comparison, if I run the exact same line on the v8 buildbot, it correctly uses a journal.

Comment by Eliot Horowitz (Inactive) [ 02/Feb/12 ]

This machine now has 1TB of space, so you can remove that code post-2.1.0.

Comment by Antoine Girbal [ 02/Feb/12 ]

The reason is that it was the first run after I cleared the data folder; we'll be able to judge real speed once we get more than one clean full run.

So this is the reason behind the 10x longer time.
Looking at the logs in this ticket's description:
antoine@ag410:~/Desktop$ grep 'preallocating a journal file' stdio-v8 | wc -l
576
antoine@ag410:~/Desktop$ grep 'preallocating a journal file' stdio-sm | wc -l
0

Comment by Antoine Girbal [ 02/Feb/12 ]

It looks like when testing with the regular build, it reuses the existing journal most of the time.
But with v8, it always creates a new journal even if one is present.
It doesn't really make sense that this would have something to do with JS.

antoine@ag410:~/Desktop$ grep 'preallocating a journal file' stdio1 | wc -l
6
antoine@ag410:~/Desktop$ grep 'preallocating a journal file' stdio2 | wc -l
162

Comment by Eliot Horowitz (Inactive) [ 01/Feb/12 ]

even with more ram this is still much slower than other machines

Comment by Antoine Girbal [ 27/Jan/12 ]

This buildslave seems extremely sluggish.
The load tends to be very high, with disk activity pegged at 100%.
Additionally, it only has 2GB of RAM vs 8GB for the regular bot.
I stopped buildbot, then ran a simple test that takes 5s on my laptop; it took almost a minute on this box.

Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
sda 0.00 0.00 159.00 0.00 1272.00 0.00 8.00 0.98 6.17 6.19 98.40

avg-cpu: %user %nice %system %iowait %steal %idle
0.00 0.00 1.00 48.88 0.00 50.12

Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
sda 0.00 0.00 139.50 0.00 1116.00 0.00 8.00 0.98 7.05 7.05 98.30

avg-cpu: %user %nice %system %iowait %steal %idle
0.00 0.00 0.75 53.87 0.00 45.39
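The device lines above are iostat-style extended stats; the `%util` column near 100% is what shows the disk is saturated. A tiny awk filter over a captured sample (hypothetical capture file, data taken from the first device line above) makes that jump out:

```shell
# One captured device line from the output above (hypothetical capture file).
cat > /tmp/iostat_sample.txt <<'EOF'
sda 0.00 0.00 159.00 0.00 1272.00 0.00 8.00 0.98 6.17 6.19 98.40
EOF

# Flag any device whose %util (last column) exceeds 90%.
awk '$NF > 90 { print $1 " is saturated: " $NF "% util" }' /tmp/iostat_sample.txt
```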

Generated at Thu Feb 08 03:06:58 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.