[SERVER-22231] Add additional test suites to run resmoke.py validation hook Created: 19/Jan/16 Updated: 16/Sep/20 Resolved: 24/Feb/16 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Testing Infrastructure |
| Affects Version/s: | None |
| Fix Version/s: | 3.2.4, 3.3.3 |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Jonathan Abrahams | Assignee: | Robert Guo (Inactive) |
| Resolution: | Done | Votes: | 0 |
| Labels: | test-only, tig-resmoke |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: | |
| Backwards Compatibility: | Fully Compatible |
| Backport Completed: | |
| Sprint: | TIG 10 (02/19/16), TIG 11 (03/11/16) |
| Participants: | |
| Linked BF Score: | 0 |
| Description |
|
We should investigate adding the following suites:
|
| Comments |
| Comment by Robert Guo (Inactive) [ 08/Mar/16 ] |
|
Excellent! Thanks for the fix and the explanation, Igor. It makes sense that these two suites were the ones failing; they were the only ones that ran more than one mongod on the same VM (2 mongods in a master-slave configuration in this case).
| Comment by Igor Canadi [ 08/Mar/16 ] |
|
Looks like it helped! All the tests are passing for RocksDB now!
| Comment by Igor Canadi [ 08/Mar/16 ] |
|
I just pushed https://github.com/mongodb-partners/mongo-rocks/commit/982c182382fdb0de1ebd1c9770bc9fa79372f893, let's see if this fixes it.
| Comment by Igor Canadi [ 08/Mar/16 ] |
|
Thanks for the investigation, Robert! I'm running the tests on my machine, and even though they're not failing, I see that calling validate() fills up the block cache pretty quickly. With validate() my block cache grows to 5GB pretty quickly; without it, it stays around 1GB. By default we set the block cache to be 1/3 of the total RAM available: https://github.com/mongodb-partners/mongo-rocks/blob/master/src/rocks_engine.cpp#L206 What might be happening here is that a mongod process using 1/3 of the machine's RAM is too much. How much memory do those machines have? Let me try defaulting the RocksDB memory size to a smaller amount, with a calculation similar to what WiredTiger does: https://github.com/mongodb/mongo/blob/master/src/mongo/db/storage/wiredtiger/wiredtiger_init.cpp#L72 Alternatively, we could pass in --rocksdbCacheSizeGB=1 (https://github.com/mongodb-partners/mongo-rocks/blob/master/src/rocks_global_options.cpp#L46)
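For illustration only, the memory arithmetic in the comment above can be sketched in a few lines of Python. This is not code from mongo-rocks or resmoke.py: the function names and the 8 GB host size are hypothetical, since the thread never states how much RAM the build machines have. The sketch assumes only what the thread says, namely that each mongod defaults its block cache to one third of total RAM and that the failing suites start two mongods on the same VM.

```python
def one_third_cache_gb(total_ram_gb):
    """Per-process block cache under the default described in the thread:
    one third of the total RAM on the host."""
    return total_ram_gb / 3.0


def total_cache_gb(per_process_cache_gb, num_mongods):
    """Combined block cache budget when several mongods share one host."""
    return per_process_cache_gb * num_mongods


if __name__ == "__main__":
    ram_gb = 8.0      # hypothetical host size, not taken from the ticket
    num_mongods = 2   # master-slave suites run two mongods on one VM

    default_total = total_cache_gb(one_third_cache_gb(ram_gb), num_mongods)
    capped_total = total_cache_gb(1.0, num_mongods)  # --rocksdbCacheSizeGB=1

    print("default 1/3 rule: %.2f GB of %.1f GB RAM" % (default_total, ram_gb))
    print("explicit 1 GB cap: %.2f GB of %.1f GB RAM" % (capped_total, ram_gb))
```

Under these assumptions, two mongods with the default setting may together claim up to two thirds of the host's RAM for block cache alone, before counting any other memory the processes use. That would be consistent with only the multi-mongod suites hitting the OOM.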
| Comment by Robert Guo (Inactive) [ 07/Mar/16 ] |
|
Hi Igor, I did some investigation and it looks like the OOM issue is caused by the validate command running after every test. I used malloc_history -highWaterMark to check the memory usage on OS X and here is the stacktrace.
mongo::RocksRecordStore::validate calls into rocksdb here in rocks_record_store.cpp. There doesn't seem to be a glaring bug in the validation code, which is very similar to WiredTiger's; the issue may simply be that rocks uses more memory when doing a collection scan. But if I remember your talk correctly, rocks should be expected to use more memory than a b-tree based implementation when doing reads? If you think there's room for improvement in rocks' memory usage, I'd be happy to play around with it after a fix. At the same time, I'll try to ask our AWS team for a larger instance for the rocksdb builds. If you have other suggestions for fixing this build failure, please feel free to let me know as well. Thanks,
| Comment by Igor Canadi [ 04/Mar/16 ] |
|
Thanks Ramon and Robert! FYI I wasn't able to reproduce the failure on my machine, but that was probably expected as it has a lot of memory and this is an OOM issue. I should probably try reproducing on a machine with less memory.
| Comment by Robert Guo (Inactive) [ 04/Mar/16 ] |
|
ramon.fernandez Yep, I'll take a look.
| Comment by Ramon Fernandez Marina [ 04/Mar/16 ] |
|
robert.guo, after this change we started having some failures in the RocksDB testing – see Igor's comment on GitHub. Can you please take a look?
| Comment by Githook User [ 24/Feb/16 ] |
|
Author: {u'username': u'guoyr', u'name': u'Robert Guo', u'email': u'robert.guo@10gen.com'}
Message: (cherry picked from commit 5e2b94dca62ab39a4fddf8896aae6d66d7922256)
| Comment by Githook User [ 24/Feb/16 ] |
|
Author: {u'username': u'guoyr', u'name': u'Robert Guo', u'email': u'robert.guo@10gen.com'}
Message:
| Comment by Robert Guo (Inactive) [ 11/Feb/16 ] |
|
Removing concurrency_* since they start their own fixtures instead of going through resmoke.py.