[SERVER-21306] Separate System Performance workloads into tasks containing logical groupings Created: 24/Sep/15  Updated: 17/Nov/15  Resolved: 05/Nov/15

Status: Closed
Project: Core Server
Component/s: Testing Infrastructure
Affects Version/s: None
Fix Version/s: 3.2.0-rc3

Type: Improvement Priority: Major - P3
Reporter: Ian Whalen (Inactive) Assignee: Ian Whalen (Inactive)
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File screenshot-1.png    
Issue Links:
Depends
Duplicate
is duplicated by SERVER-20946 only compile once in sys-perf project Closed
Backwards Compatibility: Fully Compatible
Sprint: QuInt B (11/02/15), QuInt C (11/23/15)
Participants:

 Description   

The proposed breakdown is to take single_cluster_test and break it into the following tasks:

  • YCSB_WT
    • ycsb_load
    • ycsb_100read
    • ycsb_50read50update
  • YCSB_MMAPv1
    • ycsb-mmapv1_load
    • ycsb-mmapv1_100read
    • ycsb-mmapv1_50read50update
  • custom_workloads_WT
    • contended_update-wiredTiger
    • map_reduce_1M_doc-wiredTiger
  • customer_workloads_MMAPv1
    • contended_update-mmapv1
    • map_reduce_1M_doc-mmapv1

And then to break down shard_shard_cluster_test as follows:

  • YCSB_WT
    • ycsb_load
    • ycsb_100read
    • ycsb_50read50update
  • YCSB_MMAPv1
    • ycsb-mmap_load
    • ycsb-mmap_100read
    • ycsb-mmap_50read50update
  • customer_workloads_MMAPv1
    • map_reduce_1M_doc-mmapv1
  • customer_workloads_WT
    • map_reduce_1M_doc-wiredTiger

But note the emphasis on "proposed". It would be good to list here a few of the additional tests that are incoming, in order to 'future-proof' our information architecture.
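As a sketch of what this grouping might look like in an Evergreen project configuration file (the function name, vars, and overall shape below are illustrative assumptions, not the actual sys-perf config):

```yaml
# Hypothetical sketch of the proposed task grouping for the
# single-cluster variant; names are illustrative, not the real config.
tasks:
- name: ycsb_WT
  depends_on:
  - name: compile
  commands:
  - func: "run workload"
    vars: {workload: ycsb_load, storageEngine: wiredTiger}
  - func: "run workload"
    vars: {workload: ycsb_100read, storageEngine: wiredTiger}
  - func: "run workload"
    vars: {workload: ycsb_50read50update, storageEngine: wiredTiger}
- name: custom_workloads_WT
  depends_on:
  - name: compile
  commands:
  - func: "run workload"
    vars: {workload: contended_update, storageEngine: wiredTiger}
  - func: "run workload"
    vars: {workload: map_reduce_1M_doc, storageEngine: wiredTiger}
```

The key point of the restructuring is that each logical grouping becomes its own schedulable task depending only on compile, so groups can run in parallel on separate hosts instead of serially inside one monolithic task.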



 Comments   
Comment by Githook User [ 06/Nov/15 ]

Author: Ian Whalen (IanWhalen) <ian.whalen@gmail.com>

Message: SERVER-21306 Revamp dependency chain for sys-perf

Two changes:

Comment by Githook User [ 05/Nov/15 ]

Author: Ian Whalen (IanWhalen) <ian.whalen@gmail.com>

Message: SERVER-21306 Restructure grouping of sys-perf tests
Branch: master
https://github.com/mongodb/mongo/commit/d6b9e37e71ac615922fedf75dc130161e1b08db9

Comment by Rui Zhang (Inactive) [ 05/Nov/15 ]

Regarding the prolonged execution time: I found out (with confirmation from Ian) that we have multiple runners per group now, so this may not be a huge issue as long as we have enough runners.

Comment by Chung-yen Chang [ 05/Nov/15 ]

ian.whalen, the prolonged run time case that Rui pointed out is likely to happen. When it does, we might hold some AWS resources without doing actual work, which is wasted cost. One other factor that makes me a little hesitant about making the change is that we used to have more tests showing up on the same page, which made spotting common trends easier. With this change, common trends will be less obvious.

With both factors in the picture, I am a little less enthusiastic about making this change than I was a couple of weeks ago. Can we think through the first problem (the prolonged run time) a little more, and make sure we have at least some confidence that we can quickly address it if it becomes a real issue, before turning the switch on this one?

rui.zhang, what/where/how do you think we would have to change to solve the prolonged execution time? And do you envision that being something that would take days or weeks?

Comment by Ian Whalen (Inactive) [ 04/Nov/15 ]

https://mongodbcr.appspot.com/35280001/ is the latest. Whether it's the final version is now up to you.

Comment by Rui Zhang (Inactive) [ 04/Nov/15 ]

I will look at the CR again. Is there a new patch, or is that the final version?

Comment by Ian Whalen (Inactive) [ 04/Nov/15 ]

rui.zhang I'm ok with this - as mentioned in standup, if we can't parallelize our tests by throwing more machines at them, then there's just a bug somewhere that we need to fix or work around, whether it's with Evergreen, AWS or otherwise.

I do still need an LGTM from either you or chung-yen.chang though. Would be great if one or both of you could please look at the CR and comment there.

Comment by Rui Zhang (Inactive) [ 04/Nov/15 ]

following up from standup,

I do not have a strong opinion on whether we should do this or not; I will just outline a couple of things we need to watch out for if this goes live:

  1. As a previous comment pointed out, we may see prolonged run times for tests, which will cause the instances created to stay up much longer than needed. For example, in one extreme situation, if there are 10 patch builds running, a run may see execution time become close to 10 times longer.
  2. Since there could be multiple runs at the same time, we may run into the VPC limit (we have 10 so far); this may be okay if we distribute tests across AWS regions.

I feel this is good to have, as long as we are okay with the two risks. Both of them can be addressed: 1) may require a scheduling enhancement from Evergreen; for 2) we just need to watch, and either open a case with AWS if it turns out to be a real limit, or try a better distribution of the tests.

also cc chung-yen.chang
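The run-time risk in point 1 above can be put as back-of-the-envelope arithmetic: with a FIFO queue of equally sized builds and a fixed pool of runners, a newly queued build waits roughly one full run-length per batch ahead of it. (The helper below is a hypothetical illustration, not part of Evergreen.)

```python
import math

def estimated_wait_batches(queued_builds, runners):
    """With `runners` draining a FIFO queue of `queued_builds` equally
    sized builds (no preemption), the last build waits roughly
    ceil(queued_builds / runners) run-lengths before its tasks start."""
    return math.ceil(queued_builds / runners)

# One runner, 10 queued patch builds: the last build waits ~10 run-lengths,
# matching the "close to 10 times longer" extreme described above.
assert estimated_wait_batches(10, 1) == 10
# Five runners cut that to ~2 run-lengths.
assert estimated_wait_batches(10, 5) == 2
```

This is why "multiple runners per group" (mentioned in a later comment) largely defuses the concern: adding runners divides the queueing delay rather than the per-build run time.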

Comment by Ian Whalen (Inactive) [ 04/Nov/15 ]

Got it. Do you think that should block deploy of this patch?

Comment by Rui Zhang (Inactive) [ 04/Nov/15 ]

ian.whalen I asked this question in the collab channel with Mike. As shown in this image, there is a chance that a later scheduled run preempts the currently running test, and if that happens, the instances spawned for the preempted run will stay up much longer than they should. As we discussed, adding more runners may help with this? Anyway, I just want to bring this up to make sure we understand it and are prepared for it if possible.

Comment by Ian Whalen (Inactive) [ 03/Nov/15 ]

Status update: this should be good to go now, pending an LGTM on this code review from rui.zhang and completion of this script by mpobrien.

Comment by Ian Whalen (Inactive) [ 16/Oct/15 ]

https://mongodbcr.appspot.com/29670001/

Generated at Thu Feb 08 03:56:56 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.