[SERVER-33740] Add Evergreen task for running powercycle against mobile storage engine Created: 08/Mar/18  Updated: 29/Oct/23  Resolved: 06/Aug/18

Status: Closed
Project: Core Server
Component/s: Storage, Testing Infrastructure
Affects Version/s: None
Fix Version/s: 4.0.2, 4.1.2

Type: Task Priority: Major - P3
Reporter: Max Hirschhorn Assignee: Max Hirschhorn
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
depends on SERVER-33992 Mobile SE: Test validate functionalit... Closed
Related
is related to SERVER-33651 Mobile SE: Use full synchronous mode ... Closed
Backwards Compatibility: Fully Compatible
Backport Requested:
v4.0
Sprint: TIG 2018-08-13
Participants:
Story Points: 2

 Description   

We should create a powercycle_mobile Evergreen task that performs powercycle testing while running against the mobile storage engine. It should be as straightforward as copying the definition of the powercycle task and specifying --storageEngine=mobile in the mongod_extra_options parameter to the "run powercycle test" function, although some care would need to be taken to disable the FSM clients if we do this ticket before resolving SERVER-32993.

- name: powercycle_mobile
  exec_timeout_secs: 7200 # 2 hour timeout for the task overall
  depends_on:
  - name: compile
  commands:
  - func: "do setup"
  - func: "set up remote credentials"
    vars:
      <<: *powercycle_remote_credentials
  - func: "set up EC2 instance"
    vars:
      <<: *powercycle_ec2_instance
  - command: expansions.update
    <<: *powercycle_expansions
  - func: "run powercycle test"
    vars:
      <<: *powercycle_test
      mongod_extra_options: --mongodOptions=\"--setParameter enableTestCommands=1 --storageEngine mobile\"



 Comments   
Comment by Githook User [ 06/Aug/18 ]

Author:

{'name': 'Max Hirschhorn', 'email': 'max.hirschhorn@mongodb.com', 'username': 'visemet'}

Message: SERVER-33740 Run powercycle task against mobile storage engine.

(cherry picked from commit ac4da362f1470d865e51cf8aa013aa4d4aaab93e)
Branch: v4.0
https://github.com/mongodb/mongo/commit/d49eb6c0435898a2b813d3202d417423f91d0434

Comment by Githook User [ 06/Aug/18 ]

Author:

{'name': 'Max Hirschhorn', 'email': 'max.hirschhorn@mongodb.com', 'username': 'visemet'}

Message: SERVER-33740 Run powercycle task against mobile storage engine.
Branch: master
https://github.com/mongodb/mongo/commit/ac4da362f1470d865e51cf8aa013aa4d4aaab93e

Comment by Sulabh Mahajan [ 20/Mar/18 ]

max.hirschhorn, Sorry for the late response.

I had another look at the SQLite documentation, and it looks like at some stage the default in WAL mode might have changed to PRAGMA synchronous=FULL. This needs further investigation. I agree with you that we either need to configure SQLite with PRAGMA synchronous=FULL or need to checkpoint explicitly in MobileRecoveryUnit::waitUntilDurable(). We might have to study the sync guarantees provided by the underlying hardware too; for reference: https://www.sqlite.org/howtocorrupt.html#_failure_to_sync.
SERVER-33651 will track the effort that goes into determining these settings.

I agree it is worthwhile to have a powercycle test added for the mobile variants. Mobile SE doesn't have a functional "validate" at the moment. I have filed SERVER-33992 to get the "validate" working and tested.

Comment by Max Hirschhorn [ 08/Mar/18 ]

sulabh.mahajan, after reading the https://sqlite.org/pragma.html#pragma_synchronous reference you linked in SERVER-33651, my impression from how SQLite is currently configured with PRAGMA journal_mode=WAL is that the "validate" command should succeed following a powercycle of the mobile storage engine because the on-disk data would still be in a consistent state (w.r.t. collections and indexes); however, the canary document that's inserted with j=true immediately prior to crashing the remote machine wouldn't have durability guaranteed. One way to address this would be to also configure SQLite with PRAGMA synchronous=FULL, but an alternative may be to use sqlite3_wal_checkpoint_v2() to create an explicit checkpoint in MobileRecoveryUnit::waitUntilDurable(). The conversation about durability for the mobile storage engine may make more sense to have on SERVER-33651, but I'd be curious if you'd want to add some form of powercycle testing for the mobile storage engine sooner and then later refine the semantics around the canary document after SERVER-33651 is resolved. CC alexander.gorrod
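
For illustration only, the two options discussed above (PRAGMA synchronous=FULL versus an explicit WAL checkpoint) can be sketched with Python's standard sqlite3 module. This is not the mobile storage engine's code. The helper names (open_durable, checkpoint, demo), the canary table, and the file name are hypothetical, and PRAGMA wal_checkpoint(FULL) stands in for a call to sqlite3_wal_checkpoint_v2() in FULL mode.

```python
import os
import sqlite3
import tempfile


def open_durable(path):
    """Open a SQLite database in WAL mode with synchronous=FULL (option 1)."""
    conn = sqlite3.connect(path)
    # The journal_mode pragma returns the resulting mode as a single row.
    mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    # Must be set outside of a transaction; FULL reads back as integer 2.
    conn.execute("PRAGMA synchronous=FULL")
    return conn, mode


def checkpoint(conn):
    """Run an explicit checkpoint (option 2).

    Returns the pragma's result row: (busy, wal_frames, checkpointed_frames).
    """
    return conn.execute("PRAGMA wal_checkpoint(FULL)").fetchone()


def demo():
    """Write a hypothetical canary row, commit it, then checkpoint the WAL."""
    path = os.path.join(tempfile.mkdtemp(), "canary.db")
    conn, mode = open_durable(path)
    sync_level = conn.execute("PRAGMA synchronous").fetchone()[0]
    conn.execute("CREATE TABLE canary (id INTEGER PRIMARY KEY, note TEXT)")
    conn.execute("INSERT INTO canary (note) VALUES ('written before powercycle')")
    conn.commit()
    busy, wal_frames, checkpointed = checkpoint(conn)
    count = conn.execute("SELECT COUNT(*) FROM canary").fetchone()[0]
    conn.close()
    return mode, sync_level, busy, count


if __name__ == "__main__":
    print(demo())
```

Whether either setting is sufficient on a given device still depends on the sync guarantees of the underlying hardware, which this sketch does not exercise.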

Generated at Thu Feb 08 04:34:26 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.