[SERVER-32997] Mobile SE: Design and implement multi-reader or single-writer concurrency Created: 30/Jan/18  Updated: 29/Oct/23  Resolved: 18/Jun/18

Status: Closed
Project: Core Server
Component/s: Storage
Affects Version/s: None
Fix Version/s: 4.0.1, 4.1.1

Type: Bug Priority: Major - P3
Reporter: Sulabh Mahajan Assignee: Sulabh Mahajan
Resolution: Fixed Votes: 0
Labels: SERG, nonnyc, storage-engines
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Text File 0001-Finish-the-supportsDBLocking-implementation.patch    
Issue Links:
Backports
Depends
is depended on by SERVER-32697 Add jstestfuzz, jstestfuzz_concurrent... Closed
Duplicate
is duplicated by SERVER-32697 Add jstestfuzz, jstestfuzz_concurrent... Closed
Related
related to SERVER-34953 MobileSE: validate on mobile should r... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v4.0
Sprint: Storage Non-NYC 2018-05-07, Storage Non-NYC 2018-05-21, Storage Non-NYC 2018-06-04, Storage Non-NYC 2018-06-18
Participants:
Linked BF Score: 26
Story Points: 13

 Description   

SERVER-32675 resolved some of the major concurrency issues in Mobile SE. I still see a few tests hitting either write conflicts or DB-locked errors. These tests need to be investigated and fixed accordingly; this ticket tracks that effort.



 Comments   
Comment by Githook User [ 05/Jul/18 ]

Author:

{'username': 'sulabhM', 'name': 'Sulabh Mahajan', 'email': 'sulabh.mahajan@mongodb.com'}

Message: SERVER-32997 Implement instance level locking for mobile SE

(cherry picked from commit cc8aab5249ccdec471e33bda087faecb53a6d9bf)
Branch: v4.0
https://github.com/mongodb/mongo/commit/bbe1eaa3b463b2f95ff5c8198f55da79991112ef

Comment by Githook User [ 18/Jun/18 ]

Author:

{'username': 'sulabhM', 'name': 'Sulabh Mahajan', 'email': 'sulabh.mahajan@mongodb.com'}

Message: SERVER-32997 Implement instance level locking for mobile SE
Branch: master
https://github.com/mongodb/mongo/commit/cc8aab5249ccdec471e33bda087faecb53a6d9bf

Comment by Sulabh Mahajan [ 22/May/18 ]

I had a discussion with milkie about the approaches discussed earlier, and we concluded that it is best to modify the lock manager to support instance-level locking for the mobile SE. There are places where the proper locks are not held when doing reads or writes; SERVER-34790 is open to address that issue. A global instance lock is likely to run into some of these issues.

The plan for now is to add debug code to the mobile SE to identify places where the proper instance-level locks are not taken from the lock manager. I ran a few tests, and I think some of these issues are in the code for capped collections, collection renames and collection validation. I am still debugging to get more details.
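A rough Python sketch of that kind of debug check follows. It is illustrative only: a real check would sit in the mobile SE's C++ code and consult the lock manager, and `InstanceLockTracker`, `acquire_global`, `check_can_read`, `check_can_write` and `LockNotHeldError` are hypothetical names. `acquire_global` only records the mode a thread claims to hold; it does not block anyone. Any read or write made without the instance-level lock it needs (S for reads, X for writes, assuming Global IS/IX are converted to S/X) is recorded and raised. The operation names in the usage lines are made up.

```python
import threading
from contextlib import contextmanager


class LockNotHeldError(AssertionError):
    """Raised when an operation touches storage without the required instance lock."""


class InstanceLockTracker:
    """Hypothetical debug helper: tracks the Global lock mode held by each thread."""

    def __init__(self):
        self._local = threading.local()
        self.violations = []  # (operation, held_mode) pairs kept for diagnosis

    def held_mode(self):
        return getattr(self._local, "mode", None)

    @contextmanager
    def acquire_global(self, mode):
        # Bookkeeping only: records the mode, does not actually exclude other threads.
        previous = self.held_mode()
        self._local.mode = mode
        try:
            yield
        finally:
            self._local.mode = previous

    def _check(self, operation, allowed_modes):
        mode = self.held_mode()
        if mode not in allowed_modes:
            self.violations.append((operation, mode))
            raise LockNotHeldError(f"{operation} without instance lock (held: {mode})")

    def check_can_read(self, operation):
        # Under instance-level locking a reader needs at least Global S.
        self._check(operation, {"S", "X"})

    def check_can_write(self, operation):
        # Under instance-level locking a writer needs Global X.
        self._check(operation, {"X"})


tracker = InstanceLockTracker()
with tracker.acquire_global("X"):
    tracker.check_can_write("insert")  # OK: Global X held
with tracker.acquire_global("S"):
    tracker.check_can_read("find")  # OK: Global S held
try:
    tracker.check_can_write("capped delete")  # no Global lock held at all
except LockNotHeldError:
    pass
print(tracker.violations)  # [('capped delete', None)]
```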

Comment by Eric Milkie [ 16/May/18 ]

How is approach 2 different from using instance-level locking? Presumably, in all the places where we have a WriteUnitOfWork, we have already acquired a Global IX lock, which under instance-level locking can be a Global X lock instead. Then you would achieve the same effect as doing internal exclusive locking for WUOWs.

Comment by Sulabh Mahajan [ 16/May/18 ]

The patch with fixes for the mobile SE did not prove useful: the test run showed additional crashes beyond the existing locked/busy issues.

We tried a couple of other approaches:

  1. Throw a write conflict exception each time a write operation hits a locked/busy state. The idea is that this exception will be caught and the whole operation retried. If retried enough times, the operation should eventually go through once it no longer conflicts with a parallel operation. We faced a couple of issues with this approach:
    a. There might be places where the exception is not caught and retried, or where the operation is left in a non-idempotent state so that a retry messes the system up. When I ran the tests over this change, I saw several such issues: crashes and invariant failures.
    b. This approach isn't very deterministic and doesn't build confidence in how we are handling concurrency.
  2. Start a transaction when a write unit of work (WUOW) goes active. This transaction will be in exclusive mode and will stop all other readers and writers from accessing the database. A commit or rollback of the WUOW will release the lock, letting other readers proceed or another single WUOW go active.
    a. The assumption with this approach is that no write is done outside an active WUOW. An invariant would help catch cases where this assumption doesn't hold.
    b. Some work will be needed to put the WUOW-active portions under a reader/writer lock.
    c. Initial tests on a proof-of-concept change look promising.
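Approach 1 amounts to a retry loop wrapped around each operation. The Python sketch below is a hypothetical model, not MongoDB code (MongoDB's own mechanism is a C++ `WriteConflictException` that callers catch and retry); `WriteConflict`, `run_with_retries`, `flaky_insert` and the `max_attempts` bound are invented for illustration. It also shows the hazard from point (a): work done before the conflict surfaces is repeated on every attempt, so retrying is only safe if each attempt starts from a clean state.

```python
class WriteConflict(Exception):
    """Hypothetical stand-in for the SE reporting a locked/busy state."""


def run_with_retries(operation, max_attempts=5):
    # Retry the whole operation whenever the storage layer reports a conflict.
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return operation(attempt)
        except WriteConflict as exc:
            last_error = exc  # a real system would back off or yield here
    raise RuntimeError(f"gave up after {max_attempts} attempts") from last_error


side_effects = []


def flaky_insert(attempt):
    # Non-idempotent work done BEFORE the conflict is detected.
    side_effects.append(attempt)
    if attempt < 3:
        raise WriteConflict("database is locked")
    return "inserted"


result = run_with_retries(flaky_insert)
print(result)  # inserted
print(side_effects)  # [1, 2, 3]: the side effect ran once per attempt
```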
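Approach 2 can be sketched in Python as a reader/writer lock whose exclusive side is held for the whole lifetime of a write unit of work, plus the invariant from point (a). This is a hypothetical model, not the proof-of-concept change: `ReaderWriterLock`, `Engine`, `write_unit_of_work`, `write` and `read` are invented names, writes are buffered in a dict and applied only on commit, and the lock is not reentrant (a `read` issued by the thread that holds an active WUOW would deadlock in this sketch).

```python
import threading
from contextlib import contextmanager


class ReaderWriterLock:
    """Many concurrent readers or a single writer (no fairness guarantees)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False

    @contextmanager
    def shared(self):
        with self._cond:
            while self._writer:
                self._cond.wait()
            self._readers += 1
        try:
            yield
        finally:
            with self._cond:
                self._readers -= 1
                if self._readers == 0:
                    self._cond.notify_all()

    @contextmanager
    def exclusive(self):
        with self._cond:
            while self._writer or self._readers:
                self._cond.wait()
            self._writer = True
        try:
            yield
        finally:
            with self._cond:
                self._writer = False
                self._cond.notify_all()


class Engine:
    """Hypothetical engine: an active WUOW holds the exclusive lock until commit/rollback."""

    def __init__(self):
        self.lock = ReaderWriterLock()
        self.data = {}
        self._active = threading.local()

    @contextmanager
    def write_unit_of_work(self):
        with self.lock.exclusive():
            self._active.pending = {}
            try:
                yield self
                self.data.update(self._active.pending)  # commit: reached only without an exception
            finally:
                self._active.pending = None  # commit or rollback: WUOW no longer active

    def write(self, key, value):
        # Invariant from point (a): no write outside an active WUOW.
        pending = getattr(self._active, "pending", None)
        if pending is None:
            raise AssertionError("write outside an active WriteUnitOfWork")
        pending[key] = value

    def read(self, key):
        with self.lock.shared():
            return self.data.get(key)


engine = Engine()
with engine.write_unit_of_work() as wuow:
    wuow.write("a", 1)

try:
    with engine.write_unit_of_work() as wuow:
        wuow.write("b", 2)
        raise RuntimeError("abort")  # rollback path: "b" is never applied
except RuntimeError:
    pass

outside_error = None
try:
    engine.write("c", 3)  # violates the invariant
except AssertionError as exc:
    outside_error = str(exc)

print(engine.read("a"), engine.read("b"), outside_error)
```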
Comment by Sulabh Mahajan [ 09/May/18 ]

milkie suggested trying the attached patch, which implements supportsDBLocking for the mobile SE.

This is along the lines of the first approach suggested in the 04/May root-cause comment. It fixes and enhances the lock manager to provide instance-wide locking by converting Global IX and IS locks into X and S locks, respectively.
Benefits of this approach:

  • Rather than building a new locking mechanism inside the SE, locking stays in a single place: the lock manager.
  • By not building locking inside the SE, we continue to utilise all the diagnostic and introspection statistics that we currently have for the MongoDB lock manager.
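The IS/IX-to-S/X conversion in the patch can be pictured as a mode mapping applied before Global lock compatibility is checked. The Python sketch below uses the standard multi-granularity compatibility matrix for IS, IX, S and X; `global_mode`, `can_coexist` and the `engine_supports_db_locking` flag are hypothetical and only loosely mirror the idea behind `supportsDBLocking`, not the patch itself. It shows why the conversion gives multi-reader/single-writer behaviour across the whole instance: two IX holders are compatible, but after conversion to X they exclude each other, while readers converted to S can still share.

```python
# Standard multi-granularity lock compatibility matrix (True = can be granted together).
COMPATIBLE = {
    ("IS", "IS"): True, ("IS", "IX"): True, ("IS", "S"): True, ("IS", "X"): False,
    ("IX", "IS"): True, ("IX", "IX"): True, ("IX", "S"): False, ("IX", "X"): False,
    ("S", "IS"): True, ("S", "IX"): False, ("S", "S"): True, ("S", "X"): False,
    ("X", "IS"): False, ("X", "IX"): False, ("X", "S"): False, ("X", "X"): False,
}

# Instance-level locking: promote intent modes to full modes on the Global resource.
INSTANCE_LEVEL_MODE = {"IS": "S", "IX": "X", "S": "S", "X": "X"}


def global_mode(requested: str, engine_supports_db_locking: bool) -> str:
    """Hypothetical: engines without DB-level locking get whole-instance S/X locks."""
    if engine_supports_db_locking:
        return requested
    return INSTANCE_LEVEL_MODE[requested]


def can_coexist(a: str, b: str, engine_supports_db_locking: bool) -> bool:
    return COMPATIBLE[(global_mode(a, engine_supports_db_locking),
                       global_mode(b, engine_supports_db_locking))]


for a, b in [("IX", "IX"), ("IS", "IS"), ("IS", "IX")]:
    print(a, b, can_coexist(a, b, True), can_coexist(a, b, False))
# IX IX True False   two writers: intent modes share, converted X modes do not
# IS IS True True    two readers share in both cases
# IS IX True False   a reader and a writer exclude each other after conversion
```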

We have a meeting set up to continue discussing the solution for this issue. I will post an update as we make progress.

Comment by Sulabh Mahajan [ 04/May/18 ]

Root Cause:

This is caused by a mismatch in how concurrency is handled in MongoDB vs SQLite:

  • MongoDB can handle concurrency outside the storage engine, at either the database level or the document level. It expects a storage engine to be one of those two kinds and to register that capability when it is initialised.
  • MongoDB's multiple databases map to a single database/file in SQLite, and SQLite offers concurrency only at that level.
  • Moreover, the fix for SERVER-32675 (taking the lock immediately when a transaction starts instead of deferring it) resolved some of the write concurrency issues, but it is causing certain read operations to take an exclusive lock.
  • In short, SQLite's concurrency model doesn't directly align with what MongoDB expects from its storage engines.
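The read-side effect of taking the lock immediately can be reproduced with Python's standard `sqlite3` module, assuming the SERVER-32675 change corresponds to starting transactions with SQLite's `BEGIN IMMEDIATE` instead of a deferred `BEGIN`. This standalone sketch is not mobile SE code: it opens two connections to one temporary database file (SQLite's default rollback-journal mode) with the busy timeout set to zero, so contention shows up as an error instead of a wait. While one connection holds an immediate write transaction, a deferred read still succeeds, but any other transaction started with `BEGIN IMMEDIATE`, even one that only reads, is refused with "database is locked". Because every MongoDB database lives in the same file, this contention is instance-wide.

```python
import os
import sqlite3
import tempfile


def demo():
    fd, path = tempfile.mkstemp(suffix=".sqlite")
    os.close(fd)
    # isolation_level=None: we issue BEGIN/COMMIT ourselves; timeout=0: fail fast on contention.
    writer = sqlite3.connect(path, timeout=0, isolation_level=None)
    other = sqlite3.connect(path, timeout=0, isolation_level=None)
    try:
        writer.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT)")
        writer.execute("INSERT INTO docs (body) VALUES ('committed')")

        writer.execute("BEGIN IMMEDIATE")  # takes SQLite's write (RESERVED) lock right away
        writer.execute("INSERT INTO docs (body) VALUES ('uncommitted')")

        # A deferred read transaction can still run alongside the open writer.
        other.execute("BEGIN")
        rows = other.execute("SELECT body FROM docs").fetchall()
        other.execute("COMMIT")

        # A read wrapped in an immediate transaction cannot even start.
        try:
            other.execute("BEGIN IMMEDIATE")
            other.execute("ROLLBACK")
            immediate_error = None
        except sqlite3.OperationalError as exc:
            immediate_error = str(exc)

        writer.execute("COMMIT")
        return rows, immediate_error
    finally:
        writer.close()
        other.close()
        os.remove(path)


rows, error = demo()
print(rows)  # [('committed',)]
print(error)  # database is locked
```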

Possible Solutions:

  1. Modify MongoDB outside the storage engine to work with instance-wide (all databases together) concurrency.
  2. Modify the SQLite storage engine to handle concurrency internally, while advertising itself as supporting document-level concurrency.

After a discussion within the team, we decided to explore the second approach above. It localises concurrency control inside the SQLite SE alone, which makes it simpler to implement and debug than the other approach. There is also a desire to move away from having to support storage engines that advertise only database-level concurrency.

Generated at Thu Feb 08 04:31:58 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.