Loading...

XML

Word

Printable

JSON

Type: New Feature
Resolution: Duplicate
Priority: Major - P3
Fix Version/s: None
Affects Version/s: None
Component/s: Usability, Write Ops
Labels:
None
Environment:
n/a

Assigned Teams:

Sharding

(Taken from http://groups.google.com/group/mongodb-user/browse_thread/thread/cb38df80eac19a19)

I'd like to suggest adding 'write batch' support to MongoDB in the future.

Summary
----------------------------
It would be nice to be able to know a series of writes will all happen (eventual consistency, not necessarily atomic) - or not at all.

Other transactional features like read locks etc are not to be supported with 'write batching'.

Detailed / Wall of text
----------------------------
I wanted to bring up a feature request regarding 'write transactions' or 'atomic/eventually consistent write batch' support.

A big concern I have with a lack of 'transactions' with MongoDB is that there can be a chance of data inconsistency.

If the primary server dies mid way through updating/inserting multiple documents (perhaps across multiple shards), then you can get 'corrupt' data (according to your application) by just missing a few vital documents in your domain model.

Mongo's internal storage of the saved documents is fine; the only problem is that some document's didn't get saved before the crash, so the ones that were saved are not valid in a domain model sense because they are missing child documents etc.

I understand transactions are not very desirable for performance reasons (especially due to locking).

With that in mind, what about the concept of 'write batching', a write- only transaction, where all writes (across any number of shards) occur at once (eventually consistent) or not at all? Read locks are never taken; and you can't lock rows.

For example; picking a simple example for demo purposes; I'd want to write:

–
mongoDriver.StartBatch()
mongoDriver.db.users.insert(

{username: andrew}

);
mongoDriver.db.messages.insert(

{message: "Welcome!", username:andrew}

);
mongoDriver.db.stats.update({$inc:

{ totalUsers: 1 }

);
mongoDriver.CommitBatch(

{atomic: bool}

)
–

I'd like CommitBatch() to make sure that all of my writes in that batch are either committed; or none at all.

The 'atomic' flag in commit batch would decide whether a 2pc is used to ensure all commits occur at once across multiple shards. With atomic = False; commits happen without a 2PC in the sense that no co-ordination takes place; and an 'eventually consistent' approach is taken to the commit (normally, these multi-shard changes would all appear in a few milliseconds anyway).

This feature would let me know for certain that my data model will remain consistent (either I insert a new user, create a welcome message, and update my statistics – or not at all).

This feature CANNOT be used to perform read locking or 'bank balance' transfers because you can't block readers trying to read a document mid 'write batch' to evaluate the response - there is no read locking, your just applying a range of writes all at once or not at all.

I don't see this impacting performance at all (especially with atomic: false); there are no locks taking place at any time, except for atomic: true which would introduce a slight delay when a 2pc is occurring to coordinate a write batch when requested in the rare case its needed.

–

Conceptually I'd imagine 'begin batch' would mean each shard just logs any future write() queries to a local temporary collection (such as local.writebatches.<connection id>).

A request to 'commit batch' asks each shard whether they have finished writing to the local writebatch collection (or perhaps just always issue each writebatch insert with safe: true); if all is well, then a 'commit writebatch' command is sent to each shard (without a 2pc, unless atomic: true was requested) to persist each write by looping over local.writebatches.<connection id> collection and really
performing the original request.

Some thought needs to be put into failure handling (such as inserting a 'prepared' flag to the local writebatch collection in the event of server failure to ensure its "committed" on recovery), but I think thats not too difficult.

This would be a nice feature to have, it would prevent data inconsistency issues when you don't want your application to suffer; and avoids the locking associated with real transactions (that support read locks, isolation, etc).

related to

SERVER-11500 Multi-document transactions within a single shard

Closed

SERVER-11508 Multi-document transactions across shards [distributed transactions]

Closed

Assignee:: [DO NOT USE] Backlog - Sharding Team

Reporter:: Andrew Armstrong

Participants:: [DO NOT USE] Backlog - Sharding Team, Adrien, Alexander Arutuniants, Andrew Armstrong, Ben McCann, Billy Tetrud, Esha Maharishi, Manuel Lucas, Mohamad, Scott Hernandez, Vincent Sevel

Votes:: 58 Vote for this issue

Watchers:: 51 Start watching this issue

Created:: Mar 21 2011 09:56:33 AM UTC

Updated:: Dec 06 2022 05:44:44 AM UTC

Resolved:: Aug 10 2018 04:56:35 PM UTC

Details

Description

Attachments

Issue Links

Activity

People

Dates