Details

    • Type: New Feature
    • Status: Open
    • Priority: Major - P3
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: Backlog
    • Component/s: Usability, Write Ops
    • Labels:
      None
    • Environment:
      n/a

      Description

      (Taken from http://groups.google.com/group/mongodb-user/browse_thread/thread/cb38df80eac19a19)

      I'd like to suggest adding 'write batch' support to MongoDB in the future.

      Summary
      ----------------------------
      It would be nice to be able to know that a series of writes will either all happen (eventually consistent, not necessarily atomic) or not happen at all.

      Other transactional features, such as read locks, would not be supported by 'write batching'.

      Detailed / Wall of text
      ----------------------------
      I wanted to bring up a feature request regarding 'write transactions' or 'atomic/eventually consistent write batch' support.

      A big concern I have with the lack of 'transactions' in MongoDB is the chance of data inconsistency.

      If the primary server dies midway through updating/inserting multiple documents (perhaps across multiple shards), then you can get 'corrupt' data (from your application's point of view) by just missing a few vital documents in your domain model.

      Mongo's internal storage of the saved documents is fine; the only problem is that some documents didn't get saved before the crash, so the ones that were saved are not valid in a domain-model sense because they are missing child documents, etc.

      I understand transactions are not very desirable for performance reasons (especially due to locking).

      With that in mind, what about the concept of 'write batching': a write-only transaction where all writes (across any number of shards) occur at once (eventually consistent) or not at all? Read locks are never taken, and you can't lock documents.

      For example (picking a simple case for demo purposes), I'd want to write:


      mongoDriver.StartBatch();
      mongoDriver.db.users.insert(
          {username: "andrew"}
      );
      mongoDriver.db.messages.insert(
          {message: "Welcome!", username: "andrew"}
      );
      mongoDriver.db.stats.update(
          {},
          {$inc: {totalUsers: 1}}
      );
      mongoDriver.CommitBatch(
          {atomic: bool}
      );

      I'd like CommitBatch() to make sure that all of my writes in that batch are committed, or none at all.

      The 'atomic' flag in CommitBatch would decide whether a 2PC (two-phase commit) is used to ensure all commits occur at once across multiple shards. With atomic: false, commits happen without a 2PC, in the sense that no coordination takes place, and an 'eventually consistent' approach is taken to the commit (normally these multi-shard changes would all appear within a few milliseconds anyway).
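
      The two commit modes described above can be sketched roughly as follows. This is a hypothetical illustration, not a real driver API; the Shard class, commitBatch function, and their method names are all invented for the sketch.

```javascript
// Hypothetical sketch of CommitBatch's two modes.
// atomic: false  -> each shard commits independently, no coordination.
// atomic: true   -> classic two-phase commit: all shards must vote yes
//                   in the prepare phase before any shard commits.

class Shard {
  constructor(name) {
    this.name = name;
    this.prepared = false;
    this.committed = false;
  }
  prepare() {           // phase 1: durably stage the batch, then vote yes/no
    this.prepared = true;
    return true;
  }
  commit() {            // phase 2: make the staged writes visible
    this.committed = true;
  }
  abort() {             // discard the staged writes
    this.prepared = false;
  }
}

function commitBatch(shards, {atomic}) {
  if (!atomic) {
    // No coordination: readers may briefly see a partially applied batch
    // (the 'eventually consistent' mode described above).
    shards.forEach(s => s.commit());
    return true;
  }
  // 2PC: commit only if every shard votes yes during prepare.
  const votes = shards.map(s => s.prepare());
  if (votes.every(v => v)) {
    shards.forEach(s => s.commit());
    return true;
  }
  shards.forEach(s => s.abort());
  return false;
}
```

      The only cost of atomic: true in this sketch is the extra prepare round-trip to every shard, which matches the "slight delay" trade-off described in the ticket.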

      This feature would let me know for certain that my data model will remain consistent (either I insert a new user, create a welcome message, and update my statistics, or none of that happens).

      This feature CANNOT be used to perform read locking or 'bank balance' transfers, because you can't block readers from reading a document mid 'write batch' in order to evaluate a response. There is no read locking; you're just applying a range of writes all at once or not at all.

      I don't see this impacting performance at all (especially with atomic: false); no locks are taken at any time. The exception is atomic: true, which would introduce a slight delay while a 2PC coordinates the write batch, in the rare case it's needed.

      Conceptually I'd imagine 'begin batch' would mean each shard just logs any future write() queries to a local temporary collection (such as local.writebatches.<connection id>).

      A request to 'commit batch' asks each shard whether it has finished writing to its local writebatch collection (or perhaps each writebatch insert is always issued with safe: true). If all is well, a 'commit writebatch' command is sent to each shard (without a 2PC, unless atomic: true was requested), which persists each write by looping over the local.writebatches.<connection id> collection and really performing the original request.
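
      The buffer-then-replay idea above can be sketched as a tiny in-memory model of a single shard. Everything here is invented for illustration (class and method names, the in-memory stand-in for the local.writebatches collection); it only shows the mechanics: writes inside a batch go to a log, not the real collections, and commit replays the log.

```javascript
// Hypothetical single-shard model of 'begin batch' / 'commit batch'.
// Inside a batch, writes are appended to a per-connection log (a stand-in
// for local.writebatches.<connection id>) instead of being applied.

class ShardBatch {
  constructor() {
    this.collections = {};  // the shard's "real" data
    this.writeLog = [];     // staged writes for the current batch
    this.inBatch = false;
  }
  startBatch() {
    this.inBatch = true;
  }
  insert(coll, doc) {
    if (this.inBatch) {
      // Log the write; nothing is visible to readers yet.
      this.writeLog.push({op: "insert", coll, doc});
    } else {
      (this.collections[coll] ||= []).push(doc);
    }
  }
  commitBatch() {
    // Replay every logged write against the real collections.
    for (const entry of this.writeLog) {
      (this.collections[entry.coll] ||= []).push(entry.doc);
    }
    this.writeLog = [];
    this.inBatch = false;
  }
  abortBatch() {
    // Nothing was applied, so dropping the log discards the whole batch.
    this.writeLog = [];
    this.inBatch = false;
  }
}
```

      Because nothing touches the real collections until commitBatch(), aborting is free: the log is simply discarded, which is what gives the "all or nothing" property without read locks.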

      Some thought needs to be put into failure handling (such as writing a 'prepared' flag to the local writebatch collection so that, in the event of server failure, the batch is still "committed" on recovery), but I think that's not too difficult.
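
      The 'prepared' flag idea could work roughly like this (again a hypothetical sketch, with invented names): on restart, a shard inspects any leftover batch log and either finishes the commit or discards the batch, depending on whether the flag was written before the crash.

```javascript
// Hypothetical recovery rule for a crashed shard. A batch log found on disk
// looks like {prepared: bool, entries: [...]}. If 'prepared' was set, the
// commit had begun, so the log must be replayed to finish it; otherwise the
// commit never started and the batch is safely discarded.

function recover(batchLog) {
  if (batchLog.prepared) {
    return {action: "replay", entries: batchLog.entries};
  }
  return {action: "discard", entries: []};
}
```

      The key point is that the flag makes recovery unambiguous: a half-committed batch and an abandoned one are distinguishable from the log alone.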

      This would be a nice feature to have: it would prevent data inconsistency issues from hurting your application, and it avoids the locking associated with real transactions (which support read locks, isolation, etc.).

        Issue Links

          Activity

          v.sevel@lombardodier.com Vincent Sevel added a comment -

          I have an additional requirement around isolation: I need all updates to be visible at the same time. That is because I maintain financial positions. I cannot have in my system position1 showing a $100 debit movement, then some time later position2 showing a credit movement for the same amount. A financial transaction has to be complete or absent. Eventual consistency is not enough for that type of use case.

          elennaro Alexander Arutuniants added a comment -

          Yes, it's totally important, and it would make MongoDB the leading NoSQL database, at least if you somehow add some kind of journalled rollback for write operations.
          But full write-batch transactional logic is absolutely required for business operations. For example, my boss said a rough no to NoSQL because of this...

          chengas123 Ben McCann added a comment -

          Here are two related issues:
          https://jira.mongodb.org/browse/SERVER-11500
          https://jira.mongodb.org/browse/SERVER-11508

          Would be thrilled if this made its way into 2.8

          fresheneesz Billy Tetrud added a comment -

          I would love to see this kind of "write batching" support. I'm also worried about catastrophic failures on the application side leading to inconsistent data inside the database.

          mmaarouf Mohamad added a comment -

          This feature is very important! Are there any plans to action this anytime soon?


            People

            • Votes:
              57
            • Watchers:
              53

              Dates

              • Created:
                Updated: