Details

    • Backport:
      No
    • # Replies:
      8
    • Last comment by Customer:
      true

      Description

      (Taken from http://groups.google.com/group/mongodb-user/browse_thread/thread/cb38df80eac19a19)

      I'd like to suggest adding 'write batch' support to MongoDB in the future.

      Summary
      ----------------------------
      It would be nice to be able to know that a series of writes will all happen (eventually consistent, not necessarily atomic) - or not at all.

      Other transactional features, such as read locks, would not be supported by 'write batching'.

      Detailed / Wall of text
      ----------------------------
      I wanted to bring up a feature request regarding 'write transactions' or 'atomic/eventually consistent write batch' support.

      A big concern I have with the lack of 'transactions' in MongoDB is that there is a chance of data inconsistency.

      If the primary server dies midway through updating/inserting multiple documents (perhaps across multiple shards), you can end up with 'corrupt' data (from your application's point of view) simply by missing a few vital documents in your domain model.

      Mongo's internal storage of the saved documents is fine; the only problem is that some documents didn't get saved before the crash, so the ones that were saved are not valid in a domain-model sense because they are missing child documents etc.

      I understand transactions are not very desirable for performance reasons (especially due to locking).

      With that in mind, what about the concept of 'write batching': a write-only transaction where all writes (across any number of shards) occur at once (eventually consistent) or not at all? Read locks are never taken, and you can't lock rows.

      For example (keeping it simple for demo purposes), I'd want to write:


      mongoDriver.StartBatch();
      mongoDriver.db.users.insert(
          { username: "andrew" }
      );
      mongoDriver.db.messages.insert(
          { message: "Welcome!", username: "andrew" }
      );
      mongoDriver.db.stats.update(
          {},                          // update() takes a query first; match the single stats doc
          { $inc: { totalUsers: 1 } }
      );
      mongoDriver.CommitBatch(
          { atomic: bool }
      );

      I'd like CommitBatch() to make sure that all of my writes in that batch are committed, or none at all.

      The 'atomic' flag in CommitBatch() would decide whether a 2PC is used to ensure all commits occur at once across multiple shards. With atomic: false, commits happen without a 2PC, in the sense that no coordination takes place, and an 'eventually consistent' approach is taken to the commit (normally these multi-shard changes would all appear within a few milliseconds anyway).
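
      To make the two commit modes concrete, here is a rough driver-side sketch of what CommitBatch() might do internally. It is purely hypothetical: prepareWriteBatch, commitWriteBatch and rollbackWriteBatch are invented command names for this proposal, not an existing MongoDB API.

      // Hypothetical sketch only - none of these commands exist in MongoDB.
      // 'shards' is an array of connections to the shards touched by the
      // batch; 'batchId' identifies the logged writes on each shard.
      function commitBatch(shards, batchId, options) {
          if (options.atomic) {
              // 2PC path: every shard must acknowledge the prepare phase
              // before any shard is allowed to make its writes visible.
              var allPrepared = shards.every(function (shard) {
                  return shard.runCommand({ prepareWriteBatch: batchId }).ok;
              });
              if (!allPrepared) {
                  shards.forEach(function (shard) {
                      shard.runCommand({ rollbackWriteBatch: batchId });
                  });
                  return { ok: 0, msg: "prepare failed; batch rolled back" };
              }
          }
          // Non-atomic path: no coordination takes place; each shard applies
          // its logged writes independently, and they become visible as each
          // shard finishes (the 'eventually consistent' commit).
          shards.forEach(function (shard) {
              shard.runCommand({ commitWriteBatch: batchId });
          });
          return { ok: 1 };
      }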

      This feature would let me know for certain that my data model will remain consistent (either I insert a new user, create a welcome message, and update my statistics, or none of that happens).

      This feature CANNOT be used to perform read locking or 'bank balance' transfers, because you can't block readers trying to read a document mid 'write batch' to evaluate the response - there is no read locking; you're just applying a range of writes all at once or not at all.

      I don't see this impacting performance at all (especially with atomic: false); no locks are taken at any time, except with atomic: true, which would introduce a slight delay while a 2PC coordinates the write batch in the rare case it's needed.

      Conceptually I'd imagine 'begin batch' would mean each shard just logs any future write() queries to a local temporary collection (such as local.writebatches.<connection id>).

      A request to 'commit batch' asks each shard whether it has finished writing to the local writebatch collection (or perhaps each writebatch insert is simply issued with safe: true); if all is well, a 'commit writebatch' command is sent to each shard (without a 2PC, unless atomic: true was requested), which persists each write by looping over the local.writebatches.<connection id> collection and actually performing the original request.

      Some thought needs to be put into failure handling (such as inserting a 'prepared' flag into the local writebatch collection so that, in the event of server failure, the batch is "committed" on recovery), but I think that's not too difficult.
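
      To illustrate, the per-shard log described above might hold documents like these (the collection layout and field names are my assumptions, purely for illustration):

      // Hypothetical contents of local.writebatches.<connection id> on one
      // shard; each logged operation is replayed verbatim at commit time.
      { _id: 1, op: "insert", ns: "mydb.users", doc: { username: "andrew" } }
      { _id: 2, op: "insert", ns: "mydb.messages", doc: { message: "Welcome!", username: "andrew" } }
      { _id: 3, op: "update", ns: "mydb.stats", query: {}, update: { $inc: { totalUsers: 1 } } }

      // A hypothetical batch-state marker: flipping 'prepared' to true before
      // the replay begins lets a recovering shard finish the commit instead
      // of forgetting it, as suggested above.
      { _id: "state", prepared: true, committed: false }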

      This would be a nice feature to have: it would prevent data inconsistency issues without making your application suffer, and it avoids the locking associated with real transactions (which support read locks, isolation, etc.).

          Activity

          Scott Hernandez added a comment -

          Did you see this: http://jira.mongodb.org/browse/SERVER-2172

          I think in essence you are just asking for "batching support", with a few special bits of behavior.

          Andrew Armstrong added a comment -

          Hi Scott,

          Nope, I didn't see that ticket.

          I'm not sure whether that ticket expresses the other major part of this request, which is consistency of the write across multiple shards/documents, rather than just being, say, a perf optimization.

          Cheers

          Andrew Armstrong added a comment -

          I was just reading NoSQL @ Netflix (not about MongoDB, but the topic applies); one of the discussion points was how Netflix writes a lot of 'consistency checker' programs that run all the time (see http://highscalability.com/blog/2011/4/6/netflix-run-consistency-checkers-all-the-time-to-fixup-trans.html ) to ensure the underlying data is (eventually) consistent despite the lack of write transactions.

          While developing a personal project using MongoDB, I unfortunately need to reason frequently about the chance that a crash/exception/etc. on the client or server will "corrupt" the application data because one record was written but not the other.

          Without reading too much into it, I would think Netflix's need to write 'consistency checkers', and my own hesitation to do any multi-document updates, would disappear if I could guarantee to my application that logical application data corruption is impossible (for all intents and purposes) by using a 'write batch' feature that makes MongoDB assert to me: "yes, these 2+ records you're wanting to write/delete/upsert/etc. will happen (eventually), or not at all".

          It's great MongoDB takes care of sharding, failover, replication etc easily. This is another big problem that could be tackled.

          I don't see any existing NoSQL solutions considering this as a real feature yet; is it important to anyone else?

          ixio added a comment -

          This feature is very important to me too. I'm designing a new project where a MongoDB database would fit very well, except for a few occasional cases where I need multi-document transactions. I don't know whether this would impact performance at all, but even if it does, that is not a problem for me as long as it's optional (and the penalty only occurs when used, i.e. it does not impact all requests, only the few that use this feature).

          Manuel Lucas added a comment -

          Well, I am trying to work around this problem using a collection that keeps one big document for each batch before it is written to the database. Before reading any data, this collection must be empty; if it is not, the batch must still be applied (or a batch is in progress). The remaining problem is reading a half-written batch, so another locking system would be required.
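
          A minimal mongo-shell sketch of that workaround (collection and field names are hypothetical) might look like the following; note it still has the read-isolation gap described above:

          // Stage the whole batch as one document; a single-document insert
          // is atomic, so the batch is either fully staged or not at all.
          var batchId = ObjectId();
          db.pendingBatches.insert({
              _id: batchId,
              ops: [
                  { coll: "users",    doc: { username: "andrew" } },
                  { coll: "messages", doc: { message: "Welcome!", username: "andrew" } }
              ]
          });

          // Apply the staged writes, then remove the staging document.
          var batch = db.pendingBatches.findOne({ _id: batchId });
          batch.ops.forEach(function (op) {
              db.getCollection(op.coll).insert(op.doc);   // replay each staged write
          });
          db.pendingBatches.remove({ _id: batchId });

          // Readers are expected to check db.pendingBatches.count() == 0
          // before trusting the data; a half-applied batch is still visible
          // mid-replay, which is exactly the locking gap mentioned above.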

          Vincent Sevel added a comment -

          I have an additional requirement around isolation: I need all updates to be visible at the same time. That is because I maintain financial positions. I cannot have position1 in my system showing a $100 debit movement and then, some time later, position2 showing the credit movement for the same amount. A financial transaction has to be complete or absent. Eventual consistency is not enough for that type of use case.

          Alexander Arutuniants added a comment -

          Yes, it's totally important, and it would make MongoDB the leading NoSQL database. At the very least, you could add some kind of journalled rollback for certain write operations.
          But full write-batch transactional logic is absolutely required for business operations. For example, my boss said a firm no to NoSQL because of this...

          Ben McCann added a comment -

          Here are two related issues:
          https://jira.mongodb.org/browse/SERVER-11500
          https://jira.mongodb.org/browse/SERVER-11508

          Would be thrilled if this made its way into 2.8


            People

            • Votes:
              37
            • Watchers:
              34

            Dates

            • Days since reply:
              23 weeks, 3 days ago