Core Server / SERVER-22634

Data size change for oplog deletes can overflow 32-bit int

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical - P2
    • Resolution: Fixed
    • Affects Version/s: 3.0.9
    • Fix Version/s: 3.0.10
    • Component/s: Storage
    • Labels:
    • Backwards Compatibility:
      Fully Compatible
    • Operating System:
      ALL

      Description

      Issue Status as of Feb 29, 2016

      ISSUE SUMMARY
      In MongoDB 3.0 nodes running with the WiredTiger storage engine, an integer overflow condition may cause a replica set to lose write availability when a write concern greater than 1 is used.

      Under write-intensive workloads, it is possible for the oplog of a replica set to grow past its configured size. If this happens, the system will attempt to remove up to 20,000 documents from the oplog to shrink it. If the total size of those documents exceeds 2 GB (a signed 32-bit integer can hold at most 2^31 - 1 bytes), the removal overflows the 32-bit integer that records the size change.

      As a result, the size change will be improperly recorded while the oplog will still appear to exceed the maximum configured size, so the system will attempt to delete more data from the oplog. In extreme cases this can result in the entire contents of the oplog being deleted.
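      The arithmetic behind the overflow can be illustrated with a minimal sketch. This is not the MongoDB code: narrowToInt32, bytesRemoved and the two sizeAfter... functions are hypothetical names, and a uniform record size of 128 KiB is assumed purely for the example. The helper models a 64-bit byte count being squeezed into a 32-bit signed int using only well-defined arithmetic.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not a MongoDB function): keeps only the low 32 bits of
// a 64-bit value and reinterprets them as a signed 32-bit integer.
constexpr std::int32_t narrowToInt32(std::int64_t value) {
    const std::uint64_t low32 = static_cast<std::uint64_t>(value) & 0xFFFFFFFFull;
    return low32 >= 0x80000000ull
        ? static_cast<std::int32_t>(static_cast<std::int64_t>(low32) - 0x100000000ll)
        : static_cast<std::int32_t>(low32);
}

// Total bytes freed by one removal pass, assuming every record has the same size.
inline std::int64_t bytesRemoved(std::int64_t recordCount, std::int64_t bytesPerRecord) {
    assert(recordCount >= 0 && bytesPerRecord >= 0);
    return recordCount * bytesPerRecord;
}

// Tracked data size after the pass when the negative delta passes through a 32-bit int.
constexpr std::int64_t sizeAfterNarrowedDelta(std::int64_t currentSize, std::int64_t removed) {
    return currentSize + narrowToInt32(-removed);
}

// Tracked data size after the pass when the delta stays 64 bits wide.
constexpr std::int64_t sizeAfterWideDelta(std::int64_t currentSize, std::int64_t removed) {
    return currentSize - removed;
}
```

      With 20,000 records of 131,072 bytes each, one pass frees 2,621,440,000 bytes. Pushed through the 32-bit helper, the delta of -2,621,440,000 becomes +1,673,527,296, so the tracked size goes up rather than down.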

      Regular capped collections can be affected by this bug as well, but this is very unlikely, since a single removal pass would have to delete more than 2 GB of documents.

      USER IMPACT
      If this bug is triggered under the conditions described above, replication will cease and the affected replica set will need to be recovered manually.

      In the unlikely case a regular capped collection is affected, the system will remove data from the capped collection at a faster than normal pace, so it is possible that the collection is emptied completely.

      WORKAROUNDS
      No workarounds exist for this issue. MongoDB users running or wishing to run with the WiredTiger storage engine must upgrade to 3.0.10 or newer. MongoDB 3.2 is not affected by this bug, so users may also consider upgrading to MongoDB version 3.2.3 or newer.

      AFFECTED VERSIONS
      Only MongoDB 3.0 users running with the WiredTiger storage engine may be affected by this issue. No other configuration of MongoDB is affected.

      FIX VERSION
      The fix is included in the 3.0.10 production release. MongoDB 3.2 is not affected.

      Original description

      In wiredtiger_record_store.cpp, _increaseDataSize is declared to take an int for the size change:

      void WiredTigerRecordStore::_increaseDataSize(OperationContext* txn, int amount)
      

      But when called from cappedDeleteAsNeeded_inlock, the amount may overflow a 32-bit int if many large records are being deleted, resulting in (very) inaccurate accounting of the size of an oplog. This can result in the oplog deleter thread deleting everything in the oplog in order to try to get it back down to the configured maximum size, causing replication to cease.
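      The runaway deletion described above can be sketched with a small simulation. This is an illustration under stated assumptions, not the code in wiredtiger_record_store.cpp: CappedStoreSketch, trimToCap, kMaxDocsPerPass and narrowToInt32 are hypothetical names, records are assumed to have a uniform size, and the per-pass limit of 20,000 documents is taken from the issue summary. The narrowAccounting flag switches between a size delta that passes through a 32-bit int and one that stays 64 bits wide.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: low 32 bits of a 64-bit value, read as a signed 32-bit int.
constexpr std::int32_t narrowToInt32(std::int64_t value) {
    const std::uint64_t low32 = static_cast<std::uint64_t>(value) & 0xFFFFFFFFull;
    return low32 >= 0x80000000ull
        ? static_cast<std::int32_t>(static_cast<std::int64_t>(low32) - 0x100000000ll)
        : static_cast<std::int32_t>(low32);
}

// Hypothetical stand-in for a capped record store; not MongoDB's class.
class CappedStoreSketch {
public:
    CappedStoreSketch(std::int64_t recordCount, std::int64_t bytesPerRecord,
                      std::int64_t cappedMaxSize, bool narrowAccounting)
        : _records(recordCount),
          _bytesPerRecord(bytesPerRecord),
          _dataSize(recordCount * bytesPerRecord),
          _cappedMaxSize(cappedMaxSize),
          _narrow(narrowAccounting) {
        assert(recordCount >= 0 && bytesPerRecord > 0);
    }

    // Runs removal passes until the tracked size fits the cap or nothing is left.
    void trimToCap() {
        while (_dataSize > _cappedMaxSize && _records > 0) {
            const std::int64_t overage = _dataSize - _cappedMaxSize;
            std::int64_t sizeSaved = 0;
            std::int64_t removed = 0;
            // Remove the oldest records until the overage is covered or the
            // per-pass document limit is reached.
            while (sizeSaved < overage && removed < kMaxDocsPerPass && _records > 0) {
                sizeSaved += _bytesPerRecord;
                ++removed;
                --_records;
            }
            // Record the size change: either through a 32-bit int or as 64 bits.
            _dataSize += _narrow ? narrowToInt32(-sizeSaved) : -sizeSaved;
        }
    }

    std::int64_t recordsLeft() const { return _records; }
    std::int64_t dataSize() const { return _dataSize; }

private:
    static constexpr std::int64_t kMaxDocsPerPass = 20000;
    std::int64_t _records;
    std::int64_t _bytesPerRecord;
    std::int64_t _dataSize;
    std::int64_t _cappedMaxSize;
    bool _narrow;
};
```

      Starting from 100,000 records of 131,072 bytes against a 10,000,000,000-byte cap, the 64-bit variant stops after two passes with 76,293 records left. The 32-bit variant sees its tracked size grow on every pass and only stops once all records have been removed.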

          Activity

          bruce.lucas Bruce Lucas added a comment -

          This appears to have been corrected in 3.2 and master, but not 3.0.

          adam.midvidy Adam Midvidy (Inactive) added a comment - edited

          Would backporting SERVER-19800 to 3.0 be sufficient?

          bruce.lucas Bruce Lucas added a comment - edited

          I think so, although it wouldn't quite apply cleanly because _amount is already an int, not a bool in 3.0.9.

          kevin.pulo Kevin Pulo added a comment -

          This affects all high-throughput capped collections, not just the oplog, right?

          bruce.lucas Bruce Lucas added a comment -

          I believe all capped collection deletion goes through that code path, so yes, I think so.

          xgen-internal-githook Githook User added a comment -

          Author: Martin Bligh (martinbligh) <mbligh@mongodb.com>

          Message: SERVER-19800 DataSizeChange forces an int into a bool

          Fix for SERVER-22634: data size change for oplog deletes can overflow 32-bit int

          (cherry picked from commit 2a11d0957b397e2c9bcb4230da9d764b50aaac3b)
          Branch: v3.0
          https://github.com/mongodb/mongo/commit/3533581b43ae78884e3b5e43b92773a4007baa88

