Core Server / SERVER-26818

Make oplog truncation in WiredTiger more efficient in the presence of large documents

Details

    • Improvement
    • Resolution: Won't Fix
    • Major - P3
    • None
    • 3.0.4
    • WiredTiger
    • None
    • Fully Compatible

    Description

      Issue Status as of Jan 27, 2017

      ISSUE DESCRIPTION AND IMPACT

      MongoDB 3.2 and later truncate the oplog inefficiently for databases created with MongoDB 3.0.4 or earlier when the oplog contains documents larger than 1MB. The inefficiency shows up as long waits for the Database lock, coinciding with a spike in the oplog collection statistic "cache pages read into cache".

      The "oplogStones" code to truncate the oplog is new in MongoDB 3.2. That is why this behavior shows up after upgrading from 3.0 to 3.2. If the oplog was created by a version of MongoDB earlier than 3.0.5, it will have the statistic "maximum leaf page value size" set to 1MB. That setting was raised to 64MB in 3.0.5, but existing databases are not updated with the new setting when upgrading.

      The effect of that setting is that WiredTiger stores documents larger than 1MB in special "overflow pages". Overflow pages defeat the fast truncate code that oplogStones relies on for efficiency, and they cause pages to be read into cache during the truncate. Because this happens with the Database lock held, all other operations attempting to access the oplog block until the truncate completes. The oplogStones code attempts to truncate 1% of the oplog each iteration, so if these conditions apply, larger oplog sizes lead to longer periods with the Database lock held in exclusive mode.
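
      The relationship described above can be illustrated with a small arithmetic sketch. This is not the oplogStones implementation: the function names are hypothetical, and the model assumes only what the description states, namely that a document becomes an overflow item when it exceeds the "maximum leaf page value size", and that each truncate iteration covers 1% of the oplog.

```python
MB = 1024 * 1024

def is_overflow(doc_size_bytes, leaf_value_max_bytes):
    """True if a document of this size would exceed the leaf value limit
    and so, per the ticket, be stored in an overflow page."""
    return doc_size_bytes > leaf_value_max_bytes

def truncate_chunk_bytes(oplog_size_bytes, percent=1):
    """Bytes covered by one truncate iteration, assuming the 1% figure
    quoted in the ticket. Integer arithmetic avoids float rounding."""
    return oplog_size_bytes * percent // 100

# A 2MB oplog entry under the old (1MB) and new (64MB) limits.
old_limit, new_limit = 1 * MB, 64 * MB
print(is_overflow(2 * MB, old_limit))   # overflow under the old limit
print(is_overflow(2 * MB, new_limit))   # not overflow under the new limit

# Larger oplogs mean more data handled per iteration while the lock is held.
print(truncate_chunk_bytes(50 * 1024 * MB))
```

      Under these assumptions, a 50GB oplog truncates roughly 512MB per iteration, which is why the lock hold time grows with oplog size when overflow pages are present.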

      DIAGNOSIS AND AFFECTED VERSIONS

      If the oplog was created by the WiredTiger storage engine in a version of MongoDB earlier than 3.0.5, it will have the statistic "maximum leaf page value size" set to 1MB. If documents larger than 1MB are then written to it (by any version of MongoDB, including versions later than 3.0.5), this issue can arise. Oplogs created with the MMAPv1 storage engine are not affected.
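
      The diagnosis above could be scripted along the following lines. This is a minimal sketch, not an official tool: the helper names are hypothetical, and it assumes the collection statistics document nests the value under `wiredTiger` → `btree` → "maximum leaf page value size", which should be verified against the actual output on the deployment in question.

```python
MB = 1024 * 1024
STAT = "maximum leaf page value size"

def oplog_leaf_value_max(coll_stats):
    """Return the leaf value limit from a collStats-style dict, or None
    if the WiredTiger section is absent (assumed key path, see lead-in)."""
    return coll_stats.get("wiredTiger", {}).get("btree", {}).get(STAT)

def oplog_may_be_affected(coll_stats, largest_doc_bytes):
    """True if the oplog has a leaf value limit that the largest
    document exceeds, i.e. overflow pages are possible."""
    limit = oplog_leaf_value_max(coll_stats)
    if limit is None:
        # No WiredTiger statistics (e.g. MMAPv1): not affected per the ticket.
        return False
    return largest_doc_bytes > limit

# Hand-written sample documents standing in for real collStats output.
old_oplog = {"wiredTiger": {"btree": {STAT: 1 * MB}}}
new_oplog = {"wiredTiger": {"btree": {STAT: 64 * MB}}}
mmap_oplog = {"ns": "local.oplog.rs"}

print(oplog_may_be_affected(old_oplog, 3 * MB))
print(oplog_may_be_affected(new_oplog, 3 * MB))
print(oplog_may_be_affected(mmap_oplog, 3 * MB))
```

      With PyMongo, the input dict could be obtained with something like `client.local.command("collStats", "oplog.rs")`; the largest document size would have to come from knowledge of the workload.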

      REMEDIATION AND WORKAROUNDS

      Recreate the affected databases with the larger "maximum leaf page value size" setting by performing an initial sync with MongoDB 3.0.5 or newer.

      Original description

      MongoDB 3.2 is inefficient truncating the oplog for databases created earlier than MongoDB 3.0.5 with document sizes over 1MB

          People

            michael.cahill@mongodb.com Michael Cahill (Inactive)
            ramon.fernandez@mongodb.com Ramon Fernandez Marina