Make oplog truncation in WiredTiger more efficient in the presence of large documents

    • Type: Improvement
    • Resolution: Won't Fix
    • Priority: Major - P3
    • Affects Version/s: 3.0.4
    • Component/s: WiredTiger
    • Fully Compatible

      Issue Status as of Jan 27, 2017

      ISSUE DESCRIPTION AND IMPACT

      MongoDB 3.2 and later truncate the oplog inefficiently for databases that were created with MongoDB 3.0.4 or earlier and that contain documents larger than 1MB. The inefficiency shows up as long waits for the Database lock, coinciding with a spike in the "cache pages read into cache" statistic for the oplog collection.

      The "oplogStones" code to truncate the oplog is new in MongoDB 3.2. That is why this behavior shows up after upgrading from 3.0 to 3.2. If the oplog was created by a version of MongoDB earlier than 3.0.5, it will have the statistic "maximum leaf page value size" set to 1MB. That setting was raised to 64MB in 3.0.5, but existing databases are not updated with the new setting when upgrading.

      The effect of that setting is that documents larger than 1MB are stored in special "overflow pages" by WiredTiger. Overflow pages defeat the fast truncate code that oplogStones relies on for efficiency, and they cause pages to be read into cache during the truncate. Because this happens with the Database lock held, all other operations attempting to access the oplog block until the truncate completes. The oplogStones code attempts to truncate 1% of the oplog in each iteration, so when these conditions apply, larger oplog sizes lead to longer periods with the Database lock held in exclusive mode.
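
      To make the 1% figure concrete, the sketch below computes how much oplog data a single truncation pass would target for a given oplog size. It is only an illustration of the arithmetic: the function name and the default percentage parameter are hypothetical and are not taken from the oplogStones code.

```python
def bytes_per_truncate_pass(oplog_size_bytes: int, percent: int = 1) -> int:
    """Return the number of bytes one truncation pass would target.

    Assumes, as described above, that each pass targets a fixed
    percentage (1% by default) of the configured oplog size.
    Integer arithmetic is used to avoid floating-point rounding.
    """
    return oplog_size_bytes * percent // 100


GIB = 1024 ** 3

# A 50 GiB oplog implies passes of 512 MiB each; a 1 GiB oplog, about 10 MiB.
large_pass = bytes_per_truncate_pass(50 * GIB)
small_pass = bytes_per_truncate_pass(1 * GIB)
```

      Under these assumptions the amount of data touched per pass grows linearly with the oplog size, which is why larger oplogs hold the lock longer when overflow pages force reads into cache.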

      DIAGNOSIS AND AFFECTED VERSIONS

      If the oplog was created by the WiredTiger storage engine in a version of MongoDB earlier than 3.0.5, it will have the statistic "maximum leaf page value size" set to 1MB. If documents larger than 1MB are then written (by any version of MongoDB, including versions later than 3.0.5), this issue can arise. Oplogs created with the MMAPv1 storage engine are not affected.
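
      The check described above can be sketched as a small helper that inspects collection statistics for the oplog. This is a minimal sketch under stated assumptions: it assumes the statistics have already been fetched (for example from the collStats command run against local.oplog.rs) and converted to a Python dict, that the statistic appears under wiredTiger.btree, and that 1MB is reported as 1048576 bytes. The function name is hypothetical.

```python
ONE_MB = 1024 * 1024  # assumption: the 1MB limit is reported as 1048576 bytes


def oplog_may_be_affected(coll_stats: dict) -> bool:
    """Return True if the stats show the old 1MB leaf value limit.

    coll_stats is assumed to be the collection statistics document for
    the oplog, already converted to a plain dict.
    """
    btree = coll_stats.get("wiredTiger", {}).get("btree", {})
    limit = btree.get("maximum leaf page value size")
    if limit is None:
        # No WiredTiger statistic present (for example an MMAPv1 oplog,
        # which is not affected), so report "not affected".
        return False
    return limit <= ONE_MB


# Illustrative inputs only; these are not captured from a real server.
old_oplog = {"wiredTiger": {"btree": {"maximum leaf page value size": 1048576}}}
new_oplog = {"wiredTiger": {"btree": {"maximum leaf page value size": 67108864}}}
mmap_oplog = {"ns": "local.oplog.rs"}
```

      A True result only indicates that the oplog carries the old setting; the issue arises in practice only if documents larger than 1MB are also present.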

      REMEDIATION AND WORKAROUNDS

      Recreate the affected databases through an initial sync with MongoDB 3.0.5 or newer, so that they are created with the larger setting for "maximum leaf page value size".

      Original description

      MongoDB 3.2 is inefficient truncating the oplog for databases created earlier than MongoDB 3.0.5 with document sizes over 1MB

            Assignee:
            Michael Cahill (Inactive)
            Reporter:
            Ramon Fernandez
            Votes:
            0
            Watchers:
            6