[SERVER-21553] Oplog grows to 3x configured size Created: 19/Nov/15  Updated: 30/Mar/16  Resolved: 02/Dec/15

Status: Closed
Project: Core Server
Component/s: WiredTiger
Affects Version/s: None
Fix Version/s: 3.0.8, 3.2.0-rc6

Type: Bug Priority: Major - P3
Reporter: Bruce Lucas (Inactive) Assignee: Michael Cahill (Inactive)
Resolution: Done Votes: 0
Labels: WTplaybook
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File 1.png     PNG File 2.png     PNG File 3.png     PNG File 4.png     File benchRun.uncompressible     File diagnostic.data.tar     HTML File monitorOplog     PNG File oplog-grows.png     PNG File oplog-unbounded.png     File oplog_overflow.js     File repro_server21553.js     Text File truncate_span.patch    
Issue Links:
Depends
is depended on by SERVER-21442 WiredTiger changes for MongoDB 3.0.8 Closed
is depended on by WT-1973 MongoDB changes for WiredTiger 2.7.0 Closed
is depended on by SERVER-21730 WiredTiger changes for 3.2.0-rc6 Closed
Backwards Compatibility: Fully Compatible
Backport Completed:
Steps To Reproduce:

python buildscripts/resmoke.py --executor no_server repro_server21553.js

Sprint: Build C (11/20/15)
Participants:

 Description   
  • 12 cores, 24 cpus, 64 GB memory
  • 1-member replica set
  • oplog size 5 GB, default cache
  • data on ssd, no journal (was: journal on separate hdd)
  • 5 threads inserting 1 KB random docs as fast as possible (see code below)
  • recent nightly build:

    2015-11-19T08:41:38.422-0500 I CONTROL  [initandlisten] db version v3.2.0-rc2-211-gbd58ea2
    2015-11-19T08:41:38.422-0500 I CONTROL  [initandlisten] git version: bd58ea2ba5d17b960981990bb97cab133d7e90ed
    

  • periods where oplog size exceeded configured size start at B and D
  • oplog size recovered during stall that started at C, but did not recover after D
  • growth period starting at D appears to coincide with cache-full condition, but excess size starting at B looks like it may be simply related to rate of inserts


 Comments   
Comment by Githook User [ 02/Dec/15 ]

Author:

{u'username': u'michaelcahill', u'name': u'Michael Cahill', u'email': u'michael.cahill@mongodb.com'}

Message: Import wiredtiger-wiredtiger-mongodb-3.2-rc4-70-g197eef0.tar.gz from wiredtiger branch mongodb-3.2

ref: 7a4f325..197eef0

bff6525 SERVER-21553 Free blocks during reverse splits.
c70b097 WT-2253: prioritize WT_READGEN_OLDEST pages for eviction.
4fc3e39 WT-2253 Evict pages left behind by in-memory splits.
Branch: master
https://github.com/mongodb/mongo/commit/f4309efa6d478c1dae79d2483d756c00a069175f

Comment by Michael Cahill (Inactive) [ 02/Dec/15 ]

For the record, there were two underlying issues here in WiredTiger:

  1. When leaf pages were fast-deleted, a subsequent split into the parent of the deleted page was deleting the reference to the page in memory without deleting the block on disk. This caused a leak of space on disk.
  2. In some workloads (including capped collection truncates), internal pages referencing many deleted leaves could stay in cache. Changing eviction to visit these internal pages allows the space to be freed on disk.
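
The first issue can be illustrated with a toy block-manager model (a hypothetical Python sketch with illustrative names, not WiredTiger code): if a reverse split drops the in-memory reference to a fast-deleted leaf without returning its on-disk block to the allocator, file space leaks even though the data is logically gone.

```python
# Toy model of the reverse-split leak (issue 1). All names are
# illustrative; this is not WiredTiger source.

class BlockManager:
    def __init__(self):
        self.allocated = {}   # addr -> size in bytes
        self.next_addr = 0

    def alloc(self, size):
        addr = self.next_addr
        self.next_addr += 1
        self.allocated[addr] = size
        return addr

    def free(self, addr):
        del self.allocated[addr]

    def file_bytes(self):
        return sum(self.allocated.values())


def reverse_split(bm, parent, free_block):
    """Remove refs to fast-deleted leaves from the parent page.

    With free_block=False (the bug), the in-memory ref is dropped but
    the on-disk block is never freed, so file space leaks.
    """
    for ref in list(parent):
        if ref["deleted"]:
            if free_block:            # the fix: free the disk block too
                bm.free(ref["addr"])
            parent.remove(ref)


bm = BlockManager()
parent = [{"addr": bm.alloc(4096), "deleted": True} for _ in range(3)]
reverse_split(bm, parent, free_block=False)
leaked = bm.file_bytes()              # 3 * 4096 bytes never reclaimed

bm2 = BlockManager()
parent2 = [{"addr": bm2.alloc(4096), "deleted": True} for _ in range(3)]
reverse_split(bm2, parent2, free_block=True)
print(leaked, bm2.file_bytes())       # 12288 0
```

With inserts continuously allocating new blocks, each leaked block is permanent file growth, which matches the unbounded file sizes reported below.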
Comment by Githook User [ 02/Dec/15 ]

Author:

{u'username': u'michaelcahill', u'name': u'Michael Cahill', u'email': u'michael.cahill@mongodb.com'}

Message: Import wiredtiger-wiredtiger-mongodb-3.0.7-20-g27d0cbd.tar.gz from wiredtiger branch mongodb-3.0

ref: deb2d81..27d0cbd

6feaa28 WT-2251 Fix leaf of deleted page addresses.
1da2d3a SERVER-21568 Fix a use-after-free.
4228295 SERVER-21553 Free blocks during reverse splits.
0398515 WT-2241 Use a lock to protect transaction ID allocation.
06a5c7b WT-2237 Avoid yields if we race allocating transaction IDs.
16a418b WT-2237 Have threads publish unique transaction IDs so that updates always become visible immediately on commit.
Branch: v3.0
https://github.com/mongodb/mongo/commit/eb8bc24a0d36b673022faa83976a4b33bc8f676b

Comment by Githook User [ 02/Dec/15 ]

Author:

{u'username': u'michaelcahill', u'name': u'Michael Cahill', u'email': u'michael.cahill@mongodb.com'}

Message: Merge pull request #2354 from wiredtiger/server_21553_30backport

SERVER-21553 3.0 backport
Branch: mongodb-3.0
https://github.com/wiredtiger/wiredtiger/commit/27d0cbdf8046565dba6902f4e6ee93b2642f0d19

Comment by Githook User [ 02/Dec/15 ]

Author:

{u'username': u'agorrod', u'name': u'Alex Gorrod', u'email': u'alexg@wiredtiger.com'}

Message: SERVER-21553 3.0 backport bug fix.

Free the referenced block in the right place
Branch: mongodb-3.0
https://github.com/wiredtiger/wiredtiger/commit/38369aebcf7da942929a3c27f72495e8dbe3e2d3

Comment by Githook User [ 02/Dec/15 ]

Author:

{u'username': u'agorrod', u'name': u'Alex Gorrod', u'email': u'alexg@wiredtiger.com'}

Message: Remove redundant function I missed in merge of SERVER-21553
Branch: mongodb-3.0
https://github.com/wiredtiger/wiredtiger/commit/ba0b7f26cf8ba5b6e1434fb6f1500726e25f489f

Comment by Githook User [ 02/Dec/15 ]

Author:

{u'username': u'agorrod', u'name': u'Alex Gorrod', u'email': u'alexander.gorrod@mongodb.com'}

Message: SERVER-21553 Free blocks during reverse splits.

(cherry picked from commit bff6525c8)

Conflicts:
src/btree/bt_discard.c
src/btree/bt_slvg.c
src/btree/bt_split.c
src/evict/evict_page.c
src/include/btree.i
src/reconcile/rec_write.c
Branch: mongodb-3.0
https://github.com/wiredtiger/wiredtiger/commit/42282959f7d9928552456cf41186f115d412c7ca

Comment by Githook User [ 02/Dec/15 ]

Author:

{u'username': u'agorrod', u'name': u'Alex Gorrod', u'email': u'alexander.gorrod@mongodb.com'}

Message: Merge pull request #2352 from wiredtiger/SERVER-21553-free-deleted-blocks

SERVER-21553 Free blocks during reverse splits.
Branch: develop
https://github.com/wiredtiger/wiredtiger/commit/bff6525c8382d97221a4d08452aef78ed0b8ea25

Comment by Githook User [ 02/Dec/15 ]

Author:

{u'username': u'michaelcahill', u'name': u'Michael Cahill', u'email': u'michael.cahill@mongodb.com'}

Message: Merge branch 'develop' into SERVER-21553-free-deleted-blocks
Branch: develop
https://github.com/wiredtiger/wiredtiger/commit/97b549e75f64fe915019057cf42a572e89d49fa9

Comment by Githook User [ 02/Dec/15 ]

Author:

{u'username': u'michaelcahill', u'name': u'Michael Cahill', u'email': u'michael.cahill@mongodb.com'}

Message: SERVER-21553 Check that ref->addr is NULL, don't try to free it.

During review of this change, noticed that we are trying to free an address in
a path where it should never be set (or we would leak blocks). Assert that the
address is NULL instead of checking whether it needs to be freed.
Branch: develop
https://github.com/wiredtiger/wiredtiger/commit/5fe8c70e33731e0a6d5d25d2883faa52916f8dcf

Comment by Githook User [ 02/Dec/15 ]

Author:

{u'username': u'michaelcahill', u'name': u'Michael Cahill', u'email': u'michael.cahill@mongodb.com'}

Message: SERVER-21553 If truncate leaves an internal page empty, evict it asap.

(Note: works in conjunction with the change to LRU policy for internal pages in WT-2553).
Branch: develop
https://github.com/wiredtiger/wiredtiger/commit/745eb56977bb7b636572d7eb8e7b1619f561bbb3

Comment by Githook User [ 02/Dec/15 ]

Author:

{u'username': u'michaelcahill', u'name': u'Michael Cahill', u'email': u'michael.cahill@mongodb.com'}

Message: Merge branch 'develop' into SERVER-21553-free-deleted-blocks
Branch: develop
https://github.com/wiredtiger/wiredtiger/commit/6c82703fd06d34a4daa8a5e4be0e1250108a1e27

Comment by Githook User [ 02/Dec/15 ]

Author:

{u'username': u'michaelcahill', u'name': u'Michael Cahill', u'email': u'michael.cahill@mongodb.com'}

Message: SERVER-21553 Review feedback.
Branch: develop
https://github.com/wiredtiger/wiredtiger/commit/5ebfd9211956d794795c0be293c76e8dd22d3f4d

Comment by Githook User [ 02/Dec/15 ]

Author:

{u'username': u'michaelcahill', u'name': u'Michael Cahill', u'email': u'michael.cahill@mongodb.com'}

Message: SERVER-21553 Remove no-longer-used variables.
Branch: develop
https://github.com/wiredtiger/wiredtiger/commit/fa28552449d81f24a2c666934d508c525e752bd5

Comment by Githook User [ 02/Dec/15 ]

Author:

{u'username': u'michaelcahill', u'name': u'Michael Cahill', u'email': u'michael.cahill@mongodb.com'}

Message: SERVER-21553 Finish rename to __wt_ref_addr_free.
Branch: develop
https://github.com/wiredtiger/wiredtiger/commit/8cf3e9bbaf06634bac1a191f225bc37817da3b25

Comment by Githook User [ 02/Dec/15 ]

Author:

{u'username': u'michaelcahill', u'name': u'Michael Cahill', u'email': u'michael.cahill@mongodb.com'}

Message: SERVER-21553 Free blocks during reverse splits.
Branch: develop
https://github.com/wiredtiger/wiredtiger/commit/f6a2db06a9d86a64e486a4b6151c6d72dec9fb46

Comment by Keith Bostic (Inactive) [ 01/Dec/15 ]

What's happening is that checkpoints are starting but taking several minutes to complete (and in the case of no-journal runs, I've seen them not complete at all). That's backed up by Martin's screenshots, too; if I'm reading them correctly, there's a 10-minute checkpoint there.

Comment by Keith Bostic (Inactive) [ 01/Dec/15 ]

I can repro this one; turning on journalling appears to resolve the problem: we top out at around 14GB instead of growing without bound.

For some reason I can't read the ftdc data, so I can't verify this, but without journalling, it seems we're not checkpointing in that WiredTiger database (notice the checkpoint size : 0 lines, and no file bytes available for reuse).

Comment by Max Hirschhorn [ 01/Dec/15 ]

Per Martin's request, I reran repro_server21553.js with the latest changes from WiredTiger. With the fast path enabled for truncates, I'm seeing the oplog size (db.getSiblingDB('local').oplog.rs.stats().size) range from 5GB to 5.4GB (woot!), but the file size (compressed) appears to grow without bound (e.g. up to 34GB before I killed the script).

Could there still be something that would prevent the pages from becoming available for reuse?

$ ./mongo --port 20010 --eval 'db.getSiblingDB("local").oplog.rs.stats(1024 * 1024).wiredTiger["block-manager"]'
{
    "file allocation unit size" : 4096,
    "blocks allocated" : 1278797,
    "checkpoint size" : 0,
    "allocations requiring file extension" : 1278523,
    "blocks freed" : 267,
    "file magic number" : 120897,
    "file major version number" : 1,
    "minor version number" : 0,
    "file bytes available for reuse" : 0,
    "file size in bytes" : 36444704768
}


git version: 86e7b69a6c52c926d28a60d816faefa6db81eb96
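
The block-manager numbers above quantify the problem: nearly every allocation had to extend the file rather than reuse freed space. A quick back-of-the-envelope check (plain Python, using only the figures quoted above):

```python
# Figures copied from the block-manager stats above.
stats = {
    "file allocation unit size": 4096,
    "blocks allocated": 1278797,
    "allocations requiring file extension": 1278523,
    "blocks freed": 267,
    "file bytes available for reuse": 0,
    "file size in bytes": 36444704768,
}

# Fraction of allocations that had to grow the file. If freed blocks
# were being recycled, this would be well below 1.0.
extend_frac = (stats["allocations requiring file extension"]
               / stats["blocks allocated"])
print(f"{extend_frac:.4f}")                  # ~0.9998: essentially no reuse

# On-disk size vs the ~5 GB the oplog logically holds.
print(stats["file size in bytes"] / 2**30)   # ~33.9 GiB on disk
```

So although 267 blocks were freed, effectively none of that space was available for reuse, consistent with "file bytes available for reuse" : 0.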

Comment by Githook User [ 30/Nov/15 ]

Author:

{u'username': u'michaelcahill', u'name': u'Michael Cahill', u'email': u'michael.cahill@mongodb.com'}

Message: Import wiredtiger-wiredtiger-mongodb-3.2-rc4-41-g8326df6.tar.gz from wiredtiger branch mongodb-3.2

ref: b65381f..8326df6

4c49948 WT-2244 Trigger in-memory splits sooner.
9f2e4f3 WT-2248 WT_SESSION.close is updating WT_CONNECTION_IMPL.default_session.
a6da10e SERVER-21553 Enable fast-path truncate after splits.
39dfd21 WT-2243 Don't keep transaction IDs pinned for reading from checkpoints.
4e1844c WT-2230 multi-split error path.
cace179 WT-2228 avoid unnecessary raw-compression calls.
890ee34 SERVER-21619 Don't do internal page splits after a tree is marked DEAD.
6c7338f WT-2241 Use a lock to protect transaction ID allocation.
978c237 WT-2234 Coverity analysis warnings.
Branch: master
https://github.com/mongodb/mongo/commit/e7181b542b25981db42f74cdaee4e7fc323d3e9d

Comment by Githook User [ 27/Nov/15 ]

Author:

{u'username': u'michaelcahill', u'name': u'Michael Cahill', u'email': u'michael.cahill@mongodb.com'}

Message: Merge pull request #2341 from wiredtiger/SERVER-21553

SERVER-21553 Enable fast-path truncate after splits.
Branch: develop
https://github.com/wiredtiger/wiredtiger/commit/a6da10e9fe561c931490345807332757395e15ce

Comment by Githook User [ 27/Nov/15 ]

Author:

{u'username': u'michaelcahill', u'name': u'Michael Cahill', u'email': u'michael.cahill@mongodb.com'}

Message: SERVER-21553 Fix a line removed by mistake: set the address during splits.
Branch: develop
https://github.com/wiredtiger/wiredtiger/commit/5dd8d4dc2e24c4d6633fde475e44f5bc1de47a50

Comment by Githook User [ 27/Nov/15 ]

Author:

{u'username': u'michaelcahill', u'name': u'Michael Cahill', u'email': u'michael.cahill@mongodb.com'}

Message: Merge branch 'develop' into SERVER-21553
Branch: develop
https://github.com/wiredtiger/wiredtiger/commit/69955652146b90854c95f7b7d0a266c948b59fc0

Comment by Githook User [ 27/Nov/15 ]

Author:

{u'username': u'michaelcahill', u'name': u'Michael Cahill', u'email': u'michael.cahill@mongodb.com'}

Message: SERVER-21553 Enable fast-path truncate after splits.

A truncate operation attempts to mark leaf pages deleted without reading them
into cache. One of the conditions that has to be met for that fast-path
truncate of pages is that the leaf page not contain overflow items (or we would
need to read it in order to delete the overflow items).

The "no overflow" flag was not being preseved across internal page splits, so
recent changes to splits were defeating fast-path truncation.

Add tracking of the "no overflow" flag for in-memory page addresses so
fast-path truncates work after internal pages are split.
Branch: develop
https://github.com/wiredtiger/wiredtiger/commit/ad2500b3d87a4761bc27dacfd9321a7feea280ca
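
The mechanism in this commit message can be modeled with a small sketch (hypothetical Python with illustrative names, not WiredTiger source): truncate may fast-delete a leaf only if its address carries the "no overflow items" flag, so a split that rebuilds parent addresses without copying the flag forces every subsequent truncate onto the slow path.

```python
# Toy model of fast-path truncate and the lost "no overflow" flag.
# Illustrative names only, not WiredTiger source.

class LeafRef:
    def __init__(self, no_overflow):
        # True means the page can be deleted without reading it in.
        self.no_overflow = no_overflow

def split_parent(refs, preserve_flag):
    """Rebuild parent refs after an internal split. The bug: the
    no-overflow flag was not preserved, so it defaulted to False."""
    return [LeafRef(r.no_overflow if preserve_flag else False)
            for r in refs]

def truncate(refs):
    """Count fast-path deletions vs slow-path (page must be read)."""
    fast = sum(1 for r in refs if r.no_overflow)
    return fast, len(refs) - fast

refs = [LeafRef(True) for _ in range(4)]
print(truncate(split_parent(refs, preserve_flag=False)))  # (0, 4): all slow
print(truncate(split_parent(refs, preserve_flag=True)))   # (4, 0): all fast
```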

Comment by Martin Bligh [ 23/Nov/15 ]

Was able to reproduce with the script from bruce.lucas, but not mine. Turns out the key difference is that he's using random data, and I'm using something easily compressible. Might be an interesting datapoint ... will try disabling snappy.

Still reproduces with compression disabled.

Comment by Max Hirschhorn [ 23/Nov/15 ]

The following patch makes it so that we don't save the previous truncate point and always truncate from RecordId(0) to the lastRecord of the stone. With the patch applied I saw the oplog size (db.getSiblingDB('local').oplog.rs.stats().size) range from 5GB to 14GB and the file size (compressed) grow up to 34GB.

diff --git a/src/mongo/db/storage/wiredtiger/wiredtiger_record_store.cpp b/src/mongo/db/storage/wiredtiger/wiredtiger_record_store.cpp
index c8d9f70..2338497 100644
--- a/src/mongo/db/storage/wiredtiger/wiredtiger_record_store.cpp
+++ b/src/mongo/db/storage/wiredtiger/wiredtiger_record_store.cpp
@@ -1169,15 +1169,11 @@ void WiredTigerRecordStore::reclaimOplog(OperationContext* txn) {
         try {
             WriteUnitOfWork wuow(txn);
 
-            WiredTigerCursor startwrap(_uri, _tableId, true, txn);
-            WT_CURSOR* start = startwrap.get();
-            start->set_key(start, _makeKey(_oplogStones->firstRecord));
-
             WiredTigerCursor endwrap(_uri, _tableId, true, txn);
             WT_CURSOR* end = endwrap.get();
             end->set_key(end, _makeKey(stone->lastRecord));
 
-            invariantWTOK(session->truncate(session, nullptr, start, end, nullptr));
+            invariantWTOK(session->truncate(session, nullptr, nullptr, end, nullptr));
             _changeNumRecords(txn, -stone->records);
             _increaseDataSize(txn, -stone->bytes);
 
@@ -1185,9 +1181,6 @@ void WiredTigerRecordStore::reclaimOplog(OperationContext* txn) {
 
             // Remove the stone after a successful truncation.
             _oplogStones->popOldestStone();
-
-            // Stash the truncate point for next time to cleanly skip over tombstones, etc.
-            _oplogStones->firstRecord = stone->lastRecord;
         } catch (const WriteConflictException& wce) {
             LOG(1) << "Caught WriteConflictException while truncating oplog entries, retrying";
         }

However, with both the above patch and the truncate_span.patch applied, I saw the oplog size (db.getSiblingDB('local').oplog.rs.stats().size) range from 5GB to 11GB and the file size (compressed) stay fixed at 4.6GB. Nevermind: after running for a few more minutes, the oplog size jumped up to 19GB before being truncated to 5GB, and the file size grew to 32GB.

Comment by Max Hirschhorn [ 21/Nov/15 ]

The truncate_span.patch makes it so we truncate the RecordId span of multiple excess stones in a single WT truncate call. With the patch applied I saw the oplog size (db.getSiblingDB('local').oplog.rs.stats().size) range from 5GB to 18GB and the file size (compressed) grow up to 27GB. It doesn't seem like the WT truncate calls can keep up with the inserts, so we may need to add the back-pressure mechanism back in. I'm curious about the WCEs during the WT truncate call seen in the logs below when trying to truncate 13GB of oplog in one shot.

Logs from the oplog reclaim thread

2015-11-20T19:33:26.613-0500 D STORAGE  [WT RecordStoreThread: local.oplog.rs] Truncating the oplog between RecordId(0) and RecordId(6219396090974747839) to remove approximately 48321 records totaling to 53687838 bytes
2015-11-20T19:33:26.784-0500 D STORAGE  [WT RecordStoreThread: local.oplog.rs] Truncating the oplog between RecordId(6219396090974747839) and RecordId(6219396090974796114) to remove approximately 48319 records totaling to 53687843 bytes
2015-11-20T19:33:26.941-0500 D STORAGE  [WT RecordStoreThread: local.oplog.rs] Truncating the oplog between RecordId(6219396090974796114) and RecordId(6219396090974844456) to remove approximately 48319 records totaling to 53688032 bytes
2015-11-20T19:33:27.099-0500 D STORAGE  [WT RecordStoreThread: local.oplog.rs] Truncating the oplog between RecordId(6219396090974844456) and RecordId(6219396090974941115) to remove approximately 96638 records totaling to 107375998 bytes
2015-11-20T19:33:27.421-0500 D STORAGE  [WT RecordStoreThread: local.oplog.rs] Truncating the oplog between RecordId(6219396090974941115) and RecordId(6219396090975086071) to remove approximately 144956 records totaling to 161063447 bytes
2015-11-20T19:33:27.887-0500 D STORAGE  [WT RecordStoreThread: local.oplog.rs] Truncating the oplog between RecordId(6219396090975086071) and RecordId(6219396095269844721) to remove approximately 193275 records totaling to 214751215 bytes
2015-11-20T19:33:28.529-0500 D STORAGE  [WT RecordStoreThread: local.oplog.rs] Truncating the oplog between RecordId(6219396095269844721) and RecordId(6219396095270086483) to remove approximately 241594 records totaling to 268439531 bytes
2015-11-20T19:33:29.354-0500 D STORAGE  [WT RecordStoreThread: local.oplog.rs] Truncating the oplog between RecordId(6219396095270086483) and RecordId(6219396099564910036) to remove approximately 289913 records totaling to 322127468 bytes
2015-11-20T19:33:30.342-0500 D STORAGE  [WT RecordStoreThread: local.oplog.rs] Truncating the oplog between RecordId(6219396099564910036) and RecordId(6219396103859897172) to remove approximately 434868 records totaling to 483189831 bytes
2015-11-20T19:33:31.883-0500 D STORAGE  [WT RecordStoreThread: local.oplog.rs] Truncating the oplog between RecordId(6219396103859897172) and RecordId(6219396112449597552) to remove approximately 628144 records totaling to 697942023 bytes
2015-11-20T19:33:35.944-0500 D STORAGE  [WT RecordStoreThread: local.oplog.rs] Truncating the oplog between RecordId(6219396112449597552) and RecordId(6219396129629480930) to remove approximately 1642834 records totaling to 1825382391 bytes
2015-11-20T19:33:52.541-0500 D STORAGE  [WT RecordStoreThread: local.oplog.rs] Truncating the oplog between RecordId(6219396129629480930) and RecordId(6219396159694389699) to remove approximately 3044081 records totaling to 3382333072 bytes
2015-11-20T19:34:23.967-0500 D STORAGE  [WT RecordStoreThread: local.oplog.rs] Truncating the oplog between RecordId(6219396159694389699) and RecordId(6219396211233919428) to remove approximately 3672217 records totaling to 4080266897 bytes
2015-11-20T19:35:05.858-0500 D STORAGE  [WT RecordStoreThread: local.oplog.rs] Truncating the oplog between RecordId(6219396211233919428) and RecordId(6219396228413780789) to remove approximately 628143 records totaling to 697940825 bytes
2015-11-20T19:35:13.698-0500 D STORAGE  [WT RecordStoreThread: local.oplog.rs] Finished truncating the oplog, it now contains approximately 4848186 records totaling to 5386906601 bytes
2015-11-20T19:35:29.138-0500 D STORAGE  [WT RecordStoreThread: local.oplog.rs] Truncating the oplog between RecordId(6219396228413780789) and RecordId(6219396232708625346) to remove approximately 48319 records totaling to 53688095 bytes
2015-11-20T19:35:29.851-0500 D STORAGE  [WT RecordStoreThread: local.oplog.rs] Truncating the oplog between RecordId(6219396232708625346) and RecordId(6219396241298585297) to remove approximately 289911 records totaling to 322125130 bytes
2015-11-20T19:35:29.980-0500 D STORAGE  [WT RecordStoreThread: local.oplog.rs] Truncating the oplog between RecordId(6219396241298585297) and RecordId(6219396241298633616) to remove approximately 48319 records totaling to 53688193 bytes
2015-11-20T19:35:29.987-0500 D STORAGE  [WT RecordStoreThread: local.oplog.rs] Finished truncating the oplog, it now contains approximately 4840156 records totaling to 5377984798 bytes
2015-11-20T19:35:30.086-0500 D STORAGE  [WT RecordStoreThread: local.oplog.rs] Truncating the oplog between RecordId(6219396241298633616) and RecordId(6219396245593544639) to remove approximately 48319 records totaling to 53687898 bytes
2015-11-20T19:35:30.102-0500 D STORAGE  [WT RecordStoreThread: local.oplog.rs] Finished truncating the oplog, it now contains approximately 4838805 records totaling to 5376483944 bytes
2015-11-20T19:35:30.220-0500 D STORAGE  [WT RecordStoreThread: local.oplog.rs] Truncating the oplog between RecordId(6219396245593544639) and RecordId(6219396245593592957) to remove approximately 48318 records totaling to 53687271 bytes
2015-11-20T19:35:30.265-0500 D STORAGE  [WT RecordStoreThread: local.oplog.rs] Finished truncating the oplog, it now contains approximately 4851403 records totaling to 5390481343 bytes
2015-11-20T19:35:30.333-0500 D STORAGE  [WT RecordStoreThread: local.oplog.rs] Truncating the oplog between RecordId(6219396245593592957) and RecordId(6219396249888512495) to remove approximately 48319 records totaling to 53687873 bytes
2015-11-20T19:35:30.494-0500 D STORAGE  [WT RecordStoreThread: local.oplog.rs] Truncating the oplog between RecordId(6219396249888512495) and RecordId(6219396249888560819) to remove approximately 48318 records totaling to 53687375 bytes
2015-11-20T19:35:30.646-0500 D STORAGE  [WT RecordStoreThread: local.oplog.rs] Truncating the oplog between RecordId(6219396249888560819) and RecordId(6219396254183494777) to remove approximately 48319 records totaling to 53688013 bytes
2015-11-20T19:35:30.802-0500 D STORAGE  [WT RecordStoreThread: local.oplog.rs] Truncating the oplog between RecordId(6219396254183494777) and RecordId(6219396262773403841) to remove approximately 96637 records totaling to 107375340 bytes
2015-11-20T19:35:31.093-0500 D STORAGE  [WT RecordStoreThread: local.oplog.rs] Truncating the oplog between RecordId(6219396262773403841) and RecordId(6219396267068391219) to remove approximately 96637 records totaling to 107374955 bytes
2015-11-20T19:35:31.140-0500 D STORAGE  [WT RecordStoreThread: local.oplog.rs] Finished truncating the oplog, it now contains approximately 4873435 records totaling to 5414961156 bytes
2015-11-20T19:35:31.157-0500 D STORAGE  [WT RecordStoreThread: local.oplog.rs] Truncating the oplog between RecordId(6219396267068391219) and RecordId(6219396271363333258) to remove approximately 48319 records totaling to 53687945 bytes
2015-11-20T19:35:31.214-0500 D STORAGE  [WT RecordStoreThread: local.oplog.rs] Finished truncating the oplog, it now contains approximately 4853694 records totaling to 5393026932 bytes
2015-11-20T19:35:31.283-0500 D STORAGE  [WT RecordStoreThread: local.oplog.rs] Truncating the oplog between RecordId(6219396271363333258) and RecordId(6219396271363381577) to remove approximately 48319 records totaling to 53688154 bytes
2015-11-20T19:35:31.290-0500 D STORAGE  [WT RecordStoreThread: local.oplog.rs] Finished truncating the oplog, it now contains approximately 4834658 records totaling to 5371874244 bytes
2015-11-20T19:35:31.410-0500 D STORAGE  [WT RecordStoreThread: local.oplog.rs] Truncating the oplog between RecordId(6219396271363381577) and RecordId(6219396275658328279) to remove approximately 48318 records totaling to 53687132 bytes
2015-11-20T19:35:31.458-0500 D STORAGE  [WT RecordStoreThread: local.oplog.rs] Finished truncating the oplog, it now contains approximately 4850377 records totaling to 5389340969 bytes
2015-11-20T19:35:31.536-0500 D STORAGE  [WT RecordStoreThread: local.oplog.rs] Truncating the oplog between RecordId(6219396275658328279) and RecordId(6219396279953269894) to remove approximately 48319 records totaling to 53688194 bytes
2015-11-20T19:35:31.552-0500 D STORAGE  [WT RecordStoreThread: local.oplog.rs] Finished truncating the oplog, it now contains approximately 4838622 records totaling to 5376279582 bytes
2015-11-20T19:35:31.658-0500 D STORAGE  [WT RecordStoreThread: local.oplog.rs] Truncating the oplog between RecordId(6219396279953269894) and RecordId(6219396279953318260) to remove approximately 48325 records totaling to 53694459 bytes
2015-11-20T19:35:31.665-0500 D STORAGE  [WT RecordStoreThread: local.oplog.rs] Finished truncating the oplog, it now contains approximately 4834650 records totaling to 5371866792 bytes
2015-11-20T19:35:31.779-0500 D STORAGE  [WT RecordStoreThread: local.oplog.rs] Truncating the oplog between RecordId(6219396279953318260) and RecordId(6219396284248249841) to remove approximately 48319 records totaling to 53688034 bytes
2015-11-20T19:35:31.786-0500 D STORAGE  [WT RecordStoreThread: local.oplog.rs] Finished truncating the oplog, it now contains approximately 4835195 records totaling to 5372472277 bytes
2015-11-20T19:35:31.900-0500 D STORAGE  [WT RecordStoreThread: local.oplog.rs] Truncating the oplog between RecordId(6219396284248249841) and RecordId(6219396284248298132) to remove approximately 48319 records totaling to 53688135 bytes
2015-11-20T19:35:31.918-0500 D STORAGE  [WT RecordStoreThread: local.oplog.rs] Finished truncating the oplog, it now contains approximately 4839159 records totaling to 5376876437 bytes
2015-11-20T19:35:32.014-0500 D STORAGE  [WT RecordStoreThread: local.oplog.rs] Truncating the oplog between RecordId(6219396284248298132) and RecordId(6219396288543226959) to remove approximately 48318 records totaling to 53687160 bytes
2015-11-20T19:35:32.022-0500 D STORAGE  [WT RecordStoreThread: local.oplog.rs] Finished truncating the oplog, it now contains approximately 4835481 records totaling to 5372789607 bytes
2015-11-20T19:35:32.133-0500 D STORAGE  [WT RecordStoreThread: local.oplog.rs] Truncating the oplog between RecordId(6219396288543226959) and RecordId(6219396288543275278) to remove approximately 48319 records totaling to 53688029 bytes
2015-11-20T19:35:32.140-0500 D STORAGE  [WT RecordStoreThread: local.oplog.rs] Finished truncating the oplog, it now contains approximately 4834599 records totaling to 5371810247 bytes
2015-11-20T19:35:32.246-0500 D STORAGE  [WT RecordStoreThread: local.oplog.rs] Truncating the oplog between RecordId(6219396288543275278) and RecordId(6219396297133136677) to remove approximately 48319 records totaling to 53688015 bytes
2015-11-20T19:35:32.286-0500 D STORAGE  [WT RecordStoreThread: local.oplog.rs] Finished truncating the oplog, it now contains approximately 4849446 records totaling to 5388306591 bytes
2015-11-20T19:35:32.358-0500 D STORAGE  [WT RecordStoreThread: local.oplog.rs] Truncating the oplog between RecordId(6219396297133136677) and RecordId(6219396297133184995) to remove approximately 48318 records totaling to 53687171 bytes
2015-11-20T19:35:32.479-0500 D STORAGE  [WT RecordStoreThread: local.oplog.rs] Truncating the oplog between RecordId(6219396297133184995) and RecordId(6219396297133233313) to remove approximately 48318 records totaling to 53687183 bytes
2015-11-20T19:35:36.479-0500 D STORAGE  [WT RecordStoreThread: local.oplog.rs] Truncating the oplog between RecordId(6219396297133233313) and RecordId(6219396344377837377) to remove approximately 1691152 records totaling to 1879069022 bytes
2015-11-20T19:35:54.278-0500 D STORAGE  [WT RecordStoreThread: local.oplog.rs] Truncating the oplog between RecordId(6219396344377837377) and RecordId(6219396713745116735) to remove approximately 5991518 records totaling to 6657283691 bytes
2015-11-20T19:36:43.803-0500 D STORAGE  [WT RecordStoreThread: local.oplog.rs] Caught WriteConflictException while truncating oplog entries, retrying
2015-11-20T19:36:43.803-0500 D STORAGE  [WT RecordStoreThread: local.oplog.rs] Truncating the oplog between RecordId(6219396344377837377) and RecordId(6219396808234257469) to remove approximately 12466228 records totaling to 13851450681 bytes
2015-11-20T19:36:50.642-0500 D STORAGE  [WT RecordStoreThread: local.oplog.rs] Caught WriteConflictException while truncating oplog entries, retrying
2015-11-20T19:36:50.642-0500 D STORAGE  [WT RecordStoreThread: local.oplog.rs] Truncating the oplog between RecordId(6219396344377837377) and RecordId(6219396808234257469) to remove approximately 12466228 records totaling to 13851450681 bytes
2015-11-20T19:36:56.125-0500 D STORAGE  [WT RecordStoreThread: local.oplog.rs] Caught WriteConflictException while truncating oplog entries, retrying
2015-11-20T19:36:56.125-0500 D STORAGE  [WT RecordStoreThread: local.oplog.rs] Truncating the oplog between RecordId(6219396344377837377) and RecordId(6219396808234257469) to remove approximately 12466228 records totaling to 13851450681 bytes
2015-11-20T19:37:04.437-0500 D STORAGE  [WT RecordStoreThread: local.oplog.rs] Caught WriteConflictException while truncating oplog entries, retrying
2015-11-20T19:37:04.437-0500 D STORAGE  [WT RecordStoreThread: local.oplog.rs] Truncating the oplog between RecordId(6219396344377837377) and RecordId(6219396808234305724) to remove approximately 12514547 records totaling to 13905138771 bytes
2015-11-20T19:37:13.484-0500 D STORAGE  [WT RecordStoreThread: local.oplog.rs] Caught WriteConflictException while truncating oplog entries, retrying
2015-11-20T19:37:13.484-0500 D STORAGE  [WT RecordStoreThread: local.oplog.rs] Truncating the oplog between RecordId(6219396344377837377) and RecordId(6219396808234305724) to remove approximately 12514547 records totaling to 13905138771 bytes
2015-11-20T19:37:23.164-0500 D STORAGE  [WT RecordStoreThread: local.oplog.rs] Caught WriteConflictException while truncating oplog entries, retrying
2015-11-20T19:37:23.165-0500 D STORAGE  [WT RecordStoreThread: local.oplog.rs] Truncating the oplog between RecordId(6219396344377837377) and RecordId(6219396808234305724) to remove approximately 12514547 records totaling to 13905138771 bytes
2015-11-20T19:37:33.644-0500 D STORAGE  [WT RecordStoreThread: local.oplog.rs] Caught WriteConflictException while truncating oplog entries, retrying
2015-11-20T19:37:33.644-0500 D STORAGE  [WT RecordStoreThread: local.oplog.rs] Truncating the oplog between RecordId(6219396344377837377) and RecordId(6219396808234305724) to remove approximately 12514547 records totaling to 13905138771 bytes
2015-11-20T19:37:43.504-0500 D STORAGE  [WT RecordStoreThread: local.oplog.rs] Caught WriteConflictException while truncating oplog entries, retrying
2015-11-20T19:37:43.504-0500 D STORAGE  [WT RecordStoreThread: local.oplog.rs] Truncating the oplog between RecordId(6219396344377837377) and RecordId(6219396808234305724) to remove approximately 12514547 records totaling to 13905138771 bytes
2015-11-20T19:38:27.786-0500 D STORAGE  [WT RecordStoreThread: local.oplog.rs] Finished truncating the oplog, it now contains approximately 4864036 records totaling to 5404517641 bytes
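
The stone-based reclaim these logs come from can be sketched as a toy model (hypothetical Python; names are illustrative, not MongoDB source). It shows the idea behind truncate_span.patch: pop every excess stone and cover the whole RecordId span with a single truncate call, instead of one call per stone.

```python
from collections import deque

class OplogStones:
    """Toy model of oplog 'stones': markers that partition the oplog so
    excess ranges can be truncated. Illustrative, not MongoDB source."""

    def __init__(self, max_size):
        self.max_size = max_size
        self.stones = deque()   # (last_record, bytes) per stone
        self.size = 0

    def insert(self, last_record, nbytes):
        self.size += nbytes
        self.stones.append((last_record, nbytes))

    def reclaim(self):
        """Coalesce all excess stones into one truncate span, as
        truncate_span.patch does, rather than one call per stone."""
        span_end, span_bytes = None, 0
        while self.size - span_bytes > self.max_size and self.stones:
            last_record, nbytes = self.stones.popleft()
            span_end = last_record
            span_bytes += nbytes
        if span_end is not None:
            # one truncate(start=None, end=span_end) covers the span
            self.size -= span_bytes
        return span_end, span_bytes


oplog = OplogStones(max_size=5)
for rid in range(1, 9):            # 8 stones of 1 "GB" each
    oplog.insert(rid, 1)

end, reclaimed = oplog.reclaim()
print(end, reclaimed, oplog.size)  # 3 3 5
```

The trade-off visible in the logs is that a coalesced span can grow very large (13GB here), making the single truncate long-running and prone to WriteConflictException retries.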

Comment by Martin Bligh [ 20/Nov/15 ]

Can we instrument the truncate range and check that our stored beginning record is correct?
Wondering if we don't retry properly on failure and thus leak.
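
One way to do that check offline (a hypothetical Python sketch, not existing tooling) is to parse the "Truncating the oplog between ..." lines from the logs above and verify that each range starts where the previous one ended, treating an identical restart as a WCE retry:

```python
import re

# Matches the oplog-reclaim log lines quoted in the comments above.
PAT = re.compile(
    r"Truncating the oplog between RecordId\((\d+)\) and RecordId\((\d+)\)")

def check_truncate_ranges(log_lines):
    """Return (prev_end, start) pairs where a truncate range did not
    begin at the previous range's end and was not a retry of the same
    span, i.e. places the stored beginning record may have drifted."""
    gaps, prev = [], None          # prev = (start, end) of last range
    for line in log_lines:
        m = PAT.search(line)
        if not m:
            continue
        start, end = int(m.group(1)), int(m.group(2))
        if prev is not None and start != prev[1] and start != prev[0]:
            gaps.append((prev[1], start))
        prev = (start, end)
    return gaps

sample = [
    "... Truncating the oplog between RecordId(0) and RecordId(10) ...",
    "... Truncating the oplog between RecordId(10) and RecordId(20) ...",
    "... Truncating the oplog between RecordId(10) and RecordId(25) ...",  # retry
    "... Truncating the oplog between RecordId(30) and RecordId(40) ...",  # gap!
]
print(check_truncate_ranges(sample))  # [(25, 30)]
```

Run against the reclaim-thread logs above, an empty result would indicate the ranges are contiguous and any leak is below the truncate layer rather than in the stored beginning record.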

Comment by Bruce Lucas (Inactive) [ 19/Nov/15 ]

Just realized that these runs were without journal (the problem was noticed while investigating another problem where journal was disabled). This will impose an extra burden because of the frequent checkpoints, but the result should not be unbounded oplog growth.

Comment by Bruce Lucas (Inactive) [ 19/Nov/15 ]

With truncate stats, on build from master just now:

  • at A truncation begins when the oplog fills, and appears to keep up for a while
  • at B the rate of truncation falls to less than half, even though the insert rate remains constant, and the oplog begins to grow
  • different from previous run:
    • we see ftdc stalls (you can tell from the straight-line shape of the graphs), e.g. C-D
    • oplog growth appears to be unbounded
    • the very long checkpoint (never finishes before the test completes) happens on the first checkpoint, not the second
Comment by Daniel Pasette (Inactive) [ 19/Nov/15 ]

The latest drop of WT includes statistics for named snapshots and truncation: WT-2218 and WT-2222. I'd love to re-run this test and get those statistics incorporated into the FTDC output.

Also, this build should have snapshot thread disabled by default (SERVER-20825), so this is probably a different issue than what we've seen in earlier release candidates.

Generated at Thu Feb 08 03:57:42 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.