[SERVER-23359] WiredTiger should not cache updates between named snapshots Created: 25/Mar/16 Updated: 06/Dec/22 Resolved: 21/Dec/16 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | WiredTiger |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Mathias Stearn | Assignee: | Backlog - Storage Execution Team |
| Resolution: | Done | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Attachments: |
| Assigned Teams: | Storage Execution |
| Backwards Compatibility: | Fully Compatible |
| Participants: |
| Description |
|
See the attached image. An identical workload is run against two replica sets, test and control, each with two data-bearing nodes and an arbiter. All hosts are configured with a 1GB WiredTiger cache.

The workload starts by inserting one million documents in batches of 100, each document carrying a 900-byte random string. When this completes (the blue vertical line in the image), the secondary of the test set is killed, preventing the primary from advancing its commit point or deleting old snapshots; the primary keeps creating new snapshots until it hits the limit of 1000 uncommitted snapshots. The workload then switches to updating documents for 20 minutes, in batches of 1000 sequential documents.

The test set appears to use an unbounded amount of disk space and suffers from some extreme pauses. During some, but not all, of these pauses the system seems completely idle, with barely any CPU or disk utilization.

To confirm that the problem was not related to there being 1000 snapshots, I limited the server to keeping 3 snapshots in total by setting the uncommitted snapshot limit to 2 at https://github.com/mongodb/mongo/blob/r3.3.3/src/mongo/db/repl/oplog.cpp#L1100. This didn't seem to make much of a difference. Also, moving the testSet.stop() line above begineState('insert') makes the snapshots be taken of an empty collection, so that all inserts happen after the snapshots. Even in this case, disk usage seems to be unbounded.

Repro:
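The original repro script is not included in this export. As a rough, assumption-laden sketch of the two phases described above (pymongo; the connection string, database, and collection names are invented, and killing the test set's secondary is left as a manual step):

{code:python}
import os
import time

from pymongo import MongoClient

BATCH_SIZE = 100        # insert batch size from the description
NUM_DOCS = 1000000      # one million documents
DOC_BYTES = 900         # 900-byte random string per document
UPDATE_BATCH = 1000     # each update touches 1000 sequential documents
UPDATE_SECS = 20 * 60   # update phase runs for 20 minutes

# Assumed connection string; the real repro targets the "test" replica set.
client = MongoClient("mongodb://localhost:27017/?replicaSet=test")
coll = client.test.snapshot_repro

# Phase 1: insert one million documents in batches of 100, each carrying
# a 900-byte random string.
for start in range(0, NUM_DOCS, BATCH_SIZE):
    coll.insert_many(
        [{"_id": start + i, "payload": os.urandom(DOC_BYTES).hex()[:DOC_BYTES]}
         for i in range(BATCH_SIZE)]
    )

# Here the original test kills the test set's secondary, freezing the
# commit point so old snapshots can no longer be released.
input("Kill the test set's secondary now, then press Enter to continue...")

# Phase 2: update batches of 1000 sequential documents for 20 minutes.
deadline = time.time() + UPDATE_SECS
start = 0
while time.time() < deadline:
    coll.update_many(
        {"_id": {"$gte": start, "$lt": start + UPDATE_BATCH}},
        {"$set": {"payload": os.urandom(DOC_BYTES).hex()[:DOC_BYTES]}},
    )
    start = (start + UPDATE_BATCH) % NUM_DOCS
{code}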
|
| Comments |
| Comment by Alexander Gorrod [ 21/Dec/16 ] |
|
Cleaning up intermediate updates is incompatible with the requirements of future multi-document transaction support. |
| Comment by Michael Cahill (Inactive) [ 31/Mar/16 ] |
|
redbeard0531, we did discuss discarding intermediate versions, but that didn't make it into 3.2. It shouldn't be too hard, but it requires some additional tracking of transaction snapshots beyond what we maintain today. I'll use this test case to measure how effective the solution is. |
| Comment by Mathias Stearn [ 29/Mar/16 ] |
|
michael.cahill When we were working on the design for this, it sounded like WT would automatically purge unneeded intermediate versions of documents. So if there is a snapshot at version 1 of a document and it is then updated 1000 times, only version 1 and the latest version would be kept. If this isn't the case, we may need to rethink our snapshot retention policy. |
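For illustration only (a toy model, not WiredTiger code): the purge behavior described above amounts to discarding every version that is neither the newest nor the newest version visible to some retained snapshot.

{code:python}
# Toy model of purging unneeded intermediate versions. Each version is a
# (txn_id, value) pair; `snapshots` holds the transaction ids pinned by
# named snapshots. A snapshot pinned at id S sees the newest version with
# txn_id <= S.

def trim_versions(versions, snapshots):
    """Keep the newest version plus the newest version visible to each snapshot."""
    keep = {max(versions, key=lambda v: v[0])[0]}
    for snap in snapshots:
        visible = [v for v in versions if v[0] <= snap]
        if visible:
            keep.add(max(visible, key=lambda v: v[0])[0])
    return [v for v in versions if v[0] in keep]

# A snapshot pinned at version 1, then 1000 further updates: trimming
# keeps only version 1 and the newest version instead of all 1001.
versions = [(i, "value-%d" % i) for i in range(1, 1002)]
print(len(trim_versions(versions, snapshots={1})))  # -> 2
{code}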
| Comment by Michael Cahill (Inactive) [ 29/Mar/16 ] |
|
This looks like expected behavior given the current design: WiredTiger keeps all updates after the oldest snapshot. Once the cache becomes full, they overflow into the "lookaside table" (WiredTigerLAS.wt). The next step should be to run the workload and gather diagnostic data to confirm that versions are overflowing into the lookaside table. We should also confirm where threads are blocked when no progress is being made. |
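One hedged way to gather that diagnostic data (assuming a locally reachable mongod; the exact statistic names vary between server versions, but in this era the lookaside counters appear under wiredTiger.cache in serverStatus):

{code:python}
from pymongo import MongoClient

# Assumed connection string.
client = MongoClient("mongodb://localhost:27017")

# serverStatus exposes the WiredTiger statistics table; print any cache
# statistic mentioning the lookaside table.
cache = client.admin.command("serverStatus")["wiredTiger"]["cache"]
for name, value in sorted(cache.items()):
    if "lookaside" in name:
        print("%s: %s" % (name, value))
{code}

Watching the on-disk size of WiredTigerLAS.wt in the dbpath over the course of the run would corroborate these counters.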