Details
- Type: Improvement
- Resolution: Done
- Priority: Major - P3
- Component: Storage Execution
- Backwards Compatibility: Fully Compatible
Description
See the attached image. An identical workload is run against two replica sets (test and control), each with two nodes and an arbiter. All hosts are configured with a 1 GB WiredTiger cache. The workload starts by inserting one million documents in batches of 100, each document containing a 900-byte random string. When the inserts complete (indicated by the blue vertical line in the image), the secondary of the test set is killed, which prevents the primary from advancing its commit point or deleting old snapshots; the primary keeps creating new snapshots until it hits the limit of 1000 uncommitted snapshots. After the secondary is killed, the workload switches to updating documents for 20 minutes, in batches of 1000 sequential documents.
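The attached workload.js is not reproduced here, but the batching it describes can be sketched in Python. This is a rough illustration under my own assumptions (the helper names `random_payload`, `insert_batches`, and `update_ranges` are mine, and the actual script may differ in detail):

```python
import random
import string

DOC_COUNT = 1_000_000   # one million documents
INSERT_BATCH = 100      # inserts are batched 100 at a time
UPDATE_BATCH = 1000     # updates touch 1000 sequential documents per batch
PAYLOAD_BYTES = 900     # each document carries a 900-byte random string


def random_payload(nbytes=PAYLOAD_BYTES):
    # Build a random ASCII string of the given length.
    return ''.join(random.choice(string.ascii_letters) for _ in range(nbytes))


def insert_batches(total=DOC_COUNT, batch=INSERT_BATCH):
    # Yield lists of documents sized for one batched insert each.
    for start in range(0, total, batch):
        end = min(start + batch, total)
        yield [{'_id': i, 'payload': random_payload()} for i in range(start, end)]


def update_ranges(total=DOC_COUNT, batch=UPDATE_BATCH):
    # Yield (lo, hi) _id ranges covering sequential documents for one
    # batched update each.
    for start in range(0, total, batch):
        yield (start, min(start + batch, total))
```

With a pymongo collection handle, each yielded list would go to something like `coll.insert_many(batch)`, and each `(lo, hi)` range to an update over `{'_id': {'$gte': lo, '$lt': hi}}`; again, the real workload.js may drive the server differently.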
The test set appears to use an unbounded amount of disk space and suffers extreme pauses. During some, but not all, of these pauses the system appears to be completely idle, with barely any CPU or disk utilization.
To confirm that the problem was not caused by there being 1000 snapshots, I limited the server to keeping 3 snapshots in total by setting the uncommitted snapshot limit to 2 at https://github.com/mongodb/mongo/blob/r3.3.3/src/mongo/db/repl/oplog.cpp#L1100. This did not make a noticeable difference.
Also, moving the testSet.stop() line above beginState('insert') makes the snapshots be taken against an empty collection, so that all inserts happen after the snapshots. Even in this case, disk usage appears to be unbounded.
Repro:
- Download the .js and .py files to a directory that contains a mongod binary
- If needed, install the Python 2 libraries pymongo and matplotlib
- Launch a mongod on the default port (27017) for reporting and IPC
- Run mongo workload.js (this will launch the replica sets, start monitor.py, and run the workload)
- Once the workload starts, run python plot.py (the plot updates as new data is collected)
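The internals of monitor.py are not shown in this ticket. As a minimal sketch of how such a monitor could sample the disk usage of each node's dbpath over time (my own assumptions, not the actual script):

```python
import os
import time


def dir_size_bytes(path):
    # Total size of all regular files under path (e.g. a mongod dbpath).
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.path.getsize(os.path.join(root, name))
            except OSError:
                pass  # a file may vanish between listing and stat
    return total


def sample_disk_usage(path, interval_s=1.0, samples=None):
    # Yield (timestamp, bytes) pairs; runs forever when samples is None.
    n = 0
    while samples is None or n < samples:
        yield (time.time(), dir_size_bytes(path))
        n += 1
        if samples is None or n < samples:
            time.sleep(interval_s)
```

A series collected this way, one per dbpath, is the kind of data plot.py could graph to show the test set's disk usage growing without bound relative to the control set.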