[SERVER-21833] Compact does not release space to the system with WiredTiger
Created: 09/Dec/15  Updated: 07/Dec/16  Resolved: 07/Jan/16

Status: Closed
Project: Core Server
Component/s: WiredTiger
Affects Version/s: 3.0.7
Fix Version/s: 3.2.3, 3.3.0

Type: Bug
Priority: Major - P3
Reporter: Luke Jolly
Assignee: Keith Bostic (Inactive)
Resolution: Done
Votes: 0
Labels: WTplaybook, code-and-test
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
is depended on by SERVER-21872 WiredTiger changes for 3.2.1 Closed
Duplicate
is duplicated by SERVER-22015 "compact" no effect Closed
Related
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Completed:
Participants:

 Description   

The documentation clearly claims that "On WiredTiger, compact will rewrite the collection and indexes to minimize disk space by releasing unused disk space to the system." However, as far as I can tell, this does not actually happen in 3.0.7. I reduced the dataSize of a collection from 130 GB to 30 GB, and the storageSize remained the same at 61 GB after running compact. This was mentioned in SERVER-19062, which was marked as a duplicate of a ticket that doesn't address this problem.



 Comments   
Comment by Githook User [ 29/Jan/16 ]

Author: Ramon Fernandez <ramon@mongodb.com>

Message: Import wiredtiger-wiredtiger-2.7.0-559-g07966a4.tar.gz from wiredtiger branch mongodb-3.2

ref: 3c2ad56..07966a4

WT-1517 schema format edge cases
WT-1801 Add a directory sync after rollback of a WT_SESSION::rename operation
WT-2060 Simplify aggregation of statistics
WT-2073 metadata cleanups
WT-2099 Seeing memory underflow messages
WT-2113 truncate01 sometimes fails
WT-2142 Connection cleanup in Python tests
WT-2177 Add an optional per-thread seed to random number generator
WT-2198 bulk load and column store appends
WT-2216 simplify row-store search loop slightly
WT-2225 New split code performance impact
WT-2231 pinned page cursor searches could check parent keys
WT-2235 wt printlog option without unicode
WT-2242 WiredTiger treats dead trees the same as other trees in eviction
WT-2244 Trigger in-memory splits sooner
WT-2245 WTPERF Truncate has no ability to catch up when it falls behind
WT-2246 column-store append searches the leaf page; the maximum record number fails CRUD operations
WT-2247 variable-length column-store in-memory page splits
WT-2256 WTPERFs throttle option fires in bursts
WT-2257 wtperf doesn't handle overriding workload config
WT-2258 WiredTiger preloads pages even when direct-IO is configured.
WT-2259 __wt_evict_file_exclusive_on() should clear WT_BTREE_NO_EVICTION on error
WT-2260 Workloads evict internal pages unexpectedly
WT-2262 Random sampling is skewed by tree shape
WT-2265 Wiredtiger related change in ppc64le specific code block in gcc.h
WT-2266 Add wtperf config to set if perf thresholds are fatal
WT-2267 Improve wtperf throttling implementation to provide steady load
WT-2269 wtperf should dump its config everytime it runs
WT-2272 Stress test assertion in the sweep server
WT-2275 broken DB after application crash
WT-2276 tool to decode checkpoint addr
WT-2277 Remove WT check against big-endian systems
WT-2279 Define WT_PAUSE(), WT_FULL_BARRIER(), etc when s390x is defined
WT-2281 wtperf smoke.sh fails on ppc64le
WT-2282 error in wt_txn_update_oldest verbose message test
WT-2283 retry in txn_update_oldest results in a hang
WT-2284 Repeated macro definition
WT-2285 configure should set BUFFER_ALIGNMENT_DEFAULT to 4kb on linux
WT-2287 WT_SESSION.rebalance
WT-2289 failure in fast key check
WT-2290 WT_SESSION.compact could be more effective.
WT-2291 Random cursor walk inefficient in skip list only trees
WT-2295 WT_SESSION.create does a full-scan of the main table
WT-2296 New log algorithm needs improving for sync/flush settings
WT-2297 Fix off-by-one error in Huffman config file parsing
WT-2299 upper-level WiredTiger code is reaching into the block manager
WT-2301 Add reading a range to wtperf
WT-2303 Build warning in wtperf
WT-2304 wtperf crash dumping config
WT-2305 Fix coverity scan issues on 23/12/2015
WT-2307 Internal page splits can corrupt cursor iteration
WT-2308 custom extractor for ref_cursors in join cursor
WT-2311 Support Sparc
WT-2312 re-creating a deleted column-store page can corrupt the in-memory tree
WT-2313 sweep-server: conn_dhandle.c, 610: dhandle != conn->cache->evict_file_next
WT-2314 page-swap error handling is inconsistent
WT-2316 stress test failure: WT_CURSOR.prev out-of-order returns
WT-2320 Only check copyright when cutting releases
WT-2321 race between eviction and worker threads on the eviction queue
WT-2326 Change WTPERF to use new memory allocation functions instead of the standard
WT-2328 schema drop does direct unlink, it should use a block manager interface.
WT-2331 Checking of search() result for reference cursors before join()
WT-2332 Bug in logging write-no-sync mode
WT-2333 Add a flag so drop doesn't block
WT-2335 NULL pointer crash in config_check_search with invalid configuration string
WT-2338 Disable using pre-allocated log files when backup cursor is open
WT-2339 format post-rebalance verify failure (stress run #11586)
WT-2340 Add logging guarantee assertions, whitespace
WT-2342 Enhance wtperf to support background create and drop operations
WT-2344 OS X compiler warning
WT-2347 Java: schema format edge cases
WT-2348 xargs -P isn't portable
WT-2355 Fix minor scratch buffer usage in logging
SERVER-21833 Compact does not release space to the system with WiredTiger
SERVER-21887 $sample takes disproportionately long time on newly created collection
SERVER-22064 Coverity analysis defect 77699: Unchecked return value
SERVER-21944 WiredTiger changes for 3.2.2
Branch: v3.2
https://github.com/mongodb/mongo/commit/5d6532f3d5227ff76f62c4810c98a4ef4d0c8c56

Comment by Githook User [ 07/Jan/16 ]

Author: Ramon Fernandez <ramon@mongodb.com>

Message: Import wiredtiger-wiredtiger-2.7.0-269-g44463c5.tar.gz from wiredtiger branch mongodb-3.4

ref: 3c2ad56..44463c5

SERVER-21833 Compact does not release space to the system with WiredTiger
WT-2060 Simplify aggregation of statistics
WT-2099 Seeing memory underflow messages
WT-2113 truncate01 sometimes fails
WT-2177 Add a per-thread seed to random number generator
WT-2198 bulk load and column store appends
WT-2231 pinned page cursor searches could check parent keys
WT-2235 wt printlog option without unicode
WT-2245 WTPERF Truncate has no ability to catch up when it falls behind
WT-2246 column-store append searches the leaf page; the maximum record number fails CRUD operations
WT-2256 WTPERFs throttle option fires in bursts
WT-2257 wtperf doesn't handle overriding workload config
WT-2259 __wt_evict_file_exclusive_on() should clear WT_BTREE_NO_EVICTION on error
WT-2260 Workloads evict internal pages unexpectedly
WT-2262 Random sampling is skewed by tree shape
WT-2265 Wiredtiger related change in ppc64le specific code block in gcc.h
WT-2266 Add wtperf config to set if perf thresholds are fatal
WT-2269 wtperf should dump its config everytime it runs
WT-2272 Stress test assertion in the sweep server
WT-2275 broken DB after application crash
WT-2276 tool to decode checkpoint addr
WT-2277 Remove WT check against big-endian systems
WT-2279 Define WT_PAUSE(), WT_FULL_BARRIER(), etc when s390x is defined
WT-2281 wtperf smoke.sh fails on ppc64le
WT-2282 error in wt_txn_update_oldest verbose message test
WT-2283 retry in txn_update_oldest results in a hang
WT-2285 configure should set BUFFER_ALIGNMENT_DEFAULT to 4kb on linux
WT-2289 failure in fast key check
WT-2290 WT_SESSION.compact could be more effective.
WT-2291 Random cursor walk inefficient in skip list only trees
WT-2297 Fix off-by-one error in Huffman config file parsing
WT-2299 upper-level WiredTiger code is reaching into the block manager
WT-2301 Add reading a range to wtperf
WT-2303 Build warning in wtperf
WT-2304 wtperf crash dumping config
WT-2307 Internal page splits can corrupt cursor iteration
WT-2311 Support Sparc
Branch: master
https://github.com/mongodb/mongo/commit/d845b75e5f0837f801bdf371babd985308a1ad80

Comment by Githook User [ 17/Dec/15 ]

Author: sueloverso <sue@mongodb.com>

Message: Merge pull request #2388 from wiredtiger/SERVER-21833-compact

SERVER-21833: fix compaction
Branch: develop
https://github.com/wiredtiger/wiredtiger/commit/60cb492a56d67fe637e16f0310486ba987da22d5

Comment by Githook User [ 17/Dec/15 ]

Author: Keith Bostic (keithbostic) <keith@wiredtiger.com>

Message: SERVER-21833: Add compaction support for multi-block reconciliations,
they're expected on internal pages in the presence of large caches
(and probably expected in the case of large page sizes as well).
Branch: develop
https://github.com/wiredtiger/wiredtiger/commit/5dc3ecf4720ee42478064bbfbee7ced1d96e8315

Comment by Githook User [ 17/Dec/15 ]

Author: Susan LoVerso (sueloverso) <sue@wiredtiger.com>

Message: SERVER-21833 Add unit test to test compact bug.
Branch: develop
https://github.com/wiredtiger/wiredtiger/commit/d3e84b810819f20766f63a237b05146ff7d86991

Comment by Githook User [ 17/Dec/15 ]

Author: Keith Bostic (keithbostic) <keith@wiredtiger.com>

Message: SERVER-21833: Fix a bug introduced in 769dc59 (March, 2015), when we started
clearing WT_BLOCK.compact_pct_tenths in __wt_block_compact_start. The problem
is that __wt_block_compact_skip is the first block compaction routine called; it
sets WT_BLOCK.compact_pct_tenths and then returns whether any compaction is to
be done. The __wt_block_compact_start function is called after that, when the
compaction run is starting, and clears the value again. This bug effectively
prevents compaction from ever happening: the size checked in
__wt_block_compact_page_skip to decide if a page should be rewritten or skipped
is the file's total size, so every page is skipped.
Branch: develop
https://github.com/wiredtiger/wiredtiger/commit/ad33acecd0d537c8adcc8a941f0dfda13541a06c
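The call-ordering problem described in this commit message can be illustrated with a small toy model. This is only a sketch, not WiredTiger's code: the Python functions borrow the names of the C routines for readability, the `avail_tenths` and `clear_pct` parameters are hypothetical, and the formula used for the rewrite limit (file size minus the targeted tenths of the file) is an assumption chosen to match the message's statement that the size checked ends up being the file's total size.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """Toy stand-in for WT_BLOCK; only the fields this sketch needs."""
    file_size: int
    compact_pct_tenths: int = 0


def compact_skip(block, avail_tenths):
    """Called first. Records how many tenths of the file's tail to target
    and returns True if compaction should be skipped entirely.
    avail_tenths (hypothetical) is how many tenths of the file are free."""
    if avail_tenths >= 2:
        block.compact_pct_tenths = 2
    elif avail_tenths >= 1:
        block.compact_pct_tenths = 1
    else:
        return True
    return False


def compact_start(block, clear_pct):
    """Called second. clear_pct=True models the buggy reset of the value
    that compact_skip has just computed."""
    if clear_pct:
        block.compact_pct_tenths = 0


def page_skip(block, offset):
    """Return True if the page at this offset should NOT be rewritten.
    Assumed rule: only pages beyond the limit are candidates."""
    limit = block.file_size - (block.file_size // 10) * block.compact_pct_tenths
    return offset <= limit


def pages_rewritten(page_offsets, file_size, avail_tenths, clear_pct):
    """Run the three steps in the order the commit message describes."""
    block = Block(file_size)
    if compact_skip(block, avail_tenths):
        return 0
    compact_start(block, clear_pct)
    return sum(not page_skip(block, off) for off in page_offsets)


offsets = list(range(0, 1000, 50))  # 20 pages in a 1000-byte toy file
print(pages_rewritten(offsets, 1000, avail_tenths=5, clear_pct=True))   # prints 0
print(pages_rewritten(offsets, 1000, avail_tenths=5, clear_pct=False))  # prints 3
```

With the reset in place the limit equals the file size, no page offset can exceed it, and nothing is rewritten. Without the reset, the pages in the last two tenths of the toy file become candidates.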

Comment by Luke Jolly [ 15/Dec/15 ]

Thanks for the updates. Let me know if there's anything I need to do. I've got my backups and mongo cluster in a better state so that I can actually run tests if need be.

Comment by Keith Bostic (Inactive) [ 14/Dec/15 ]

lukejolly, we think we understand this one, so there's no need for additional work on your part.

Comment by Susan LoVerso [ 14/Dec/15 ]

lukejolly I created a WT-specific unit test that models the description you gave above and reproduced the symptom here. We will let you know when there is more information.

Comment by Keith Bostic (Inactive) [ 14/Dec/15 ]

lukejolly, a couple of other possibilities:

  • We could upload a copy of the collection that's failing to compact, and try to compact it ourselves.
  • If you have before/after snapshots of the collection that's not compacting, there are utility commands you could compile and run that would help us understand the failure (but that's likely to take several iterations).
  • We could have you turn on verbose messages for the compaction command; I'm honestly not sure how useful this would be, but it might help us diagnose the problem and it's pretty easy to do.

Comment by Susan LoVerso [ 10/Dec/15 ]

lukejolly Can you create a repro js script that mimics your db and collection and shows this behavior that we could run here? Can it insert a large enough amount of random data, show the stats, delete half, show the stats, compact, show the stats, etc? It may not need to be as large as your original dataset.
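The sequence requested here (insert random data, show the stats, delete half, show the stats, compact, show the stats) could be sketched roughly as follows. The ticket asks for a js script; this is a Python sketch instead, assuming a PyMongo-style database handle. The collection name `compact_repro`, the field names `i` and `payload`, and the default sizes are all hypothetical.

```python
import os


def collection_sizes(db, name):
    """Return (count, dataSize, storageSize) as reported by collStats."""
    stats = db.command("collstats", name)
    return stats["count"], stats["size"], stats["storageSize"]


def run_repro(db, name="compact_repro", n_docs=100_000,
              payload_bytes=1024, batch=1000):
    """Insert random documents, delete half, compact, and record the
    collection sizes after each step. Returns a dict of size tuples."""
    coll = db[name]
    coll.drop()  # start from an empty collection
    report = {}

    # Insert in batches so no single insert_many call grows too large.
    for start in range(0, n_docs, batch):
        stop = min(start + batch, n_docs)
        coll.insert_many(
            [{"i": i, "payload": os.urandom(payload_bytes)}
             for i in range(start, stop)]
        )
    report["after_insert"] = collection_sizes(db, name)

    # Delete the first half of the documents.
    coll.delete_many({"i": {"$lt": n_docs // 2}})
    report["after_delete"] = collection_sizes(db, name)

    # Run compact on the collection and look at the sizes once more.
    db.command("compact", name)
    report["after_compact"] = collection_sizes(db, name)
    return report
```

Against a live server this might be called as `run_repro(MongoClient("mongodb://localhost:27017")["test"])`, with `MongoClient` imported from `pymongo`; the URI and database name are placeholders. The symptom reported in this ticket would show up as the storageSize in `after_compact` matching the one in `after_insert`.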

Comment by Luke Jolly [ 10/Dec/15 ]

No, the storage size never decreased. Before the deletes it was 61 GB. After deletes and a compaction it was still 61 GB.

Comment by Keith Bostic (Inactive) [ 10/Dec/15 ]

Please correct me if I'm wrong, but it sounds like you deleted the documents and the storage size decreased, but the storage size did not decrease again, after doing a subsequent compact. Is that correct?

If so, then this is expected behavior; the underlying store compacts any time that it can, for example, when documents are deleted. If space can be discarded in the normal course of events, then there may be no additional compaction returned from explicitly attempting to compact the collection.

Comment by Luke Jolly [ 10/Dec/15 ]

To clarify, I have a db with one collection. There were roughly 21,500,000 documents in the collection, and db.stats showed it as having a dataSize of 138 GB with a storageSize of 61 GB. I then deleted 11,000,000 of the documents (these were also much bigger documents than the ones I left). Now the dataSize is 21 GB, but the storageSize has remained the same even after a compact. The disk has 262 GB out of 438 GB free, so it definitely has enough room to do any moving or rewriting it needs to.

The reason I started doing these deletes was that my backups started failing because they are too big. It appears I do not have a successful backup from before I did the deletes. I can run any commands/utilities you want on the current data set if that will help. I will also be doing a sizable delete on a different db. I will make sure I have a good backup before this.

Comment by Keith Bostic (Inactive) [ 10/Dec/15 ]

Hi, lukejolly, thank you for your report.

Generally, compaction doesn't have the desired effect when there's still real data in the file for whatever reason.

When you say you "reduced dataSize of a collection", can you tell me a bit more about that process, and how you know the underlying "live" data size was reduced?

Can you easily reproduce this behavior?

The next step in debugging this is probably to run some separate utility commands on the before and after Mongo repositories (since the data is probably a bit too large to upload). Would that be possible?

Generated at Thu Feb 08 03:58:34 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.