[SERVER-24580] Improve performance when WiredTiger cache is full Created: 15/Jun/16  Updated: 20/Sep/17  Resolved: 05/Jul/16

Status: Closed
Project: Core Server
Component/s: WiredTiger
Affects Version/s: None
Fix Version/s: 3.2.8, 3.3.10

Type: Improvement Priority: Critical - P2
Reporter: Michael Cahill (Inactive) Assignee: Michael Cahill (Inactive)
Resolution: Done Votes: 5
Labels: code-only
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File 18-second-gap.png     PNG File S1646patch.png     PNG File cs31295.png     PNG File incident-06-12-comparison.png     PNG File incident-06-18-server.png     PNG File incident-06-18-waiters.png     PNG File incident-06-18.png     PNG File primary-transition.png     PNG File s1646-2.png     PNG File s1646-stacks.png     PNG File secondary-transition.png     PNG File server-24580-patched-recovery.png     PNG File stalls1-patched.png     PNG File stalls1-unpatched.png     PNG File stalls2-patched.png     PNG File stalls2-unpatched.png    
Issue Links:
Depends
depends on WT-2702 Under high thread load, WiredTiger ex... Closed
Duplicate
is duplicated by SERVER-23001 Occasional 100% cache uses cripples s... Closed
is duplicated by SERVER-24094 Server cache use can take up to an ho... Closed
is duplicated by SERVER-24139 Insert speed decrease rapidly Closed
is duplicated by SERVER-24983 Remove method really slow Closed
Related
Backwards Compatibility: Fully Compatible
Backport Completed:
Participants:
Case:

 Description   

When cache utilization hits 95% performance falls off a cliff, severely impacting production.

If the solution to this isn't to (gently) keep utilization from hitting 95%, then do we need to look at why threads getting involved in evictions at 95% is so impactful? Note that on in the incident on the primary that bruce.lucas analyzed it appeared to me that the shortfall between evictions required to keep the cache steady and actual evictions was only 0.5%, yet the impact on operation rates and latencies to get application threads involved in evictions seemed far out of proportion to the shortfall that they had to make up.

If on the other hand evictions are really so fundamentally difficult that increasing eviction rate by 0.5% is hard, does it make sense to look at it from the other end, throttling application threads by the 0.5% required (in this example) to make up the shortfall by very slightly reducing rate of pages read into cache? A similar analysis of the lag incident on the secondary showed that the shortfall was about 9%, yet making up that shortfall when the cache hit 95% utilization nearly brings replication to a halt for extended periods.



 Comments   
Comment by Michael Cahill (Inactive) [ 15/Aug/16 ]

kimmy-github, this ticket resolved issues in a particular set of workloads. Please open a new ticket and upload the contents of the diagnostic.data directory and mongod.log covering the incident so that we can determine what is going on in your case.

Comment by kimmy-github [ 15/Aug/16 ]

I found there is still having problems at Mongo3.2.8 when the applications run with a large number of updates;
here is the sence:
there is one db,called recommendation,100 tables in the db;It's 4.6g and 1390000 docments for one table;
When cache utilization hits 96% performance falls off a cliff, severely impacting production;it's still exists;Perhaps it's getting better when cache falls below 96%,but a time later,the cache hits 95% again,even hit 100%,the performance will worse;

Comment by Githook User [ 28/Jul/16 ]

Author:

{u'name': u'Ramon Fernandez', u'email': u'ramon@mongodb.com'}

Message: Import wiredtiger-wiredtiger-2.8.0-592-g848e5f5.tar.gz from wiredtiger branch mongodb-3.2

ref: 8b7110b..848e5f5
for: 3.2.9

This commit replaces a number of previous backports with the original
change, so some of the tickets listed below may have been resolved in
previous versions already as a backport to the v3.2 branch. For example,
SERVER-24580 includes change a63e21b8 as a cherry-pick of 1f4aaa44, but
this commit replaces that change with 1f4aaa44 itself.

SERVER-23659 Provide useful message when wiredTigerJournalCompressor is changed
SERVER-23661 $sample takes disproportionately long time on newly created collection
SERVER-24306 40-second journaling stall from "log files prepared" to checkpoint
SERVER-24580 Improve performance when WiredTiger cache is full
WT-2103 Add incremental backup testing to format
WT-2223 Add stress testing for in-memory
WT-2268 JSON load incorrect with UNICODE input
WT-2319 Add statistics around fsync calls
WT-2325 Fix an incomplete comment
WT-2343 Assert we don't remove or rename when backup cursor is open
WT-2349 Add ability to open databases read-only
WT-2359 WiredTiger with Python will hang if a calloc failure occurs during __wt_connection_close
WT-2360 Allow disjunctions and combinations of operations in join cursors
WT-2408 Windows error translation layer
WT-2446 Estimate WT cache hit ratio
WT-2450 Salvage releases pages, then explicitly evicts them.
WT-2453 Throughput drop in wtperf evict Jenkins tests
WT-2479 Dump utility discards table config (JSON)
WT-2491 The dhandle close_lock isn't valuable at the moment
WT-2504 Should READONLY always read basecfg file?
WT-2505 Review clang analyzer warnings
WT-2508 Test programs should remove test directories on the "clean" target
WT-2514 Log path name is an empty string.
WT-2518 LSM checkpoint handle acquisition optimization
WT-2520 WT_SESSION::verify should not alter tables
WT-2526 Mixing and matching readonly and read/write handles
WT-2535 Extend test/format to test for transactions reading their writes
WT-2537 Cannot open DB written by WT2.6.1 with WT2.8.0 due to WT_NOTFOUND on recovery
WT-2539 Implement file streaming above pluggable filesystems
WT-2540 Separate stream and file handle methods
WT-2541 Add statistics for number of threads currently in read/write
WT-2542 Fixed-length column store reconciliation overwrites original values
WT-2544 Fix eviction statistics when clear is configured
WT-2546 Eviction server not help evict pages sometimes
WT-2547 Add 1-eviction-worker jobs to Jenkins
WT-2548 Cap the amount of data handed to raw compression.
WT-2549 joins using recno keys return no values
WT-2550 java ex_schema example fails
WT-2552 Public API for pluggable filesystems
WT-2553 Document in-memory configuration and WT_CACHE_FULL error return
WT-2554 Implement a framework for adding C test cases
WT-2556 Typo in the Java example code
WT-2557 format test program should discard log files after incremental backup
WT-2558 WT_PAGE structure reorganization
WT-2559 Windows segfault in logging code
WT-2560 Stuck trying to update oldest transaction ID
WT-2562 Reconfig02 test failing sometimes on PPC
WT-2565 Item 3573 on page at [write-check] is a corrupted cell
WT-2566 All lock operations should be barriers
WT-2567 Segfault in test/format log truncate
WT-2568 Java PackTest.java compilation error
WT-2569 Win_handle_read should always call GetLastError on error
WT-2570 Minor lint cleanups.
WT-2571 Join code cleanup
WT-2572 Don't select an in-memory format run if incompatible options configured
WT-2573 Free of stack-allocated WT_REF
WT-2574 Format doesn't free all allocated configure memory
WT-2576 Variable-length column-store out-of-order return
WT-2577 Core dump discarding non-existent addresses
WT-2579 In-memory configurations break debugging support
WT-2580 Potential SWIG naming conflict in Java
WT-2581 Assert multi->disk_image == NULL
WT-2582 Cache eviction server error: WT_RESTART
WT-2583 Incremental backup can prevent future recovery
WT-2584 Don't use periods in error messages
WT-2586 Remove ex_config.c until config cursors are supported
WT-2589 Check stats using WT_STAT_SET and 'clear' usage
WT-2592 Joins using non-recno key types not working
WT-2593 Disk full with pre-allocated log files
WT-2595 Fix compiler warning in packing tests
WT-2597 JSON dump needs LSM tests
WT-2598 In-memory FS needs fast lookup on file names
WT-2599 Split out the checksum code from the support directory
WT-2600 Clean up test program #includes
WT-2602 LSM stress hangs with very large uncompressed pages
WT-2605 C tests focused on different schema types needed for join and other APIs
WT-2609 Incorrect "skips API_END call" error.
WT-2610 Investigate eviction hazard pointer check
WT-2611 wtperf and/or WT_CONFIG_PARSER do not allow escaped double quotes
WT-2612 The dist/s_prototypes script is creating a debugging file xxx.
WT-2613 Windows build failing with a C4100 error
WT-2615 Enabling checkpoints in test/format leads to reduced concurrency
WT-2616 In-memory deadlock getting size
WT-2617 Bug in pluggable file system example
WT-2621 WiredTiger fails to compile on MSVC 2013
WT-2622 Unit test failed in test_cursor_random.test_cursor_random.test_cursor_random_multiple_page_records
WT-2624 snprintf unsupported on MSVC 2013
WT-2626 snprintf errors with macro redefinition on MSVC2015
WT-2627 Coverity complaints
WT-2628 Reconciliation can return without unlocking the page lock
WT-2629 Introduction of ppc64le crc32c assembly file has made the stack executable
WT-2630 Rename WT_FSTREAM methods to avoid using C99 reserved names
WT-2632 Test format failures due to "checkpoints cannot be dropped when in-use"
WT-2637 The file-extension documentation doesn't cover not-supported cases
WT-2644 'wt load -r' (rename) fails with LSM
WT-2646 Split the lock_wait flag into two, adding a checkpoint_wait flag
WT-2651 Coverity 1355591 resource leak
WT-2652 Logging code improvements
WT-2653 The custom file-system example should show device configuration
WT-2656 Builds failing on GCC 4.7 builder
WT-2658 Only include PPC-specific files in PPC builds
WT-2659 csuite tests, assorted lint and cleanup.
WT-2660 Hang between eviction and connection close
WT-2661 Coverity failures: 1356050-1356053
WT-2662 For internal spell checking, strip out double quote literals, they confuse aspell
WT-2664 Change eviction so any eviction thread can find candidates
WT-2667 Enhance WiredTiger Evergreen testing
WT-2668 Create join statistics that are useful and are easy to understand
WT-2671 Dump more information about the file layout in verify debug mode
WT-2672 Handle system calls that don't set errno
WT-2673 Stop automatically increasing memory page max
WT-2674 Simplify metadata file check
WT-2676 Don't use key size in column store in-memory splits.
WT-2677 Fix JSON output so only printable ASCII is produced (seen on Solaris)
WT-2682 Add option to configure WiredTiger with strict compiler flags
WT-2683 WiredTiger no longer needs to return non-zero disk sizes
WT-2685 Hazard pointer failure from clear walk
WT-2686 Logging subsystem core dump
WT-2687 Test suite should verify the exit status of the wt utility
WT-2689 Use after free in WT_SESSION::open_cursor
WT-2691 Use wrappers for ctype functions to avoid sign extension errors
WT-2692 Fix race in file system example
WT-2696 Race condition on unclean shutdown may miss log records with large updates
WT-2698 Test/recovery hung
WT-2702 Under high thread load, WiredTiger exceeds cache size
WT-2704 Test/format hung on bengal
WT-2706 Race condition on log file switch can cause missing log records
WT-2707 dist/s_label enhancements, and error jump cleanups
WT-2709 Connection reconfigure segfault in __wt_conn_cache_pool_destroy
WT-2710 WT_FILE_HANDLE_INMEM no longer needs an off field
WT-2712 Coverity 1356928 and 1356929: ASSERT_SIDE_EFFECT
WT-2713 Document WT_PANIC so pluggable filesystems can panic.
WT-2714 Lint
WT-2715 random-abort test may write partial record at the end
WT-2720 Pull request tester not running Python suite
WT-2722 s_label or s_label_loop false positive
WT-2724 Eviction workers created on open exit immediately
WT-2763 Unit test test_intpack failing on OSX
Branch: v3.2
https://github.com/mongodb/mongo/commit/385faa6ca72ad599d92d3948f1a2033d908e0226

Comment by Githook User [ 28/Jul/16 ]

Author:

{u'name': u'Ramon Fernandez', u'email': u'ramon@mongodb.com'}

Message: Import wiredtiger-wiredtiger-2.8.0-592-g848e5f5.tar.gz from wiredtiger branch mongodb-3.2

ref: 8b7110b..848e5f5
for: 3.2.9

This commit replaces a number of previous backports with the original
change, so some of the tickets listed below may have been resolved in
previous versions already as a backport to the v3.2 branch. For example,
SERVER-24580 includes change a63e21b8 as a cherry-pick of 1f4aaa44, but
this commit replaces that change with 1f4aaa44 itself.

SERVER-23659 Provide useful message when wiredTigerJournalCompressor is changed
SERVER-23661 $sample takes disproportionately long time on newly created collection
SERVER-24306 40-second journaling stall from "log files prepared" to checkpoint
SERVER-24580 Improve performance when WiredTiger cache is full
WT-2103 Add incremental backup testing to format
WT-2223 Add stress testing for in-memory
WT-2268 JSON load incorrect with UNICODE input
WT-2319 Add statistics around fsync calls
WT-2325 Fix an incomplete comment
WT-2343 Assert we don't remove or rename when backup cursor is open
WT-2349 Add ability to open databases read-only
WT-2359 WiredTiger with Python will hang if a calloc failure occurs during __wt_connection_close
WT-2360 Allow disjunctions and combinations of operations in join cursors
WT-2408 Windows error translation layer
WT-2446 Estimate WT cache hit ratio
WT-2450 Salvage releases pages, then explicitly evicts them.
WT-2453 Throughput drop in wtperf evict Jenkins tests
WT-2479 Dump utility discards table config (JSON)
WT-2491 The dhandle close_lock isn't valuable at the moment
WT-2504 Should READONLY always read basecfg file?
WT-2505 Review clang analyzer warnings
WT-2508 Test programs should remove test directories on the "clean" target
WT-2514 Log path name is an empty string.
WT-2518 LSM checkpoint handle acquisition optimization
WT-2520 WT_SESSION::verify should not alter tables
WT-2526 Mixing and matching readonly and read/write handles
WT-2535 Extend test/format to test for transactions reading their writes
WT-2537 Cannot open DB written by WT2.6.1 with WT2.8.0 due to WT_NOTFOUND on recovery
WT-2539 Implement file streaming above pluggable filesystems
WT-2540 Separate stream and file handle methods
WT-2541 Add statistics for number of threads currently in read/write
WT-2542 Fixed-length column store reconciliation overwrites original values
WT-2544 Fix eviction statistics when clear is configured
WT-2546 Eviction server not help evict pages sometimes
WT-2547 Add 1-eviction-worker jobs to Jenkins
WT-2548 Cap the amount of data handed to raw compression.
WT-2549 joins using recno keys return no values
WT-2550 java ex_schema example fails
WT-2552 Public API for pluggable filesystems
WT-2553 Document in-memory configuration and WT_CACHE_FULL error return
WT-2554 Implement a framework for adding C test cases
WT-2556 Typo in the Java example code
WT-2557 format test program should discard log files after incremental backup
WT-2558 WT_PAGE structure reorganization
WT-2559 Windows segfault in logging code
WT-2560 Stuck trying to update oldest transaction ID
WT-2562 Reconfig02 test failing sometimes on PPC
WT-2565 Item 3573 on page at [write-check] is a corrupted cell
WT-2566 All lock operations should be barriers
WT-2567 Segfault in test/format log truncate
WT-2568 Java PackTest.java compilation error
WT-2569 Win_handle_read should always call GetLastError on error
WT-2570 Minor lint cleanups.
WT-2571 Join code cleanup
WT-2572 Don't select an in-memory format run if incompatible options configured
WT-2573 Free of stack-allocated WT_REF
WT-2574 Format doesn't free all allocated configure memory
WT-2576 Variable-length column-store out-of-order return
WT-2577 Core dump discarding non-existent addresses
WT-2579 In-memory configurations break debugging support
WT-2580 Potential SWIG naming conflict in Java
WT-2581 Assert multi->disk_image == NULL
WT-2582 Cache eviction server error: WT_RESTART
WT-2583 Incremental backup can prevent future recovery
WT-2584 Don't use periods in error messages
WT-2586 Remove ex_config.c until config cursors are supported
WT-2589 Check stats using WT_STAT_SET and 'clear' usage
WT-2592 Joins using non-recno key types not working
WT-2593 Disk full with pre-allocated log files
WT-2595 Fix compiler warning in packing tests
WT-2597 JSON dump needs LSM tests
WT-2598 In-memory FS needs fast lookup on file names
WT-2599 Split out the checksum code from the support directory
WT-2600 Clean up test program #includes
WT-2602 LSM stress hangs with very large uncompressed pages
WT-2605 C tests focused on different schema types needed for join and other APIs
WT-2609 Incorrect "skips API_END call" error.
WT-2610 Investigate eviction hazard pointer check
WT-2611 wtperf and/or WT_CONFIG_PARSER do not allow escaped double quotes
WT-2612 The dist/s_prototypes script is creating a debugging file xxx.
WT-2613 Windows build failing with a C4100 error
WT-2615 Enabling checkpoints in test/format leads to reduced concurrency
WT-2616 In-memory deadlock getting size
WT-2617 Bug in pluggable file system example
WT-2621 WiredTiger fails to compile on MSVC 2013
WT-2622 Unit test failed in test_cursor_random.test_cursor_random.test_cursor_random_multiple_page_records
WT-2624 snprintf unsupported on MSVC 2013
WT-2626 snprintf errors with macro redefinition on MSVC2015
WT-2627 Coverity complaints
WT-2628 Reconciliation can return without unlocking the page lock
WT-2629 Introduction of ppc64le crc32c assembly file has made the stack executable
WT-2630 Rename WT_FSTREAM methods to avoid using C99 reserved names
WT-2632 Test format failures due to "checkpoints cannot be dropped when in-use"
WT-2637 The file-extension documentation doesn't cover not-supported cases
WT-2644 'wt load -r' (rename) fails with LSM
WT-2646 Split the lock_wait flag into two, adding a checkpoint_wait flag
WT-2651 Coverity 1355591 resource leak
WT-2652 Logging code improvements
WT-2653 The custom file-system example should show device configuration
WT-2656 Builds failing on GCC 4.7 builder
WT-2658 Only include PPC-specific files in PPC builds
WT-2659 csuite tests, assorted lint and cleanup.
WT-2660 Hang between eviction and connection close
WT-2661 Coverity failures: 1356050-1356053
WT-2662 For internal spell checking, strip out double quote literals, they confuse aspell
WT-2664 Change eviction so any eviction thread can find candidates
WT-2667 Enhance WiredTiger Evergreen testing
WT-2668 Create join statistics that are useful and are easy to understand
WT-2671 Dump more information about the file layout in verify debug mode
WT-2672 Handle system calls that don't set errno
WT-2673 Stop automatically increasing memory page max
WT-2674 Simplify metadata file check
WT-2676 Don't use key size in column store in-memory splits.
WT-2677 Fix JSON output so only printable ASCII is produced (seen on Solaris)
WT-2682 Add option to configure WiredTiger with strict compiler flags
WT-2683 WiredTiger no longer needs to return non-zero disk sizes
WT-2685 Hazard pointer failure from clear walk
WT-2686 Logging subsystem core dump
WT-2687 Test suite should verify the exit status of the wt utility
WT-2689 Use after free in WT_SESSION::open_cursor
WT-2691 Use wrappers for ctype functions to avoid sign extension errors
WT-2692 Fix race in file system example
WT-2696 Race condition on unclean shutdown may miss log records with large updates
WT-2698 Test/recovery hung
WT-2702 Under high thread load, WiredTiger exceeds cache size
WT-2704 Test/format hung on bengal
WT-2706 Race condition on log file switch can cause missing log records
WT-2707 dist/s_label enhancements, and error jump cleanups
WT-2709 Connection reconfigure segfault in __wt_conn_cache_pool_destroy
WT-2710 WT_FILE_HANDLE_INMEM no longer needs an off field
WT-2712 Coverity 1356928 and 1356929: ASSERT_SIDE_EFFECT
WT-2713 Document WT_PANIC so pluggable filesystems can panic.
WT-2714 Lint
WT-2715 random-abort test may write partial record at the end
WT-2720 Pull request tester not running Python suite
WT-2722 s_label or s_label_loop false positive
WT-2724 Eviction workers created on open exit immediately
WT-2763 Unit test test_intpack failing on OSX
Branch: v3.2
https://github.com/mongodb/mongo/commit/385faa6ca72ad599d92d3948f1a2033d908e0226

Comment by Githook User [ 27/Jul/16 ]

Author:

{u'name': u'Ramon Fernandez', u'email': u'ramon@mongodb.com'}

Message: Import wiredtiger-wiredtiger-2.8.0-592-g848e5f5.tar.gz from wiredtiger branch mongodb-3.2

ref: 8b7110b..848e5f5
for: 3.2.9

This commit replaces a number of previous backports with the original
change, so some of the tickets listed below may have been resolved in
previous versions already as a backport to the v3.2 branch. For example,
SERVER-24580 includes change a63e21b8 as a cherry-pick of 1f4aaa44, but
this commit replaces that change with 1f4aaa44 itself.

SERVER-23659 Provide useful message when wiredTigerJournalCompressor is changed
SERVER-23661 $sample takes disproportionately long time on newly created collection
SERVER-24306 40-second journaling stall from "log files prepared" to checkpoint
SERVER-24580 Improve performance when WiredTiger cache is full
WT-2103 Add incremental backup testing to format
WT-2223 Add stress testing for in-memory
WT-2268 JSON load incorrect with UNICODE input
WT-2319 Add statistics around fsync calls
WT-2325 Fix an incomplete comment
WT-2343 Assert we don't remove or rename when backup cursor is open
WT-2349 Add ability to open databases read-only
WT-2359 WiredTiger with Python will hang if a calloc failure occurs during __wt_connection_close
WT-2360 Allow disjunctions and combinations of operations in join cursors
WT-2408 Windows error translation layer
WT-2446 Estimate WT cache hit ratio
WT-2450 Salvage releases pages, then explicitly evicts them.
WT-2453 Throughput drop in wtperf evict Jenkins tests
WT-2479 Dump utility discards table config (JSON)
WT-2491 The dhandle close_lock isn't valuable at the moment
WT-2504 Should READONLY always read basecfg file?
WT-2505 Review clang analyzer warnings
WT-2508 Test programs should remove test directories on the "clean" target
WT-2514 Log path name is an empty string.
WT-2518 LSM checkpoint handle acquisition optimization
WT-2520 WT_SESSION::verify should not alter tables
WT-2526 Mixing and matching readonly and read/write handles
WT-2535 Extend test/format to test for transactions reading their writes
WT-2537 Cannot open DB written by WT2.6.1 with WT2.8.0 due to WT_NOTFOUND on recovery
WT-2539 Implement file streaming above pluggable filesystems
WT-2540 Separate stream and file handle methods
WT-2541 Add statistics for number of threads currently in read/write
WT-2542 Fixed-length column store reconciliation overwrites original values
WT-2544 Fix eviction statistics when clear is configured
WT-2546 Eviction server not help evict pages sometimes
WT-2547 Add 1-eviction-worker jobs to Jenkins
WT-2548 Cap the amount of data handed to raw compression.
WT-2549 joins using recno keys return no values
WT-2550 java ex_schema example fails
WT-2552 Public API for pluggable filesystems
WT-2553 Document in-memory configuration and WT_CACHE_FULL error return
WT-2554 Implement a framework for adding C test cases
WT-2556 Typo in the Java example code
WT-2557 format test program should discard log files after incremental backup
WT-2558 WT_PAGE structure reorganization
WT-2559 Windows segfault in logging code
WT-2560 Stuck trying to update oldest transaction ID
WT-2562 Reconfig02 test failing sometimes on PPC
WT-2565 Item 3573 on page at [write-check] is a corrupted cell
WT-2566 All lock operations should be barriers
WT-2567 Segfault in test/format log truncate
WT-2568 Java PackTest.java compilation error
WT-2569 Win_handle_read should always call GetLastError on error
WT-2570 Minor lint cleanups.
WT-2571 Join code cleanup
WT-2572 Don't select an in-memory format run if incompatible options configured
WT-2573 Free of stack-allocated WT_REF
WT-2574 Format doesn't free all allocated configure memory
WT-2576 Variable-length column-store out-of-order return
WT-2577 Core dump discarding non-existent addresses
WT-2579 In-memory configurations break debugging support
WT-2580 Potential SWIG naming conflict in Java
WT-2581 Assert multi->disk_image == NULL
WT-2582 Cache eviction server error: WT_RESTART
WT-2583 Incremental backup can prevent future recovery
WT-2584 Don't use periods in error messages
WT-2586 Remove ex_config.c until config cursors are supported
WT-2589 Check stats using WT_STAT_SET and 'clear' usage
WT-2592 Joins using non-recno key types not working
WT-2593 Disk full with pre-allocated log files
WT-2595 Fix compiler warning in packing tests
WT-2597 JSON dump needs LSM tests
WT-2598 In-memory FS needs fast lookup on file names
WT-2599 Split out the checksum code from the support directory
WT-2600 Clean up test program #includes
WT-2602 LSM stress hangs with very large uncompressed pages
WT-2605 C tests focused on different schema types needed for join and other APIs
WT-2609 Incorrect "skips API_END call" error.
WT-2610 Investigate eviction hazard pointer check
WT-2611 wtperf and/or WT_CONFIG_PARSER do not allow escaped double quotes
WT-2612 The dist/s_prototypes script is creating a debugging file xxx.
WT-2613 Windows build failing with a C4100 error
WT-2615 Enabling checkpoints in test/format leads to reduced concurrency
WT-2616 In-memory deadlock getting size
WT-2617 Bug in pluggable file system example
WT-2621 WiredTiger fails to compile on MSVC 2013
WT-2622 Unit test failed in test_cursor_random.test_cursor_random.test_cursor_random_multiple_page_records
WT-2624 snprintf unsupported on MSVC 2013
WT-2626 snprintf errors with macro redefinition on MSVC2015
WT-2627 Coverity complaints
WT-2628 Reconciliation can return without unlocking the page lock
WT-2629 Introduction of ppc64le crc32c assembly file has made the stack executable
WT-2630 Rename WT_FSTREAM methods to avoid using C99 reserved names
WT-2632 Test format failures due to "checkpoints cannot be dropped when in-use"
WT-2637 The file-extension documentation doesn't cover not-supported cases
WT-2644 'wt load -r' (rename) fails with LSM
WT-2646 Split the lock_wait flag into two, adding a checkpoint_wait flag
WT-2651 Coverity 1355591 resource leak
WT-2652 Logging code improvements
WT-2653 The custom file-system example should show device configuration
WT-2656 Builds failing on GCC 4.7 builder
WT-2658 Only include PPC-specific files in PPC builds
WT-2659 csuite tests, assorted lint and cleanup.
WT-2660 Hang between eviction and connection close
WT-2661 Coverity failures: 1356050-1356053
WT-2662 For internal spell checking, strip out double quote literals, they confuse aspell
WT-2664 Change eviction so any eviction thread can find candidates
WT-2667 Enhance WiredTiger Evergreen testing
WT-2668 Create join statistics that are useful and are easy to understand
WT-2671 Dump more information about the file layout in verify debug mode
WT-2672 Handle system calls that don't set errno
WT-2673 Stop automatically increasing memory page max
WT-2674 Simplify metadata file check
WT-2676 Don't use key size in column store in-memory splits.
WT-2677 Fix JSON output so only printable ASCII is produced (seen on Solaris)
WT-2682 Add option to configure WiredTiger with strict compiler flags
WT-2683 WiredTiger no longer needs to return non-zero disk sizes
WT-2685 Hazard pointer failure from clear walk
WT-2686 Logging subsystem core dump
WT-2687 Test suite should verify the exit status of the wt utility
WT-2689 Use after free in WT_SESSION::open_cursor
WT-2691 Use wrappers for ctype functions to avoid sign extension errors
WT-2692 Fix race in file system example
WT-2696 Race condition on unclean shutdown may miss log records with large updates
WT-2698 Test/recovery hung
WT-2702 Under high thread load, WiredTiger exceeds cache size
WT-2704 Test/format hung on bengal
WT-2706 Race condition on log file switch can cause missing log records
WT-2707 dist/s_label enhancements, and error jump cleanups
WT-2709 Connection reconfigure segfault in __wt_conn_cache_pool_destroy
WT-2710 WT_FILE_HANDLE_INMEM no longer needs an off field
WT-2712 Coverity 1356928 and 1356929: ASSERT_SIDE_EFFECT
WT-2713 Document WT_PANIC so pluggable filesystems can panic.
WT-2714 Lint
WT-2715 random-abort test may write partial record at the end
WT-2720 Pull request tester not running Python suite
WT-2722 s_label or s_label_loop false positive
WT-2724 Eviction workers created on open exit immediately
WT-2763 Unit test test_intpack failing on OSX
Branch: v3.2
https://github.com/mongodb/mongo/commit/385faa6ca72ad599d92d3948f1a2033d908e0226

Comment by Ramon Fernandez Marina [ 23/Jul/16 ]

This is great news gheppner! Thanks for letting us know, and very glad to hear MongoDB 3.2.8 is working well for you.

Regards,
Ramón.

Comment by George Heppner [ 22/Jul/16 ]

Just to close the loop on this - since upgrading to 3.2.8 in Production on Monday, we've seen the same positive results as in our Dev environment. All nodes have stayed below 100% wiredTiger cache utilization except for the occasional short bursts, and the overall process size is considerably lower than before. We also saw the same dramatic increase in cache eviction rate - but again, in our case I would expect this, because our working set does not fit in ram. At least now it seems better able to cope with that fact, and we can compensate for the higher cache eviction rate by provisioning more I/O on our EBS volumes.

Comment by Ramon Fernandez Marina [ 15/Jul/16 ]

Thanks for the update gheppner, it's very useful to see that the work done in this ticket is helping your use case. Please keep us posted on the results in your production environment, and feel free to open a new SERVER ticket if the problem persists.

Regards,
Ramón.

Comment by George Heppner [ 15/Jul/16 ]

I just rolled this out yesterday in our Dev environment, and the results so far are encouraging. I'm seeing a far higher cache eviction rate than before, far lower overall mongodb process vsize and so far, have not observed the cache staying pinned at 100% on any nodes. I'm interpreting the higher eviction rate + lower memory use to suggest mongo is doing a better job of cleaning up after itself now.

I plan to deploy this to production on Monday where this issue is far more prevalent. But so far - looking good.

Comment by Ramon Fernandez Marina [ 13/Jul/16 ]

sallgeud, can you please open a new ticket to track the issue you describe? It will help to keep things clear and make sure we're looking at the right thing.

The first thing we'll ask you is to upload logs and the contents of the diagnostic.data directory for the affected node/primary.

Thanks,
Ramón.

Comment by Chad Kreimendahl [ 13/Jul/16 ]

It doesn't appear that this was entirely fixed. When our cache gets full we get extreme latency on even the most basic requests. Notably, we have a monitor that runs, and while we're under the cache limit, we get sub-millisecond latency calling db.runCommand(

{ping: 1}

). However, the moment we hit the cache limit, we get sporadic responses between 5ms and 20000ms (.02 seconds and 20 seconds).

While the random delays we get come in long spurts, they are less than in 3.2.7. However, we never had this issue with 3.2.6. I'm wondering if there's something in the last two point releases.

I'm happy to help on this as much as I can (webex, configs, logs, etc)

Comment by Ramon Fernandez Marina [ 08/Jul/16 ]

All,

MongoDB release candidate 3.2.8-rc1 is now available for download, and includes a fix for this issue. The MongoDB 3.2.8 release is scheduled for next week.

Regards,
Ramón.

Comment by Githook User [ 28/Jun/16 ]

Author:

{u'name': u'Ramon Fernandez', u'email': u'ramon@mongodb.com'}

Message: Import wiredtiger-wiredtiger-2.8.0-219-gf4954f6.tar.gz from wiredtiger branch mongodb-3.2

ref: a6a64e9..f4954f6

SERVER-24580 Performance is poor when WiredTiger cache is full
WT-2672 Handle system calls that don't set errno
WT-2696 Missing log records with large updates
WT-2702 Under high thread load, WiredTiger exceeds cache size
WT-2708 split child-update race with reconciliation/eviction
WT-2729 Focus eviction walks in largest trees
Branch: v3.2
https://github.com/mongodb/mongo/commit/e8dc6b98c1c91727f7def84f2fb4b57bf67ccc88

Comment by Githook User [ 24/Jun/16 ]

Author:

{u'name': u'Ramon Fernandez', u'email': u'ramon@mongodb.com'}

Message: Import wiredtiger-wiredtiger-2.8.0-511-gd8fb874.tar.gz from wiredtiger branch mongodb-3.4

ref: fb1663e..d8fb874

SERVER-24580 Performance is poor when WiredTiger cache is full
WT-2319 Add statistics around fsync calls
WT-2408 Windows error translation layer
WT-2514 log path name is an empty string.
WT-2541 Add statistics for number of threads currently in read/write
WT-2605 C tests focused on different schema types needed for join and other APIs
WT-2622 unit test failed in test_cursor_random.test_cursor_random.test_cursor_random_multiple_page_records
WT-2652 Logging code improvements
WT-2664 Change eviction so any eviction thread can find candidates
WT-2667 Enhance WiredTiger Evergreen testing
WT-2668 Create join statistics that are useful and are easy to understand
WT-2671 dump more information about the file layout in verify debug mode
WT-2672 Handle system calls that don't set errno
WT-2676 Don't use key size in column store in-memory splits.
WT-2677 Fix JSON output so only printable ASCII is produced (seen on Solaris)
WT-2682 add option to configure WiredTiger with strict compiler flags
WT-2683 WiredTiger no longer needs to return non-zero disk sizes
WT-2685 hazard pointer failure from clear walk
WT-2686 logging subsystem core dump
WT-2687 test suite should verify the exit status of the wt utility
WT-2689 Use after free in WT_SESSION::open_cursor
WT-2691 Use wrappers for ctype functions to avoid sign extension errors
WT-2692 Fix race in file system example
WT-2696 Missing log records with large updates
WT-2698 Test/recovery hung
WT-2702 Under high thread load, WiredTiger exceeds cache size
WT-2704 test/format hung on bengal
WT-2706 Test failure running random-abort recovery test
WT-2707 dist/s_label enhancements, and error jump cleanups
WT-2709 connection reconfigure segfault in __wt_conn_cache_pool_destroy
WT-2710 WT_FILE_HANDLE_INMEM no longer needs an off field
WT-2712 Coverity 1356928 and 1356929: ASSERT_SIDE_EFFECT
WT-2713 document WT_PANIC so pluggable filesystems can panic.
WT-2714 lint
WT-2715 random-abort test may write partial record at the end
WT-2720 Pull request tester not running Python suite
WT-2722 s_label or s_label_loop false positive
WT-2724 Eviction workers created on open exit immediately
SERVER-24425 WiredTiger changes for MongoDB 3.3.9
Branch: master
https://github.com/mongodb/mongo/commit/7e3ce9cec306df5abb15c0c1a599023270cfb807

Comment by Bruce Lucas (Inactive) [ 23/Jun/16 ]

Here's another run from the same application as the previous comment using the same patched build:

This one seems to show more clearly that pages queued for eviction declines at A, coinciding with stalls of 2-3 seconds in the ftdc thread ("seconds per sample").

Stack traces show that the FTDC thread is spending a lot of time waiting for pages to evict; in particular no evidence in these stack traces that actually evicting pages is the bottleck:

Comment by Bruce Lucas (Inactive) [ 23/Jun/16 ]

Some results using the fixes mentioned above:

  • 95% cache B-C and D-E
  • behavior seems somewhat improved: we no longer see extensive 18-second ftdc stalls as before on this workload, but we do still see 2-3 second stalls ("ftdc seconds per sample")
  • mongod log does show some much longer latency ops, up to 38 seconds ("max logged op (ms)"), but those seem to be operations that were doing scans and yielded multiple times, so probably are showing an accumulation of multiple 2-3 second stalls, as each yield starts a new transaction at which point the operation may stall doing evictions
  • some evidence that perhaps the eviction server is still having some trouble identifying pages to evict: "pages queued for eviction" declines at about D where the worst of the problems start
  • application threads are evicting modified pages, possibly contributing to the issue. The proportion of modified pages being evicted seems perhaps surprising given the tiny portion of the cache that is dirty.
Comment by Githook User [ 23/Jun/16 ]

Author:

{u'username': u'michaelcahill', u'name': u'Michael Cahill', u'email': u'michael.cahill@mongodb.com'}

Message: SERVER-24580 Add more eviction stats to track efficiency. (#2830)

(cherry picked from commit 1f4aaa4490a82cf947afdabbb9214ee5b1850d13)
Branch: mongodb-3.2
https://github.com/wiredtiger/wiredtiger/commit/a63e21b838548b9ee8c66136c1c53f922f06c8ac

Comment by Githook User [ 23/Jun/16 ]

Author:

{u'username': u'michaelcahill', u'name': u'Michael Cahill', u'email': u'michael.cahill@mongodb.com'}

Message: SERVER-24580 Add more eviction stats to track efficiency. (#2830)
Branch: develop
https://github.com/wiredtiger/wiredtiger/commit/1f4aaa4490a82cf947afdabbb9214ee5b1850d13

Comment by Githook User [ 23/Jun/16 ]

Author:

{u'username': u'michaelcahill', u'name': u'Michael Cahill', u'email': u'michael.cahill@mongodb.com'}

Message: SERVER-24580 Fix backport.
Branch: mongodb-3.2
https://github.com/wiredtiger/wiredtiger/commit/9ee39b8aea2812efd1d07a5d818ed27a105c6fbe

Comment by Githook User [ 23/Jun/16 ]

Author:

{u'username': u'michaelcahill', u'name': u'Michael Cahill', u'email': u'michael.cahill@mongodb.com'}

Message: SERVER-24580 Update oldest txn ID with "strict, nowait" flags. (#2829)

Add more options for callers when updating the oldest ID to control how much they care about the ID being updated.
(cherry picked from commit 116e41e5e1960db6855c4202a971107d5aff642d)
Branch: mongodb-3.2
https://github.com/wiredtiger/wiredtiger/commit/d68800d0e9b95f2ba83d68ff7d3291603fea7b16

Comment by Githook User [ 23/Jun/16 ]

Author:

{u'username': u'michaelcahill', u'name': u'Michael Cahill', u'email': u'michael.cahill@mongodb.com'}

Message: SERVER-24580 Enhance eviction when application threads are contributing (#2806)

When the cache hits eviction triggers, all application threads can
hammer the eviction queue lock, starving each other and server threads.

Also, noticed with the same workload, the eviction server doesn't need
to force updates to the oldest ID (which can starve the eviction server
thread if there are hundreds of application threads getting snapshots).
It is sufficient to update it lazily.

  • Clear the eviction walk if we don't find any candidates.

Otherwise, we are keeping a page pinned in what might be an idle file,
and tying up a hazard pointer that could prevent eviction from an active
file (since the eviction server tracks how many hazard pointers it is
using to avoid going over the limit).

(cherry picked from commit 7f9d7aecea0fade135e7c79cb3de9722b0287a79)
Branch: mongodb-3.2
https://github.com/wiredtiger/wiredtiger/commit/ded2149b2c7b3cd19603bb88beac91f6d96c3f27

Comment by Githook User [ 23/Jun/16 ]

Author:

{u'username': u'michaelcahill', u'name': u'Michael Cahill', u'email': u'michael.cahill@mongodb.com'}

Message: SERVER-24580 Update oldest txn ID with "strict, nowait" flags. (#2829)

Add more options for callers when updating the oldest ID to control how much they care about the ID being updated.
Branch: develop
https://github.com/wiredtiger/wiredtiger/commit/116e41e5e1960db6855c4202a971107d5aff642d

Comment by Githook User [ 22/Jun/16 ]

Author:

{u'username': u'michaelcahill', u'name': u'Michael Cahill', u'email': u'michael.cahill@mongodb.com'}

Message: SERVER-24580 Enhance eviction when application threads are contributing (#2806)

When the cache hits eviction triggers, all application threads can
hammer the eviction queue lock, starving each other and server threads.

Also, noticed with the same workload, the eviction server doesn't need
to force updates to the oldest ID (which can starve the eviction server
thread if there are hundreds of application threads getting snapshots).
It is sufficient to update it lazily.

  • Clear the eviction walk if we don't find any candidates.

Otherwise, we are keeping a page pinned in what might be an idle file,
and tying up a hazard pointer that could prevent eviction from an active
file (since the eviction server tracks how many hazard pointers it is
using to avoid going over the limit).
Branch: develop
https://github.com/wiredtiger/wiredtiger/commit/7f9d7aecea0fade135e7c79cb3de9722b0287a79

Comment by Bruce Lucas (Inactive) [ 18/Jun/16 ]

Here's a customer incident:

  • starting at point D application threads are getting involved in eviction
  • we see corresponding queuing in WT ("threads" is concurrentTransactions read + write)
  • there are gaps of up to 6 seconds in ftdc, indicating very high operation latencies
  • rates of pages walked per second remains high, and there are never more than 16 threads in WT, so this doesn't appear to be an issue of contention due to high thread count
  • rate of application thread evictions remains low, and ratio of pages walked per page evicted is high

Here are stack traces; the graphs show number of threads at that call site, and cover approximately the same period as the graph above. First is a reverse call graph showing the application threads hung up waiting presumably for pages to evict:

  • 1: up to 17 threads in a timed wait in __wt_cache_eviction_worker
  • 2: the FTDC thread is sometimes hung for the same reason, beginning a transaction to get a storage size, presumably for the oplog as noted above

  • __evict_server is consuming 100% of a core, either in __evict_server itself or calling __tree_walk_internal
Comment by Bruce Lucas (Inactive) [ 17/Jun/16 ]

Thanks michael.cahill. In looking at various instances of this issue I do sometimes see the effect you are talking about: a large number of threads queued in WT and evidence of CPU or lock contention in the form of a much reduced rate of pages walked for eviction /s.

However there seems to be another class of incidents where there doesn't seem to be evidence of such contention:

  • the number of threads is low (e.g. 16 or 24)
  • pages walked for eviction remains high (many millions per second), consistent with no contention
  • one customer reports that in the bad state one core is maxed out and the remaining cores are mostly idle, again consistent with no contention, and no activity other than scanning for pages to evict
  • a very low rate of pages evicted in spite of the very high rate of pages walked; in other words, an enormous ratio of pages walked to pages evicted. In one case (detailed elsewhere) we see a stall of 18 seconds during which only a handful of pages were evicted, but pages walked for eviction indicates the cache was scanned many times over.

All of the above suggests to me a possible algorithmic issue in the process of selecting candidates for eviction; what do you think?

Comment by Michael Cahill (Inactive) [ 17/Jun/16 ]

bruce.lucas, I have been attempting to reproduce this behavior today in standalone WiredTiger.

One thing I noticed about some of the cases is that requests kept arriving at about the same rate even as latencies increased (due to application threads doing eviction when the cache reached 95%). That leads to increased queueing, and is different to the behavior of many of our test cases, where a fixed number of threads will slow down the rate of requests if latency increases.

To approximate a fixed rate of arrival, I've been running workloads with 2000 threads over 200 tables (~100 MongoDB collections), where each thread is throttled to 10 operations / second. The server should be able to handle 20K operations / second, but I've seen behavior similar to the tickets linked here, where performance is good while the data is smaller than cache, but degrades significantly when WiredTiger attempts to throttle the workload.

At least part of the issue is that with many threads trying to evict pages, the background threads are starved of CPU and/or spend a long time competing for locks in order to queue additional pages for eviction.

I will push a WT branch with some tentative changes today that causes application threads to wait rather than spinning to access the eviction queue. This change helps with the synthetic workload I have been testing but will need further evaluation on a wide range of workloads before we can know whether it will help in general.

Comment by Michael Cahill (Inactive) [ 17/Jun/16 ]

I'm still working through the various cases here. A quick note on FTDC stalls: in SERVER-20877, we avoid stalls from getting the WiredTiger connection statistics. Unfortunately in a replicaset, another section of serverStatus involves replication statistics, which access the oplog, and that operation does still stall when the cache is at 100%.

Comment by Bruce Lucas (Inactive) [ 15/Jun/16 ]

An incident from a secondary, same cluster as above:

  • at or shortly after cache reaches 95% and application thread evictions begin, severe performance problems start
  • for example from C-D, a period of about 100 seconds, replication appears to be completely or nearly completely stalled, and there are no FTDC samples
  • the stalls are punctuated by periods of activity
  • some possibly interesting data handle stats, but be careful interpreting the graphs: when there are frequent ftdc samples each sweep attempt shows as a spike at 1/s (because the sample interval is 1 s), but when ftdc samples are far apart it is showing the average rate over a long period of time.
Comment by Bruce Lucas (Inactive) [ 15/Jun/16 ]

I have made progress towards a repro, but have not been able to reproduce the severe performance problem seen here: I've come up with workloads that can get the cache to 95%, but do not show the operation stalls, extreme latencies, and FTDC stalls.

I don't see any likelihood that this has anything to do with i/o, given the low i/o rate before reaching 95%, and even lower after 95%.

Doesn't seem to have anything to do with number of threads - that number is quite low here, and we have seen this problem on secondaries where there are only 16 threads.

It may have something to do with number of data handles; at least two of the cases have a quite large number (tens of thousands). However in my repro attempt I went as high as about 10k data handles and didn't see the issue.

I suspect understanding the FTDC stalls would tell us a lot. Short of getting stack traces, are there any guesses we can take about the cause for that that might help us understand or reproduce the problem?

Comment by Bruce Lucas (Inactive) [ 15/Jun/16 ]

Here's an example from a customer:

  • at F application thread evictions begin
  • operations stall or nearly stall for extended periods (many seconds)
  • operations queue in WT
  • very long operation latencies are reported in mongod log ("max logged op (ms) per 1s" shows the longest op reported for each 1 second period), as high as 18779 ms (read from "max" column)
  • ftdc data capture stalls for extended periods (ftdc samples/s). For example from G-H, a period of about 18 seconds, no ftdc samples are obtained, and at the end of that period several long operations of about 18 seconds are reported in mongod log
  • i/o rates (block-manager bytes read/written) is not particularly high, and is much lower after the cache reaches 95%, reflecting the much lower operation rates
  • there were about 25k data handles, rising to 40k during checkpoints.
Comment by Michael Cahill (Inactive) [ 15/Jun/16 ]

bruce.lucas, a few points here:

  • as you have noticed, it isn't easy to construct simple workloads that push the WiredTiger cache usage over 80%. One thing that makes this more likely is if application threads are accessing multiple WiredTiger tables (e.g., updates to indexed fields where the index is larger than cache or index scans where the collection is larger than cache).
  • the default MongoDB configuration is to have 4 background eviction threads: application performance will be best if those threads can keep up with the rate of cache pressure from the application so that no throttling is required. For highly threaded workloads generating significant cache pressure, increasing the "eviction=(thread_max=N)" setting (e.g., to N=6 or N=8) may help. That is likely to be sufficient in the "0.5% rate" case described above.
  • more generally, the current design of WiredTiger is to throttle application threads by having them perform eviction when the background threads can't keep up. That works well in read-only workloads where eviction is cheap, but some of the performance issues noted here may be caused by application threads evicting dirty pages, which can be much more expensive. One option we could try is to have application threads only evict clean pages: that should make this kind of throttling much finer-grained.
  • many applications operate with the cache over 95% full without the performance becoming unacceptable: there may be a link to slow I/O (e.g., due to zlib compression or slow disks).
  • there are improvements in WiredTiger's cache algorithms in development that are not yet in production releases of MongoDB (e.g., WT-2664). Those changes may make it easier for background threads to keep up.
  • we could consider throttling reads more directly: we don't yet have a design for that, but it would need to take into account the differential rates of cache pressure from different files. Another challenge is that we have to be careful about throttling operations once they have resources pinned (e.g., transactional updates in progress, cursors positioned in multiple trees) – adding delays in those cases can increase contention.
Generated at Thu Feb 08 04:06:46 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.