[SERVER-22831] Low query rate with heavy cache pressure and an idle collection Created: 24/Feb/16  Updated: 19/Nov/17  Resolved: 01/Mar/16

Status: Closed
Project: Core Server
Component/s: WiredTiger
Affects Version/s: 3.2.3
Fix Version/s: 3.0.12, 3.2.5, 3.3.3

Type: Bug Priority: Major - P3
Reporter: Bruce Lucas (Inactive) Assignee: Michael Cahill (Inactive)
Resolution: Done Votes: 0
Labels: WTplaybook, code-only
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File diagnostic.data.tar     PNG File queryrate.png     PNG File two-collections.png    
Issue Links:
Backports
Duplicate
is duplicated by SERVER-23001 Occasional 100% cache uses cripples s... Closed
Related
related to SERVER-22834 1.7x performance regression in random... Closed
related to SERVER-23622 Inconsistent throughput during insert... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Completed:
Participants:
Case:

 Description   
  • single-node replica set, 3 GB cache, 20 GB oplog
  • insert 10 M x 1 kB documents (10 GB total size, plus index)
  • then 100 threads querying documents at random, observe low query rate
  • then restart mongod, same queries are now much faster

  • A-B: collection is being created
  • B-C: random queries
    • query rate is very low, ~6 k/s
    • rate of evicting from and reading into cache is ~9 k pages/s, ~1.5 pages per query, so very high miss ratio
    • rate of pages walked for eviction is very high, ~21 M/s, so about 2300 pages walked for every page evicted, or 2-3% of pages in cache walked for every page evicted
    • no pages are being evicted from oplog, but it is uncertain whether that is because all pages have already been evicted
  • C-D: after restart
    • query rate is much higher, ~34 k/s
    • rate of evicting from and reading into cache is 26 k pages/s, ~0.75 pages per query, so lower miss ratio than before restart
    • rate of pages walked for eviction is much lower
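
The per-query and per-eviction ratios quoted above follow directly from the reported rates. A minimal sketch of that arithmetic in plain JavaScript (runnable under Node.js); the function and field names are illustrative assumptions, and the inputs are the rounded figures from the list, so the results are approximate:

```javascript
// Derived ratios from the approximate rates quoted in the description.
// Inputs are rounded figures from the ticket, so the outputs are rough.
function derivedRatios({ queriesPerSec, pagesEvictedPerSec, pagesWalkedPerSec }) {
  return {
    // Pages read into (and evicted from) cache per query: a proxy for the miss ratio.
    pagesPerQuery: pagesEvictedPerSec / queriesPerSec,
    // Eviction walk effort spent per page actually evicted (if the walk rate is known).
    walkedPerEvicted: pagesWalkedPerSec === undefined
      ? undefined
      : pagesWalkedPerSec / pagesEvictedPerSec,
  };
}

// B-C: before restart (~6 k queries/s, ~9 k pages/s, ~21 M pages walked/s).
const beforeRestart = derivedRatios({
  queriesPerSec: 6e3, pagesEvictedPerSec: 9e3, pagesWalkedPerSec: 21e6,
});
// C-D: after restart (~34 k queries/s, ~26 k pages/s; walk rate not quantified).
const afterRestart = derivedRatios({ queriesPerSec: 34e3, pagesEvictedPerSec: 26e3 });

console.log(beforeRestart.pagesPerQuery);                 // 1.5
console.log(Math.round(beforeRestart.walkedPerEvicted));  // 2333
console.log(afterRestart.pagesPerQuery.toFixed(2));       // 0.76
```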

The issue does not reproduce on a standalone node.

Possibly related to SERVER-22423?



 Comments   
Comment by Githook User [ 15/Apr/16 ]

Author: Ramon Fernandez <ramon@mongodb.com>

Message: Import wiredtiger-wiredtiger-mongodb-3.0.9-9-gf6286c2.tar.gz from wiredtiger branch mongodb-3.0

ref: 3dbc6c6..f6286c2

SERVER-22831 Low query rate with heavy cache pressure and an idle collection
SERVER-23457 WiredTiger changes for MongoDB 3.0.12
WT-2157 test/format corrupted cell failure
WT-2361 column-store starting record number error
WT-2451 Allow eviction of metadata
Branch: v3.0
https://github.com/mongodb/mongo/commit/d0324043bc99a713961e1fca0ffc8ea4b124d959

Comment by Githook User [ 11/Apr/16 ]

Author: Alex Gorrod (agorrod) <alexander.gorrod@mongodb.com>

Message: Merge pull request #2641 from wiredtiger/server-22831-backport30

SERVER-22831 Queue more leaf pages than internal pages for eviction.
Branch: mongodb-3.0
https://github.com/wiredtiger/wiredtiger/commit/353af609a97782ff86c97e5b6994399821ca57bd

Comment by Githook User [ 11/Apr/16 ]

Author: Alex Gorrod (agorrod) <alexander.gorrod@mongodb.com>

Message: Merge pull request #2532 from wiredtiger/server-22831

SERVER-22831 Queue more leaf pages than internal pages for eviction.
(cherry picked from commit 799ca57b6c597c864f12609fdedc4b3de7ebdec9)
Branch: mongodb-3.0
https://github.com/wiredtiger/wiredtiger/commit/2add6ea33102332a6c67cd2f4076c2ba9dafbcaf

Comment by Githook User [ 04/Apr/16 ]

Author: Ramon Fernandez <ramon@mongodb.com>

Message: Import wiredtiger-wiredtiger-2.7.0-1181-g43e885a.tar.gz from wiredtiger branch mongodb-3.2

ref: 5cdd3e3..43e885a

SERVER-22676 WiredTiger fails to open databases created by 3.0.0 or 3.0.1
SERVER-22773 gle_shell_server5441.js fails on ppc64le
SERVER-22784 Coverity analysis defect 77722: Unused value
SERVER-22831 Low query rate with heavy cache pressure and an idle collection
SERVER-23040 Coverity analysis defect 98151: Dereference after null check
SERVER-23203 WiredTiger changes for MongoDB 3.2.5
WT-2107 Add example code including an event handler
WT-2123 Don't clear allocated memory if not required
WT-2173 test/format cache stuck full
WT-2264 Checkpoints cannot keep up with inserts
WT-2280 Add CRC32 Optimized code for PPC64LE
WT-2295 WT_SESSION.create does a full-scan of the main table
WT-2318 Configurable thread wake up time
WT-2322 Join cursor with isolation read-uncommitted may give different results with Bloom filters
WT-2345 Evicting tiny pages creates small pages on disk
WT-2346 Don't hold schema lock during checkpoint I/O
WT-2349 Add ability to open databases read-only
WT-2361 column-store starting record number error
WT-2366 Extend wtperf to support updates that grow the record size
WT-2367 WT_CURSOR.next out-of-order returns failure
WT-2374 read error on index file
WT-2375 Need tests for collators
WT-2376 Modules should compile without including wt_internal.h header file
WT-2381 dump utility discards table config
WT-2382 Problem with custom collator for 'u' format with join cursor
WT-2384 lt, le conditions for ordering cursor in join cursor
WT-2387 Fix cursor random unit test on Windows
WT-2390 OS X build is broken
WT-2391 De-prioritize eviction from indexes
WT-2393 Unnecessary error handling labels.
WT-2394 Long Unit Test for test_compact02 failed.
WT-2395 Recovery failure with an LSM tree
WT-2396 Jenkins Spinlock GCC task Hung
WT-2397 Cursor traversal from end of the tree skips records.
WT-2399 Add test case that verifies cursor traversal
WT-2404 Add streaming pack/unpack methods to the extension API
WT-2405 test utility error handling.
WT-2406 cursor_order lint, minor bug fixes
WT-2407 recovery lint
WT-2409 Minor Perf Regression in LSM
WT-2410 Casting function pointers to different types
WT-2411 LSM drop hang
WT-2412 Truncate error tag is incorrect
WT-2414 Avoid extractor calls for ordering cursor in join cursor
WT-2417 Windows Jenkins task is failing
WT-2418 test_rebalance failing with EBUSY
WT-2419 Tests fail to compile on windows due to new util functions
WT-2420 LSM failed to read bytes
WT-2421 test_bloom ret assigned but not used
WT-2422 multiple definitions of custom die function
WT-2423 Session reference count could be wrong if adding the handle fails
WT-2425 evict-btree read through performance drop
WT-2426 Deadlock caused by recent changes to checkpoint handle locking
WT-2427 wtperf should work with builtin compression
WT-2428 Make statistics logging compatible with MongoDB
WT-2429 Add a statistic that tracks aggressive mode in eviction
WT-2430 statistic for join cursor
WT-2431 Join statistics documentation needed
WT-2432 Understand eviction impact on LSM and readonly workload
WT-2433 Allow read-only databases to log statistics
WT-2434 Race between force-drop and sweep
WT-2435 __wt_evict_file_exclusive_on/off cleanups
WT-2436 lt, le conditions for ref cursor with "strategy=bloom" in join cursor
WT-2437 Test suite failures on Windows
WT-2438 Extend WiredTiger stat declarations to help timeseries tool
WT-2440 vpmsum_crc32: Per the PPC64 ABI, v20-v31 are non-volatile register
WT-2443 Getting statistic for all indexes used in join cursor
WT-2444 broken flag test in wtperf, whitespace
WT-2447 join cursor reads main table
WT-2448 Add no_scale flag to relevant statistics
WT-2449 configure should check for a 64-bit build
WT-2451 Allow eviction of metadata
WT-2454 checkpoint_sync=false does not prevent flushes/sync to disk.
WT-2456 Update Power8 CRC32 Code
WT-2457 Dropping an LSM table can fail with EBUSY when no user ops are active
WT-2459 Allow Configure scripts to provide the --tag option for libtool when compiling on PPC
WT-2460 checkpoint failing with WT_ROLLBACK
WT-2461 sweep01 failing
WT-2463 Test that measures idle CPU usage fails under valgrind
WT-2464 Valgrind errors.
WT-2465 Coverity 1352899: Dereference before null check
WT-2466 Coverity 1352893 Buffer not null terminated
WT-2467 Coverity 1352894: Logically dead code
WT-2468 Coverity 1352896: Explicit null dereferenced
WT-2469 Coverity 1352897: Integer overflowed argument
WT-2470 Coverity 1352898: Resource leak
WT-2471 review WiredTiger "int" printf formats
WT-2473 MSVC doesn't support PRId64
WT-2475 Have reconf script remove cached configure results
WT-2476 btree->evict_lock is being accessed after being destroyed
WT-2477 Missing define in Windows wiredtiger_config.h
WT-2478 Valgrind test failures
WT-2481 Recent changes affect LSM performance
WT-2482 Coverity 1353015, 1353016, out-of-bounds access
WT-2483 readonly02 periodically fails
WT-2484 Coverity 1345809: unchecked return value
WT-2485 Test/format failure with Floating point exception
WT-2487 Release memory in manydbs test
WT-2489 warnings from /test/manydbs
WT-2490 search_near() returns wrong key for column-store
WT-2492 Windows test_config04.test_config04.test_invalid_config crashes
WT-2493 verbose lsm_manager unsupported
WT-2494 review calls to __wt_free, plus minor bug in an error path.
WT-2495 Missing memory initialization leads to crash on Windows
WT-2496 test/format unable to read root page
WT-2497 test/format saves copy of backup
WT-2498 LSM tree drop hangs when a user cursor is open
WT-2499 LSM shutdown race causes segfault
WT-2501 Dropping a just opened LSM tree isn't safe
WT-2502 memory leak in locking handles for checkpoint
WT-2503 build warning in lsm_tree.c
WT-2506 Using an uninitialised value
Branch: v3.2
https://github.com/mongodb/mongo/commit/224299a6712196228b65831b6c39498caf8669d2

Comment by Ernie Hershey [ 31/Mar/16 ]

3.0.11 is basically a hotfix on top of 3.0.10. Everything that was slated for 3.0.11 before we hit SERVER-23425 has to wait for 3.0.12.

Comment by Githook User [ 25/Mar/16 ]

Author: Ramon Fernandez <ramon@mongodb.com>

Message: Import wiredtiger-wiredtiger-2.7.0-1181-g43e885a.tar.gz from wiredtiger branch mongodb-3.2

ref: 5cdd3e3..43e885a

Branch: v3.2
https://github.com/mongodb/mongo/commit/224299a6712196228b65831b6c39498caf8669d2

Comment by Githook User [ 24/Mar/16 ]

Author: Alex Gorrod (agorrod) <alexander.gorrod@mongodb.com>

Message: Merge pull request #2532 from wiredtiger/server-22831

SERVER-22831 Queue more leaf pages than internal pages for eviction.
Branch: mongodb-3.2
https://github.com/wiredtiger/wiredtiger/commit/799ca57b6c597c864f12609fdedc4b3de7ebdec9

Comment by Githook User [ 24/Mar/16 ]

Author: Michael Cahill (michaelcahill) <michael.cahill@mongodb.com>

Message: SERVER-22831 Queue more leaf pages than internal pages for eviction.

Unless we get aggressive, putting internal pages on the LRU queue is
counterproductive. That is particularly true for workloads that
transition to read-only, where some tables are not being queried. In
that case, all of the leaf pages are evicted, and eviction wastes a lot
of effort walking and queuing internal pages that are not actually
evicted.
Branch: mongodb-3.2
https://github.com/wiredtiger/wiredtiger/commit/319c8f2209eeac78dd2b49765853ddac1b690381

Comment by Githook User [ 29/Feb/16 ]

Author: Alex Gorrod (agorrod) <alexg@wiredtiger.com>

Message: Import wiredtiger-wiredtiger-2.7.0-829-g4531b92.tar.gz from wiredtiger branch mongodb-3.4

ref: 4f38287..4531b92

SERVER-22784 Coverity analysis defect 77722: Unused value
SERVER-22831 Low query rate with heavy cache pressure and an idle collection
WT-2173 test/format cache stuck full
WT-2264 Checkpoints cannot keep up with inserts
WT-2349 Add ability to open databases read-only
WT-2376 Modules should compile without including wt_internal.h header file
WT-2382 Problem with custom collator for 'u' format with join cursor
WT-2394 Long Unit Test for test_compact02 failed.
WT-2395 Recovery failure with an LSM tree
WT-2399 Add test case that verifies cursor traversal
WT-2405 test utility error handling.
WT-2407 recovery lint
WT-2409 Minor Perf Regression in LSM
WT-2410 Casting function pointers to different types
WT-2411 LSM drop hang
WT-2412 Truncate error tag is incorrect
WT-2417 Windows Jenkins task is failing
WT-2419 Tests fail to compile on windows due to new util functions
WT-2420 LSM failed to read bytes
WT-2423 Session reference count could be wrong if adding the handle fails
WT-2425 evict-btree read through performance drop
WT-2428 Make statistics logging compatible with MongoDB
WT-2429 Add a statistic that tracks aggressive mode in eviction
Branch: master
https://github.com/mongodb/mongo/commit/ff0846809805c5a9a961314d3b6ec9ed7bbe0947

Comment by Githook User [ 29/Feb/16 ]

Author: Alex Gorrod (agorrod) <alexander.gorrod@mongodb.com>

Message: Merge pull request #2532 from wiredtiger/server-22831

SERVER-22831 Queue more leaf pages than internal pages for eviction.
Branch: mongodb-3.4
https://github.com/wiredtiger/wiredtiger/commit/799ca57b6c597c864f12609fdedc4b3de7ebdec9

Comment by Githook User [ 29/Feb/16 ]

Author: Michael Cahill (michaelcahill) <michael.cahill@mongodb.com>

Message: SERVER-22831 Queue more leaf pages than internal pages for eviction.

Unless we get aggressive, putting internal pages on the LRU queue is
counterproductive. That is particularly true for workloads that
transition to read-only, where some tables are not being queried. In
that case, all of the leaf pages are evicted, and eviction wastes a lot
of effort walking and queuing internal pages that are not actually
evicted.
Branch: mongodb-3.4
https://github.com/wiredtiger/wiredtiger/commit/319c8f2209eeac78dd2b49765853ddac1b690381

Comment by Alexander Gorrod [ 29/Feb/16 ]

Thanks, bruce.lucas. Your analysis matches what we saw in our testing, and the fix made above to WiredTiger should resolve this new scenario as well. The reason the performance differs is that WiredTiger was evicting all leaf pages from the idle collection, but it still adds a selection of pages from each tree it visits to a list of eviction candidates. It would add some internal pages from the idle tree and some leaf pages from the non-idle tree. It would always choose to evict the leaf pages, since we deprioritize eviction of internal pages. So we walk the internal pages without benefit, and populate half the eviction queue with pages that are very unlikely to be evicted.

The fix we made was to limit the number of internal pages added to the eviction queue.
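
The effect described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not WiredTiger code: the function name, the queue size, the per-visit share, and the cap value are all hypothetical, and the model assumes internal pages are never evicted while leaf pages always can be.

```javascript
// Toy model of filling an eviction candidate queue from several trees.
// Not WiredTiger code: all names and numbers here are illustrative assumptions.
function fillEvictionQueue(trees, queueSize, perVisit, maxInternal) {
  // Work on copies so the caller's tree descriptions are not mutated.
  const remaining = trees.map(t => ({ internal: t.internal, leaf: t.leaf }));
  const queue = { internal: 0, leaf: 0 };
  let progress = true;
  // Keep visiting trees until the queue is full or no tree can contribute.
  while (queue.internal + queue.leaf < queueSize && progress) {
    progress = false;
    for (const tree of remaining) {
      let budget = Math.min(perVisit, queueSize - queue.internal - queue.leaf);
      // Leaf pages are taken first.
      const leaf = Math.min(tree.leaf, budget);
      tree.leaf -= leaf;
      queue.leaf += leaf;
      budget -= leaf;
      // Internal pages fill what is left of this visit, up to the global cap.
      const internal = Math.min(tree.internal, budget, maxInternal - queue.internal);
      tree.internal -= internal;
      queue.internal += internal;
      if (leaf + internal > 0) progress = true;
    }
  }
  return queue;
}

const trees = [
  { internal: 500, leaf: 0 },       // idle tree: all leaf pages already evicted
  { internal: 500, leaf: 100000 },  // active tree: plenty of leaf pages in cache
];

const uncapped = fillEvictionQueue(trees, 100, 50, Infinity);
const capped = fillEvictionQueue(trees, 100, 50, 10);

console.log(uncapped);  // { internal: 50, leaf: 50 }
console.log(capped);    // { internal: 10, leaf: 90 }
```

In this model, without a cap half of the queue holds internal pages from the idle tree, which matches the "half the eviction queue" observation in the comment. With a cap on internal pages, most of the queue holds leaf pages that can actually be evicted.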

Comment by Bruce Lucas (Inactive) [ 26/Feb/16 ]

Note that this doesn't require an oplog, nor a transition to read-only. All that's necessary, I believe, is:

  • heavy cache pressure
  • more than one collection
  • one collection is in cache and then becomes idle

For example:

  • two identical collections, 10 GB each (10 M documents x 1 kB)
  • standalone (no oplog), 3 GB cache

  • E-F: reading randomly from collection 1
  • G-H: reading randomly from collection 2
  • I-J: reading randomly from collection 1 again

In each case the rate of reads immediately after transitioning to a new collection is high, but once all the leaf pages of the collection that has become idle have been evicted to make room for the newly active collection, the performance issue is triggered: rate of pages walked for eviction becomes high, rate of queries drops.

Comment by Githook User [ 25/Feb/16 ]

Author: Alex Gorrod (agorrod) <alexander.gorrod@mongodb.com>

Message: Merge pull request #2532 from wiredtiger/server-22831

SERVER-22831 Queue more leaf pages than internal pages for eviction.
Branch: develop
https://github.com/wiredtiger/wiredtiger/commit/799ca57b6c597c864f12609fdedc4b3de7ebdec9

Comment by Githook User [ 25/Feb/16 ]

Author: Michael Cahill (michaelcahill) <michael.cahill@mongodb.com>

Message: SERVER-22831 Queue more leaf pages than internal pages for eviction.

Unless we get aggressive, putting internal pages on the LRU queue is
counterproductive. That is particularly true for workloads that
transition to read-only, where some tables are not being queried. In
that case, all of the leaf pages are evicted, and eviction wastes a lot
of effort walking and queuing internal pages that are not actually
evicted.
Branch: develop
https://github.com/wiredtiger/wiredtiger/commit/319c8f2209eeac78dd2b49765853ddac1b690381

Comment by Bruce Lucas (Inactive) [ 24/Feb/16 ]

Attached FTDC data.

Comment by Bruce Lucas (Inactive) [ 24/Feb/16 ]

Repro script

db=/ssd/db
 
insert_threads=25
query_threads=100
total=10000000 # 1 kB docs
 
function clean {
    killall -9 -w mongod
    rm -rf $db db
    mkdir -p $db
    ln -s $db .
}
        
function start {
    killall -w mongod
    # single-node replica set: 20 GB oplog (--oplogSize is in MB), 3 GB WiredTiger cache
    mongod --dbpath $db --logpath $db.log --replSet rs --oplogSize 20000 --wiredTigerCacheSizeGB 3 --fork
    # initiate the replica set, then wait until this node is PRIMARY (myState 1)
    mongo --eval 'rs.initiate(); while (rs.status().myState!=1) sleep(1000); sleep(1000)'
}
 
function insert {
    (
        for t in $(seq $insert_threads); do
            mongo --quiet --eval "
                (function (t, threads, total) {
                    x = ''
                    for (var i=0; i<1000; i++)
                        x += 'x'
                    count = total / threads
                    every = 10000
                    for (var i=0; i<count; ) {
                        var bulk = db.c.initializeUnorderedBulkOp();
                        for (var j=0; j<every; j++, i++)
                            bulk.insert({_id:i*threads+t, x:x})
                        bulk.execute();
                        if (t==1)
                            print(i, '/', count)
                    }
                })($t, $insert_threads, $total)
            " &
        done
        wait
    )
}
 
function query {
    (
        for t in $(seq $query_threads); do
            mongo --eval "
                (function (total) {
                    while (true) {
                        id = Math.floor(Math.random()*total) + 1
                        db.c.findOne({_id:id})
                    }
                })($total)
            " &
        done
        sleep 120
        killall mongo
    )
}
 
clean; start; insert; query
start; query

Comment by Bruce Lucas (Inactive) [ 24/Feb/16 ]

Does not reproduce, or at least is very much less prominent, under 3.0.9. However, there is also a significant performance regression on this test in 3.2.3 vs 3.0.9 that appears in a standalone test (whereas the issue on this ticket only appears with a replica set). Opened SERVER-22834 to track it separately, although it is possibly related, as it also appears to involve a high rate of pages walked per page evicted.
