WiredTiger / WT-2670

Inefficient I/O when read full DB (poor readahead)

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: WT2.8.0
    • Fix Version/s: 3.2.12, WT2.9.1, 3.5.1, 3.4.2
    • Labels:
      None
    • # Replies:
      32
    • Last comment by Customer:
      true
    • Sprint:
      Storage 2016-10-31, Storage 2016-11-21, Storage 2016-12-12

      Description

      We see inefficient disk I/O when reading the full database (by either the primary or a secondary key). Our investigation shows that WT opens the database and immediately sets POSIX_FADV_RANDOM on the db and index files. POSIX_FADV_WILLNEED is set for only a small number of pages (as we understand it, internal pages only). A quick test that preloads an empty posix_fadvise() shows an almost 2x speedup for a full database read. For a 3.9 GB database stored on a spinning disk we have:

      # regular posix_fadvise():
      PK total time: 45.1785 seconds, 17530462 records
      # empty posix_fadvise():
      PK total time: 25.8627 seconds, 17530462 records
      

      A full database scan is a common scenario for us. Is it possible to optimize this use case?
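
The "empty posix_fadvise()" test mentioned in the description can be reproduced roughly as follows. This is a minimal sketch, assuming Linux with glibc and a dynamically linked test binary; the file name `noadvise.c` and the library name used below are hypothetical and not part of the original report.

```c
/* noadvise.c: hypothetical LD_PRELOAD shim that makes posix_fadvise() a no-op. */
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif
#include <fcntl.h>
#include <sys/types.h>

/*
 * Same signature as the libc function. Returning 0 reports success to the
 * caller while the advice never reaches the kernel, so the default
 * readahead policy stays in effect for every file.
 */
int
posix_fadvise(int fd, off_t offset, off_t len, int advice)
{
	(void)fd;
	(void)offset;
	(void)len;
	(void)advice;
	return (0);
}
```

Built with `cc -shared -fPIC -o libnoadvise.so noadvise.c` and loaded with `LD_PRELOAD=./libnoadvise.so` in front of the scan command, the shim intercepts the call before libc sees it. Depending on how the application was compiled, it may call `posix_fadvise64` instead, in which case that symbol would need the same treatment.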

          Activity

          Eugene Ivanov (eiva) added a comment (edited)

          Sue LoVerso, I got interesting results. The DB size is 4.5 GB (without indices), with snappy compression and 20 000 000 records. I read by secondary key (the primary-key result is similar) with a cold cache:

          2.8: 93 seconds
          2.8 (commented out posix_fadvise call): 46 seconds
          2.9.0 (master): 46 seconds
          2.9.1 (development) hint=random/seq/none: 46 seconds
          

          It looks like something changed between 2.8 and 2.9 so that reads became sequential, which gave my test a good speedup. Our random-update test (WT-3089) possibly confirms this as well.
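
For readers unfamiliar with what a random/seq/none hint amounts to at the operating-system level, the sketch below maps a hint string to a `posix_fadvise()` advice value and applies it to a whole file. It illustrates the general technique only and is not WiredTiger's implementation: the helper names `hint_to_advice` and `apply_access_hint` are hypothetical, the accepted strings are assumed from the shorthand in the comment above, and treating "none" as `POSIX_FADV_NORMAL` is also an assumption.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif
#include <fcntl.h>
#include <string.h>

/*
 * Hypothetical helper: translate a hint string into posix_fadvise() advice.
 * Returns 0 and stores the advice on success, -1 for an unknown hint.
 */
static int
hint_to_advice(const char *hint, int *advicep)
{
	if (strcmp(hint, "none") == 0)
		*advicep = POSIX_FADV_NORMAL;       /* assumption: default policy */
	else if (strcmp(hint, "random") == 0)
		*advicep = POSIX_FADV_RANDOM;       /* readahead is of little use */
	else if (strcmp(hint, "sequential") == 0)
		*advicep = POSIX_FADV_SEQUENTIAL;   /* ask for more readahead */
	else
		return (-1);
	return (0);
}

/* Hypothetical helper: apply the hint to the whole file (offset 0, len 0). */
static int
apply_access_hint(int fd, const char *hint)
{
	int advice;

	if (hint_to_advice(hint, &advice) != 0)
		return (-1);
	/* posix_fadvise() returns an error number directly; it does not set errno. */
	return (posix_fadvise(fd, 0, 0, advice));
}
```

A length of 0 tells `posix_fadvise()` to cover the file from the offset to its end, so one call per file handle is enough.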

          xgen-internal-githook Githook User added a comment -

          Author:

          {u'username': u'sueloverso', u'name': u'sueloverso', u'email': u'sue@mongodb.com'}

          Message: WT-2670 Add access_pattern_hint configuration for tables (#3155)
          Branch: mongodb-3.4
          https://github.com/wiredtiger/wiredtiger/commit/853430ea86b8e29cdfa9de34606405d52384d2db

          xgen-internal-githook Githook User added a comment -

          Author:

          {u'username': u'sueloverso', u'name': u'sueloverso', u'email': u'sue@mongodb.com'}

          Message: WT-2670 Add access_pattern_hint configuration for tables (#3155)
          Branch: mongodb-3.2
          https://github.com/wiredtiger/wiredtiger/commit/853430ea86b8e29cdfa9de34606405d52384d2db

          xgen-internal-githook Githook User added a comment -

          Author:

          {u'username': u'daveh86', u'name': u'David Hows', u'email': u'howsdav@gmail.com'}

          Message: Import wiredtiger: d48181f6f4db08761ed7b80b0332908b272ad0d0 from branch mongodb-3.2

          ref: 040e3d6f76..d48181f6f4
          for: 3.2.12

          SERVER-26545 Remove fixed-size limitation on WiredTiger hazard pointers
          WT-2336 Add a test validating schema operations via file system call monitoring
          WT-2402 Misaligned structure accesses lead to undefined behavior
          WT-2670 Inefficient I/O when read full DB (poor readahead)
          WT-283 Add a way to change persistent object settings
          WT-2960 Inserting multi-megabyte values can cause pathological lookaside usage
          WT-2969 Possible snapshot corruption during compaction
          WT-3014 Add GCC/clang support for ELF symbol visibility.
          WT-3021 Fixes needed for Java log cursor example, Java raw mode cursors, log cursors in raw mode
          WT-3025 fix error path in log_force_sync
          WT-3028 Workloads with all dirty pages could trigger diagnostic stuck check
          WT-3030 Test failure indicating invalid key order during traversal
          WT-3034 Add support for single-writer named snapshots.
          WT-3037 Fix some outdated comments in logging
          WT-3048 WiredTiger maximum size warning uses the wrong format.
          WT-3051 Remove external __wt_hex symbol.
          WT-3052 Improve search if an index hint is wrong
          WT-3053 Review Python and Java calls to internal WiredTiger functions
          WT-3054 Java PackTest, PackTest03 do not compile
          WT-3055 Java AsyncTest faults
          WT-3056 For cursors with projections, keys should be allowed
          WT-3057 WiredTiger hazard pointers should use the WT_REF, not the WT_PAGE.
          WT-3061 syscall test runs with checkpoint_sync=false and doesn't acknowledge pwrite64
          WT-3064 minor tree cleanups: .gitignore, NEWS misspelling
          WT-3066 lint
          WT-3068 Copy wtperf artifacts when running Jenkins tests
          WT-3069 Fix build failures in LevelDB APIs
          WT-3070 Fix search_near() for index cursor
          WT-3071 Java: fix build with -Werror=sign-conversion
          WT-3075 Document and enforce that WiredTiger now depends on Python 2.7
          WT-3078 Fix a hang in the reconfiguration test.
          WT-3084 Fix Coverity resource leak complaint.
          Branch: v3.2
          https://github.com/mongodb/mongo/commit/52b68fa86ea43e909ad42c901d0579bced6b205f

          xgen-internal-githook Githook User added a comment -

          Author:

          {u'username': u'daveh86', u'name': u'David Hows', u'email': u'howsdav@gmail.com'}

          Message: Import wiredtiger: 8d2324943364286056ae399043f70b8a937de312 from branch mongodb-3.4

          ref: ca6eee06ff..8d23249433
          for: 3.4.2

          SERVER-26545 Remove fixed-size limitation on WiredTiger hazard pointers
          WT-2336 Add a test validating schema operations via file system call monitoring
          WT-2402 Misaligned structure accesses lead to undefined behavior
          WT-2670 Inefficient I/O when read full DB (poor readahead)
          WT-283 Add a way to change persistent object settings
          WT-2960 Inserting multi-megabyte values can cause pathological lookaside usage
          WT-2969 Possible snapshot corruption during compaction
          WT-3014 Add GCC/clang support for ELF symbol visibility.
          WT-3021 Fixes needed for Java log cursor example, Java raw mode cursors, log cursors in raw mode
          WT-3025 fix error path in log_force_sync
          WT-3028 Workloads with all dirty pages could trigger diagnostic stuck check
          WT-3030 Test failure indicating invalid key order during traversal
          WT-3034 Add support for single-writer named snapshots.
          WT-3037 Fix some outdated comments in logging
          WT-3048 WiredTiger maximum size warning uses the wrong format.
          WT-3051 Remove external __wt_hex symbol.
          WT-3052 Improve search if an index hint is wrong
          WT-3053 Review Python and Java calls to internal WiredTiger functions
          WT-3054 Java PackTest, PackTest03 do not compile
          WT-3055 Java AsyncTest faults
          WT-3056 For cursors with projections, keys should be allowed
          WT-3057 WiredTiger hazard pointers should use the WT_REF, not the WT_PAGE.
          WT-3061 syscall test runs with checkpoint_sync=false and doesn't acknowledge pwrite64
          WT-3064 minor tree cleanups: .gitignore, NEWS misspelling
          WT-3066 lint
          WT-3068 Copy wtperf artifacts when running Jenkins tests
          WT-3069 Fix build failures in LevelDB APIs
          WT-3070 Fix search_near() for index cursor
          WT-3071 Java: fix build with -Werror=sign-conversion
          WT-3075 Document and enforce that WiredTiger now depends on Python 2.7
          WT-3078 Fix a hang in the reconfiguration test.
          WT-3084 Fix Coverity resource leak complaint.
          Branch: v3.4
          https://github.com/mongodb/mongo/commit/d2c64ac8c526b70eadeb859ec41370a5f03a64aa

