WiredTiger / WT-2874

Change test_compact01 to avoid eviction

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: WT2.9.0, 3.2.10, 3.3.15
    • Labels:
      None
    • # Replies:
      9
    • Last comment by Customer:
      true

      Description

      With the recent changes that make eviction turn aggressive more often, test_compact01 can fail with WT_ROLLBACK. The rollback is somewhat expected, since the operation can legitimately be rolled back.

      The eviction server will need some changes to when it becomes aggressive, and we can also look at improving the method used for whole-table truncates (which currently delete every record).
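
Because the rollback is an expected outcome, one way a caller could tolerate it is to retry the operation. The following is only a minimal sketch of that idea, not the change made for this ticket: `RollbackError` is a stand-in for a `WiredTigerError` reporting WT_ROLLBACK, and `run_with_retry` and `make_flaky_truncate` are hypothetical helpers that do not exist in the test suite.

```python
class RollbackError(Exception):
    """Stand-in for a WiredTigerError reporting WT_ROLLBACK (assumption)."""


def run_with_retry(operation, max_attempts=5):
    """Call operation(), retrying when it raises RollbackError.

    Returns the operation's result, or re-raises the last RollbackError
    once max_attempts calls have failed.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except RollbackError:
            if attempt == max_attempts:
                raise
            # A real caller would roll back its transaction here before
            # attempting the operation again.


def make_flaky_truncate(failures):
    """Build a fake truncate that rolls back `failures` times, then succeeds.

    The returned function reports how many times it has been called.
    """
    state = {"calls": 0}

    def truncate():
        state["calls"] += 1
        if state["calls"] <= failures:
            raise RollbackError(
                "WT_ROLLBACK: conflict between concurrent operations")
        return state["calls"]

    return truncate
```

For example, `run_with_retry(make_flaky_truncate(2))` fails twice and succeeds on the third call. Retrying only hides the symptom; it does not stop eviction from skewing the compaction results the test measures.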

          Activity

          alexander.gorrod Alexander Gorrod added a comment -

          David Hows, do you have any more information about this failure? Is it reproducible? Did it occur in automated testing or local testing? If in automated testing, has it happened multiple times?

          keith.bostic Keith Bostic added a comment -

          Alexander Gorrod, this happened repeatedly when Michael Cahill's eviction changes first started being integrated, but I haven't seen it recently. Looking back over my mail, I think we first saw this one around August 25th, and then I have records of it happening 11 more times, with the last occurrence on September 5th. As far as I know, we haven't seen it since.

          The failure looks like this:

          ======================================================================
          ERROR: test_compact01.test_compact.test_compact(table.method) (subunit.RemotedTestCase)
          test_compact01.test_compact.test_compact(table.method)
          ----------------------------------------------------------------------
          _StringException: Traceback (most recent call last):
            File "/mnt/fast/jenkins/workspace/wiredtiger-test-unit-long/test/suite/test_compact01.py", line 80, in test_compact
              self.session.truncate(None, c1, c2, None)
          WiredTigerError: WT_ROLLBACK: conflict between concurrent operations
          

          Here are two relatively recent failures.

          zSeries: http://build.wiredtiger.com:8080/job/wiredtiger-test-unit-zseries/352/console (revision cd933b92a78ac8992d41829064ddf22cce7ec9d7)

          aws-build-test: http://build.wiredtiger.com:8080/job/wiredtiger-test-unit-long/1642/console (revision 6c176ab7e856ac2c5df795622b83482111f0f6d6)

          keith.bostic Keith Bostic added a comment -

          This diff quickly repeats the failure for me, with revision 6c176ab7e856ac2c5df795622b83482111f0f6d6:

          diff --git a/test/suite/test_compact01.py b/test/suite/test_compact01.py
          index 183d75f..d968087 100644
          --- a/test/suite/test_compact01.py
          +++ b/test/suite/test_compact01.py
          @@ -45,7 +45,6 @@ class test_compact(wttest.WiredTigerTestCase, suite_subprocess):
               # The table is a complex object, give it roughly 5 pages per underlying
               # file.
               types = [
          -        ('file', dict(type='file:', pop=simple_populate, maxpages=5)),
                   ('table', dict(type='table:', pop=complex_populate, maxpages=50))
                   ]
               compact = [
          @@ -56,7 +55,7 @@ class test_compact(wttest.WiredTigerTestCase, suite_subprocess):
               scenarios = make_scenarios(types, compact)
               # We want a large cache so that eviction doesn't happen
               # (which could skew our compaction results).
          -    conn_config = 'cache_size=250MB,statistics=(all)'
          +    conn_config = 'cache_size=50MB,statistics=(all)'
           
               # Test compaction.
               def test_compact(self):
          

          keith.bostic Keith Bostic added a comment -

          The rollback condition that's being triggered is this one in __wt_cache_eviction_worker:

          /*      
           * A pathological case: if we're the oldest transaction in the
           * system and the eviction server is stuck trying to find space,
           * abort the transaction to give up all hazard pointers before
           * trying again.
           */      
          if (F_ISSET(cache, WT_CACHE_STUCK) &&
              __wt_txn_am_oldest(session)) {
                  F_CLR(cache, WT_CACHE_STUCK);
                  WT_STAT_FAST_CONN_INCR(session, txn_fail_cache);
                  return (WT_ROLLBACK);
          }
          

          This is an expected error, test_compact01 is explicitly written not to trigger eviction, and the error is no longer happening (presumably because of more recent eviction changes). I'm going to push a branch that increases the cache size for this test to make the failure less likely in the future.

          I think that's sufficient to close this one.
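
The rollback condition quoted from __wt_cache_eviction_worker above can be modeled in a few lines of Python. This is a toy sketch of the logic as quoted, not WiredTiger code: the `Cache` class and `eviction_worker_check` function are hypothetical names, and the string "WT_ROLLBACK" stands in for the C error code.

```python
from dataclasses import dataclass


@dataclass
class Cache:
    """Toy stand-in for the cache state used by the quoted C code."""
    stuck: bool = False       # models the WT_CACHE_STUCK flag
    txn_fail_cache: int = 0   # models the txn_fail_cache statistic


def eviction_worker_check(cache, am_oldest):
    """Return "WT_ROLLBACK" if the caller must abort its transaction.

    Mirrors the quoted condition: the rollback fires only when the cache
    is stuck AND the calling session holds the oldest transaction.
    """
    if cache.stuck and am_oldest:
        cache.stuck = False        # F_CLR(cache, WT_CACHE_STUCK)
        cache.txn_fail_cache += 1  # count the forced rollback
        return "WT_ROLLBACK"
    return None
```

In this model the stuck flag is cleared when the rollback is returned, so a second call with the same cache returns None until the flag is set again. A session that is not the oldest transaction never gets the rollback from this check, even while the cache is stuck.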

          xgen-internal-githook Githook User added a comment -

          Author:

          {u'username': u'keithbostic', u'name': u'Keith Bostic', u'email': u'keith.bostic@mongodb.com'}

          Message: WT-2874 Change test_compact01 to avoid eviction (#3050)

          The test relies on operating without eviction.
          Branch: develop
          https://github.com/wiredtiger/wiredtiger/commit/7652a55a509721813d94e09fba9733a72d2db788

          xgen-internal-githook Githook User added a comment -

          Author:

          {u'username': u'keithbostic', u'name': u'Keith Bostic', u'email': u'keith.bostic@mongodb.com'}

          Message: WT-2874 Change test_compact01 to avoid eviction (#3050)

          The test relies on operating without eviction.
          Branch: mongodb-3.4
          https://github.com/wiredtiger/wiredtiger/commit/7652a55a509721813d94e09fba9733a72d2db788

          xgen-internal-githook Githook User added a comment -

          Author:

          {u'username': u'keithbostic', u'name': u'Keith Bostic', u'email': u'keith.bostic@mongodb.com'}

          Message: WT-2874 Change test_compact01 to avoid eviction (#3050)

          The test relies on operating without eviction.
          Branch: mongodb-3.2
          https://github.com/wiredtiger/wiredtiger/commit/7652a55a509721813d94e09fba9733a72d2db788

          xgen-internal-githook Githook User added a comment -

          Author:

          {u'name': u'Ramon Fernandez', u'email': u'ramon@mongodb.com'}

          Message: Import wiredtiger: 9cf2f89d6d95e1de797f05ab1fef28695f8bae7b from branch mongodb-3.2

          ref: bb18c43915..9cf2f89d6d
          for: 3.2.10

          WT-2864 Reconfiguring the checkpoint server can lead to hangs
          WT-2874 Change test_compact01 to avoid eviction
          WT-2918 The dist scripts create C files s_whitespace complains about
          WT-2919 Don't mask error returns from style checking scripts
          WT-2921 Reduce the WT_SESSION hazard_size when possible
          WT-2923 heap-use-after-free on address in compaction
          WT-2924 Ensure we are doing eviction when threads are waiting for it
          WT-2925 WT_THREAD_PANIC_FAIL is a WT_THREAD structure flag
          WT-2926 WT_CONNECTION.reconfigure can attempt unlock of not-locked lock
          WT-2928 Eviction failing to switch queues can lead to starvation
          Branch: v3.2
          https://github.com/mongodb/mongo/commit/79d9b3ab5ce20f51c272b4411202710a082d0317

          xgen-internal-githook Githook User added a comment -

          Author:

          {u'name': u'Ramon Fernandez', u'email': u'ramon@mongodb.com'}

          Message: Import wiredtiger: fc0e7abe82595e579573d42448632f7b36a2d154 from branch mongodb-3.4

          ref: 5bc03723a7..fc0e7abe82
          for: 3.3.15

          WT-2864 Reconfiguring the checkpoint server can lead to hangs
          WT-2874 Change test_compact01 to avoid eviction
          WT-2918 The dist scripts create C files s_whitespace complains about
          WT-2919 Don't mask error returns from style checking scripts
          WT-2921 Reduce the WT_SESSION hazard_size when possible
          WT-2923 heap-use-after-free on address in compaction
          WT-2924 Ensure we are doing eviction when threads are waiting for it
          WT-2925 WT_THREAD_PANIC_FAIL is a WT_THREAD structure flag
          WT-2926 WT_CONNECTION.reconfigure can attempt unlock of not-locked lock
          WT-2928 Eviction failing to switch queues can lead to starvation
          Branch: master
          https://github.com/mongodb/mongo/commit/9dda827a3ae58beef36d53da1b55554cbd8744c4

