WT-2864: Reconfiguring the checkpoint server can lead to hangs

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: WT2.9.0, 3.2.10, 3.3.15
    • Labels:
      None
    • # Replies:
      7
    • Last comment by Customer:
      true

      Description

      The new test case that repeatedly reconfigures the connection has uncovered a bug in the checkpoint server thread. If the server is reconfigured through some combination of enabling and disabling log-based and time-based checkpoints, it can miss a signal on a condition variable and end up waiting forever.

      The particular condition wait in question is:
      https://github.com/wiredtiger/wiredtiger/blame/master/src/conn/conn_ckpt.c#L98
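
The general failure mode described above, a missed signal on a condition variable, can be illustrated with a small sketch. This is not WiredTiger code: the `Server` class, its methods, and the `wake_requested` flag are hypothetical names invented for the illustration, and Python's `threading.Condition` stands in for the C condition variable. It shows one common way a wakeup is lost (the signal arrives before the waiter starts waiting) and the usual remedy of pairing the wait with a flag checked under the lock.

```python
import threading


class Server:
    """Toy stand-in for a server thread woken through a condition variable."""

    def __init__(self):
        self.cond = threading.Condition()
        self.wake_requested = False  # predicate protected by self.cond

    def signal(self):
        # Record the request under the lock, then notify any current waiter.
        with self.cond:
            self.wake_requested = True
            self.cond.notify()

    def wait_bare(self, timeout):
        # Fragile: a notify() that ran before this call is not remembered,
        # so without the timeout this wait would block forever.
        with self.cond:
            return self.cond.wait(timeout)

    def wait_with_predicate(self, timeout):
        # Robust: the shared flag is checked under the lock before sleeping,
        # so a signal sent earlier is still observed.
        with self.cond:
            woke = self.cond.wait_for(lambda: self.wake_requested, timeout)
            self.wake_requested = False
            return woke


# The signal is sent before anyone waits in both cases.
bare = Server()
bare.signal()
missed = not bare.wait_bare(0.1)       # the bare wait times out

safe = Server()
safe.signal()
seen = safe.wait_with_predicate(0.1)   # the predicate wait returns at once
```

The timeouts exist only so the sketch terminates; a wait without one is what turns a lost wakeup into a hang.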


          Activity

          alexander.gorrod Alexander Gorrod added a comment -

          There was a possible sighting of this here: http://build.wiredtiger.com:8080/job/wiredtiger-pull-request-linux/921/

          The test timed out while running the reconfigure test case.

          xgen-internal-githook Githook User added a comment -

          Author: Keith Bostic (keithbostic) <keith.bostic@mongodb.com>

          Message: WT-2864 Update reconfigure test to detect hangs (#3051)

          Set an alarm when running reconfiguration tests so we can detect hangs;
          if the alarm goes off, output the current configuration string and drop
          core.

          We've seen cases where reconfiguring causes a hang, but haven't been able to identify the root cause.
          Branch: develop
          https://github.com/wiredtiger/wiredtiger/commit/8ef98127b1fbf1d8a0fc8c2e97ad9b06b7d3517a
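
The hang-detection approach in the commit message above (arm an alarm, and report the current configuration string if it fires) can be sketched roughly as follows. This is an illustration only, not the code from the commit: `run_with_alarm`, `HangDetected`, and the configuration strings are hypothetical, and the sketch raises an exception where the real test is described as dropping core. It relies on `SIGALRM`, so it assumes a Unix-like system and must run in the main thread.

```python
import signal
import time


class HangDetected(Exception):
    """Raised when the guarded operation outlives its alarm."""


def run_with_alarm(operation, config, seconds):
    """Run operation(config); raise HangDetected if the alarm fires first."""

    def on_alarm(signum, frame):
        # Report the configuration being applied when the alarm went off.
        # A test harness could call os.abort() here instead to drop core.
        raise HangDetected(f"hang while applying config: {config!r}")

    previous = signal.signal(signal.SIGALRM, on_alarm)
    signal.setitimer(signal.ITIMER_REAL, seconds)  # arm the alarm
    try:
        return operation(config)
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)    # cancel a pending alarm
        signal.signal(signal.SIGALRM, previous)    # restore the old handler


# A simulated hang: the "reconfigure" call sleeps far longer than the alarm.
try:
    run_with_alarm(lambda cfg: time.sleep(5), "checkpoint=(wait=0)", 0.1)
    report = None
except HangDetected as exc:
    report = str(exc)
```

Cancelling the timer in the `finally` clause keeps a stale alarm from firing during a later, unrelated operation.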

          alexander.gorrod Alexander Gorrod added a comment -

          The commit here is a test change. If it helps identify the root cause soon, we can reuse the ticket, so I'm leaving it open for now.

          xgen-internal-githook Githook User added a comment -

          Author: Keith Bostic (keithbostic) <keith.bostic@mongodb.com>

          Message: WT-2864 Update reconfigure test to detect hangs (#3051)

          Set an alarm when running reconfiguration tests so we can detect hangs;
          if the alarm goes off, output the current configuration string and drop
          core.

          We've seen cases where reconfiguring causes a hang, but haven't been able to identify the root cause.
          Branch: mongodb-3.4
          https://github.com/wiredtiger/wiredtiger/commit/8ef98127b1fbf1d8a0fc8c2e97ad9b06b7d3517a

          xgen-internal-githook Githook User added a comment -

          Author: Keith Bostic (keithbostic) <keith.bostic@mongodb.com>

          Message: WT-2864 Update reconfigure test to detect hangs (#3051)

          Set an alarm when running reconfiguration tests so we can detect hangs;
          if the alarm goes off, output the current configuration string and drop
          core.

          We've seen cases where reconfiguring causes a hang, but haven't been able to identify the root cause.
          Branch: mongodb-3.2
          https://github.com/wiredtiger/wiredtiger/commit/8ef98127b1fbf1d8a0fc8c2e97ad9b06b7d3517a

          xgen-internal-githook Githook User added a comment -

          Author: Ramon Fernandez <ramon@mongodb.com>

          Message: Import wiredtiger: 9cf2f89d6d95e1de797f05ab1fef28695f8bae7b from branch mongodb-3.2

          ref: bb18c43915..9cf2f89d6d
          for: 3.2.10

          WT-2864 Reconfiguring the checkpoint server can lead to hangs
          WT-2874 Change test_compact01 to avoid eviction
          WT-2918 The dist scripts create C files s_whitespace complains about
          WT-2919 Don't mask error returns from style checking scripts
          WT-2921 Reduce the WT_SESSION hazard_size when possible
          WT-2923 heap-use-after-free on address in compaction
          WT-2924 Ensure we are doing eviction when threads are waiting for it
          WT-2925 WT_THREAD_PANIC_FAIL is a WT_THREAD structure flag
          WT-2926 WT_CONNECTION.reconfigure can attempt unlock of not-locked lock
          WT-2928 Eviction failing to switch queues can lead to starvation
          Branch: v3.2
          https://github.com/mongodb/mongo/commit/79d9b3ab5ce20f51c272b4411202710a082d0317

          xgen-internal-githook Githook User added a comment -

          Author: Ramon Fernandez <ramon@mongodb.com>

          Message: Import wiredtiger: fc0e7abe82595e579573d42448632f7b36a2d154 from branch mongodb-3.4

          ref: 5bc03723a7..fc0e7abe82
          for: 3.3.15

          WT-2864 Reconfiguring the checkpoint server can lead to hangs
          WT-2874 Change test_compact01 to avoid eviction
          WT-2918 The dist scripts create C files s_whitespace complains about
          WT-2919 Don't mask error returns from style checking scripts
          WT-2921 Reduce the WT_SESSION hazard_size when possible
          WT-2923 heap-use-after-free on address in compaction
          WT-2924 Ensure we are doing eviction when threads are waiting for it
          WT-2925 WT_THREAD_PANIC_FAIL is a WT_THREAD structure flag
          WT-2926 WT_CONNECTION.reconfigure can attempt unlock of not-locked lock
          WT-2928 Eviction failing to switch queues can lead to starvation
          Branch: master
          https://github.com/mongodb/mongo/commit/9dda827a3ae58beef36d53da1b55554cbd8744c4

