WiredTiger / WT-1698

Reconfiguring eviction opens a race with WT_CONNECTION::close

    Details

    • Type: Task
    • Status: Resolved
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: WT2.9.2
    • Labels:

      Description

      After the eviction.threads_max setting is increased, the eviction server may resize its array of worker contexts and allocate more sessions. This happens asynchronously in the eviction server thread because it owns the eviction worker contexts.

      The problem is that, when opening a session, we assert that the connection is not in the process of closing. Races between internal threads and connection close have been the source of a number of bugs in the past, particularly when the connection is only open for a short period.

      One possible fix would be to set a flag when the reconfigure happens and have the eviction server clear the flag once it has processed the setting change. WT_CONNECTION::close would have to wait for the flag to be cleared before we start asserting that no new sessions can be opened.
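
      A minimal sketch of that handshake, written against plain pthreads with hypothetical names (none of the flag, mutex or function names below are WiredTiger identifiers), might look like this:

      /*
       * Hypothetical sketch of the flag handshake described above; names are
       * illustrative, not WiredTiger APIs. The mutex and condition variable
       * are assumed to be initialized at connection open.
       */
      #include <pthread.h>
      #include <stdbool.h>

      struct conn {
          pthread_mutex_t mtx;
          pthread_cond_t  cond;
          bool            evict_reconfig_pending; /* set by reconfigure */
      };

      /* Reconfigure: note that the eviction server still has work to do. */
      static void
      reconfigure_eviction(struct conn *conn)
      {
          pthread_mutex_lock(&conn->mtx);
          conn->evict_reconfig_pending = true;
          pthread_mutex_unlock(&conn->mtx);
      }

      /*
       * Eviction server: after resizing the worker array and opening the new
       * sessions, clear the flag and wake anybody waiting in close.
       */
      static void
      evict_server_apply_config(struct conn *conn)
      {
          /* ... resize the worker array, open the new sessions ... */
          pthread_mutex_lock(&conn->mtx);
          conn->evict_reconfig_pending = false;
          pthread_cond_broadcast(&conn->cond);
          pthread_mutex_unlock(&conn->mtx);
      }

      /*
       * Close: wait for the eviction server to settle before starting to
       * assert that no new sessions can be opened.
       */
      static void
      connection_close_wait_for_eviction(struct conn *conn)
      {
          pthread_mutex_lock(&conn->mtx);
          while (conn->evict_reconfig_pending)
              pthread_cond_wait(&conn->cond, &conn->mtx);
          pthread_mutex_unlock(&conn->mtx);
      }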


          Activity

          Keith Bostic added a comment -

          @michaelcahill: I don't see any place where we assert, when opening a session, that we are not in the process of closing the connection. Is that still the case?

          Since connection close waits on the servers to exit, would it be sufficient for connection close to shut out reconfiguration, then wait on the server threads?

          Michael Cahill added a comment -

          @keithbostic, we do check whether a connection is closing before allowing a session to open, here https://github.com/wiredtiger/wiredtiger/blob/develop/src/session/session_api.c#L1031

          I would like to keep that if possible, because it has caught a bunch of problems with server threads in the past.

          The problem isn't connection close racing with reconfiguration. The problem is that sometime after a reconfiguration, the eviction server may notice the change and reallocate some memory, open some sessions and start some threads. We don't do the realloc during reconfigure because the eviction server owns that memory.

          If we could shut down the eviction server early in close, we may be able to address this, but evicting whole trees currently waits on the eviction server (to clear the eviction walk point), and I'm hesitant to make connection close too much of a special case. I'm sure this problem is solvable, and there is some way to wait during close for the eviction server to settle down, but I don't see a simple solution.
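
          To make the timing concrete, here is a much-simplified, hypothetical sketch of the asynchronous pattern described above (none of these names are WiredTiger identifiers): the server applies the new thread count some time after reconfigure returns, so the session opens can land in the middle of close.

          /*
           * Hypothetical, simplified sketch of the race; illustrative names only.
           */
          #include <stdbool.h>
          #include <stdint.h>

          struct conn {
              volatile bool     running;       /* cleared by connection close */
              volatile uint32_t threads_max;   /* updated by reconfigure */
              uint32_t          threads_alloc; /* owned by the eviction server */
          };

          /*
           * Stand-in for opening an internal session; in the real code this is
           * where the "not closing" assertion can fire.
           */
          static void
          open_internal_session(struct conn *conn)
          {
              (void)conn;
          }

          static void
          evict_server_loop(struct conn *conn)
          {
              while (conn->running) {
                  /*
                   * The server notices the new setting some time after
                   * reconfigure returned, so these opens can race with
                   * WT_CONNECTION::close.
                   */
                  while (conn->threads_alloc < conn->threads_max) {
                      open_internal_session(conn);
                      conn->threads_alloc++;
                  }
                  /* ... do eviction work ... */
              }
          }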

          Keith Bostic added a comment -

          @michaelcahill, I was confused because I expected it to be a test of WT_CONN_CLOSING and it's testing WT_CONN_SERVER_RUN. Is there any reason we aren't testing WT_CONN_SERVER_RUN everywhere, that is, does WT_CONN_CLOSING serve any special purpose?

          I don't see anything in __wt_async_destroy or __wt_lsm_manager_destroy that makes me think session open couldn't test WT_CONN_SERVER_RUN instead.

          diff --git a/dist/flags.py b/dist/flags.py
          index f1eb6b2..b4171a1 100644
          --- a/dist/flags.py
          +++ b/dist/flags.py
          @@ -87,7 +87,6 @@ flags = {
               'conn' : [
                   'CONN_CACHE_POOL',
                   'CONN_CKPT_SYNC',
          -        'CONN_CLOSING',
                   'CONN_EVICTION_RUN',
                   'CONN_LEAK_MEMORY',
                   'CONN_LOG_SERVER_RUN',
          diff --git a/src/conn/conn_dhandle.c b/src/conn/conn_dhandle.c
          index 7756158..d920b0e 100644
          --- a/src/conn/conn_dhandle.c
          +++ b/src/conn/conn_dhandle.c
          @@ -375,7 +375,7 @@ __conn_btree_open(
                      F_ISSET(dhandle, WT_DHANDLE_EXCLUSIVE) &&
                      !LF_ISSET(WT_DHANDLE_LOCK_ONLY));
           
          -       WT_ASSERT(session, !F_ISSET(S2C(session), WT_CONN_CLOSING));
          +       WT_ASSERT(session, F_ISSET(S2C(session), WT_CONN_SERVER_RUN));
           
                  /*
                   * If the handle is already open, it has to be closed so it can be
          diff --git a/src/conn/conn_open.c b/src/conn/conn_open.c
          index 0a3d35a..da6b605 100644
          --- a/src/conn/conn_open.c
          +++ b/src/conn/conn_open.c
          @@ -110,9 +110,6 @@ __wt_connection_close(WT_CONNECTION_IMPL *conn)
                  F_CLR(conn, WT_CONN_SERVER_RUN);
                  WT_TRET(__wt_async_destroy(session));
                  WT_TRET(__wt_lsm_manager_destroy(session));
          -
          -       F_SET(conn, WT_CONN_CLOSING);
          -
                  WT_TRET(__wt_checkpoint_server_destroy(session));
                  WT_TRET(__wt_statlog_destroy(session, 1));
                  WT_TRET(__wt_sweep_destroy(session));
          

          Keith Bostic added a comment -

          I think this problem can occur in two other ways: the async code opens an internal session as part of WT_CONNECTION.async_new_op, and LSM will start the LSM manager thread/session when an LSM tree is opened; in other words, those two code paths can attempt to create an internal session while WT_CONNECTION.close is in progress. It shouldn't happen in a well-written application, of course, but that's not terribly reassuring.

          Keith Bostic added a comment -

          FTR, I just saw what I think is a related core dump with WT-1613's CONFIG:

          (gdb) where
          #0  __evict_worker (arg=0x116bb30) at src/evict/evict_lru.c:391
          #1  0x00007ff05c1bbf18 in start_thread () from /lib64/libpthread.so.0
          #2  0x00007ff05bad6b2d in clone () from /lib64/libc.so.6
          (gdb) allstack

          Thread 3 (Thread 0x7ff05b1d8740 (LWP 28191)):
          #0  0x00007ff05c1bcfb7 in pthread_join () from /lib64/libpthread.so.0
          #1  0x000000000042e453 in __wt_thread_join (session=session@entry=0x114e510,
              tid=<optimized out>) at src/os_posix/os_thread.c:36
          #2  0x000000000041a4e8 in __wt_evict_destroy (session=session@entry=0x114e510)
              at src/evict/evict_lru.c:361
          #3  0x0000000000411cf0 in __wt_connection_close (conn=conn@entry=0x1185370)
              at src/conn/conn_open.c:157
          #4  0x000000000040c2d6 in __conn_close (wt_conn=0x1185370, config=0x0)
              at src/conn/conn_api.c:723
          #5  0x000000000040b794 in wts_close () at wts.c:406
          #6  0x0000000000404005 in main (argc=<optimized out>, argv=<optimized out>)
              at t.c:222

          Thread 2 (Thread 0x7ff05b1d6700 (LWP 24434)):
          #0  0x00007ff05c1bcfb7 in pthread_join () from /lib64/libpthread.so.0
          #1  0x000000000042e453 in __wt_thread_join (session=session@entry=0x114e7d0,
              tid=<optimized out>) at src/os_posix/os_thread.c:36
          #2  0x000000000041b9a4 in __evict_server (arg=0x114e7d0)
              at src/evict/evict_lru.c:185
          #3  0x00007ff05c1bbf18 in start_thread () from /lib64/libpthread.so.0
          #4  0x00007ff05bad6b2d in clone () from /lib64/libc.so.6

          Thread 1 (Thread 0x7ff04eff5700 (LWP 31593)):
          #0  __evict_worker (arg=0x116bb30) at src/evict/evict_lru.c:391
          #1  0x00007ff05c1bbf18 in start_thread () from /lib64/libpthread.so.0
          #2  0x00007ff05bad6b2d in clone () from /lib64/libc.so.6
          

          Thread 3 has already pthread-joined all of the eviction workers, and Thread 2 is attempting to pthread-join an eviction worker.
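
          For reference, a minimal sketch of one way to rule out the double join shown in the stacks above, assuming a hypothetical per-worker flag (this is not the WiredTiger code): only the caller that atomically claims the join actually performs it.

          /* Hypothetical sketch; not the WiredTiger implementation. */
          #include <pthread.h>
          #include <stdbool.h>

          struct worker {
              pthread_t tid;
              bool      tid_set;   /* true while the thread is joinable */
          };

          static int
          worker_join(struct worker *w)
          {
              /*
               * Atomically claim the join (GCC-style builtin); a second caller
               * sees false and returns without touching the thread.
               */
              if (!__sync_bool_compare_and_swap(&w->tid_set, true, false))
                  return (0);
              return (pthread_join(w->tid, NULL));
          }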

          Michael Cahill added a comment -

          The fix in WT-1792 should fix the stack above, but not the original problem: we could still hit the assertion about opening a session concurrent with connection close after reconfiguring the number of eviction threads.

          The reason for the separate WT_CONN_CLOSING flag is that async threads and LSM threads can open handles. We clear WT_CONN_SERVER_RUN to signal all server threads to shut down, and only set WT_CONN_CLOSING once we're sure no more data handles should be opened. It isn't pretty, but the two flags can't be combined.

          I'm not concerned about applications that do async operations or LSM operations concurrent with WT_CONNECTION::close – they're clearly violating the API by accessing handles after they are invalidated.

          We could complicate close further by setting some other flag once we're sure all the threads that open sessions have finished, but that's more ugliness like WT_CONN_CLOSING. Or we could just get rid of these assertions and wrap parts of close in api_lock to avoid racing with server threads opening sessions (though we'd also have to set WT_SESSION_INTERNAL inside the lock when opening a session).
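
          A rough sketch of that second alternative, with hypothetical names (struct conn, open_session and connection_close_begin are illustrative, not the WiredTiger implementation): both session open and the start of close take the same lock, so the "closing" decision and the session allocation can't interleave.

          #include <errno.h>
          #include <pthread.h>
          #include <stdbool.h>

          struct conn {
              pthread_mutex_t api_lock;
              bool            closing;
          };

          static int
          open_session(struct conn *conn)
          {
              int ret = 0;

              pthread_mutex_lock(&conn->api_lock);
              if (conn->closing)
                  ret = EBUSY;
              else {
                  /*
                   * Allocate the session and mark it internal while still
                   * holding the lock, so close can't slip in between the
                   * two steps.
                   */
              }
              pthread_mutex_unlock(&conn->api_lock);
              return (ret);
          }

          static void
          connection_close_begin(struct conn *conn)
          {
              pthread_mutex_lock(&conn->api_lock);
              conn->closing = true;   /* no new sessions after this point */
              pthread_mutex_unlock(&conn->api_lock);
          }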

          Ramon Fernandez added a comment -

          Additional ticket information from GitHub

          This ticket was referenced in the following commits:
          8d918f0ef8588056ecf729e72ffdd8bc0a79fd6c
          Keith Bostic added a comment - edited

          Michael Cahill, I just noticed that WT-1698 is related to WT-3155 (for which I pushed a pull request a couple of days ago) but raises some different issues. I'll pull WT-1698 into my sprint; hopefully we can fix both of them.

          Summarizing WT-1698:

          • The original issue was a concern that eviction could re-configure and attempt to allocate additional sessions while the application is closing the connection. Because eviction does the allocation asynchronously to the re-configuration, it's possible to hit this with a well-behaved application, that is, one that re-configures and then calls close. We have a `DIAGNOSTIC` assert in `__open_session` that we're not in connection close when opening a session, so a well-behaved application can trigger that assertion.
          • You said you wanted to keep the assertion in `__open_session` ("This has been the source of a number of bugs in the past where internal threads can race with connection close if the connection is only open for a short period.")
          • I said there are at least two other places this problem can occur, the LSM manager and async operations.
          • You commented about where in the code we set the `WT_CONN_CLOSING` flag: it's after we shouldn't open more data handles, and `__wt_conn_btree_open` asserts we're not in the close path.

          The PR I submitted (#3344) makes related changes:

          • I moved the setting of the `WT_CONN_CLOSING` flag up in the close process and removed the assertion in `__wt_conn_btree_open` that we weren't in the process of closing.
          • I changed the `DIAGNOSTIC` assertion in `__open_session` to be a return of `EBUSY`, without any associated error message.

          My thinking was that it's OK for asynchronous threads to be in the open-file and open-session paths during close – they're running asynchronously and they can be in any normal code path until we shut them down. That said, I understand the diagnostic value, and expected that returning `EBUSY` out of `open_session` would trigger an error message we would see during testing.

          Reviewing this all today, it occurred to me I'd introduced a problem, because `EBUSY` from session-allocation was likely to make its caller panic, which could race with the close and prevent normal shutdown.

          However, I also figured out that LSM doesn't give its worker threads sessions, and both eviction and the async code allocate additional sessions in the reconfiguration path, so neither of them will open new sessions during close, at least in a well-behaved application. (I think async has always been that way; it's eviction that changed in August of 2016.)

          In summary, at this point we could put the assertion back into `__open_session`, because we don't open new internal sessions in the close path. Given the possibility of an application calling close and other WiredTiger APIs concurrently, we could even go further, and panic in that case.

          Thoughts?

          Keith Bostic added a comment -

          Update: I was wrong: LSM does give its worker threads sessions, but those sessions are allocated at startup, not on demand, which means LSM does not have the problem described in WT-1698.

          I now believe that none of the server subsystems allocate sessions asynchronously to reconfiguration, which means a well-behaved application (one not calling connection-close and connection-reconfiguration at the same time) should not have a race.

          The specific subsystem called out in this ticket, eviction, was switched to the new thread-group code (see WT-2846, PR #2973), and at the same time, switched from allocating sessions asynchronously to allocating sessions as part of reconfiguration.
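
          A hypothetical sketch of what that change looks like in outline (illustrative names only; this is not the thread-group code itself): allocating the sessions inside reconfigure, before it returns, leaves nothing pending for the server thread, so reconfigure can no longer race with close in a well-behaved application.

          #include <stdint.h>

          struct conn {
              uint32_t threads_max;
              uint32_t threads_alloc;
          };

          /* Illustrative stand-in for allocating an internal session. */
          static int
          open_internal_session(struct conn *conn)
          {
              (void)conn;
              return (0);
          }

          /*
           * Sessions are allocated synchronously, before reconfigure returns;
           * the eviction server has nothing left to open later.
           */
          static int
          reconfigure_eviction(struct conn *conn, uint32_t new_max)
          {
              int ret;

              while (conn->threads_alloc < new_max) {
                  if ((ret = open_internal_session(conn)) != 0)
                      return (ret);
                  conn->threads_alloc++;
              }
              conn->threads_max = new_max;
              return (0);
          }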

          In summary, I believe that WT-1698 can be closed immediately.

          cc: Alexander Gorrod, Michael Cahill

