WiredTiger / WT-7208

Leave table cursor in a valid state when subordinate index fails to open

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • Fix Version/s: WT10.0.0, 4.9.0, 4.4.5
    • Affects Version/s: WT3.1.0, WT3.2.1
    • Component/s: None
    • Labels: None
    • Storage Engines

      Hi!

      Sometimes our application crashes on cursor put with the following stack:

      #0  0x00007fec01840c9f in __apply_idx (ctable=0xf46780, func_off=120, skip_immutable=false) at .../src/thirdparty/wiredtiger/3.2.1/src/src/cursor/cur_table.c:159
      #1  __curtable_insert (cursor=0xf46780) at .../src/thirdparty/wiredtiger/3.2.1/src/src/cursor/cur_table.c:536
      #2  <our code>
      ...
      

      We tracked the issue down to corrupted index files. While index file corruption most likely happened due to our operational mistakes, we think the crash here can be avoided.

      The scenario that leads to the crash is this:
      1. A new cursor is opened and put is called. WT tries to open the indices here:

      static int
      __curtable_open_indices(WT_CURSOR_TABLE *ctable)
      {
          WT_CURSOR **cp, *primary;
          WT_SESSION_IMPL *session;
          WT_TABLE *table;
          u_int i;
      
          session = (WT_SESSION_IMPL *)ctable->iface.session;
          table = ctable->table;
      
          WT_RET(__wt_schema_open_indices(session, table));
          if (table->nindices == 0 || ctable->idx_cursors != NULL)
              return (0);
      
          /* Check for bulk cursors. */
          primary = *ctable->cg_cursors;
          if (F_ISSET(primary, WT_CURSTD_BULK))
              WT_RET_MSG(session, ENOTSUP, "Bulk load is not supported for tables with indices");
      
          WT_RET(__wt_calloc_def(session, table->nindices, &ctable->idx_cursors));
          for (i = 0, cp = ctable->idx_cursors; i < table->nindices; i++, cp++)
              WT_RET(
                __wt_open_cursor(session, table->indices[i]->source, &ctable->iface, ctable->cfg, cp));
          return (0);
      }
      

      2. Some of the indices fail to open. The error is returned to the caller; ctable->idx_cursors has been allocated but is only partially filled.
      3. Because the returned error is not WT_PANIC, we reuse the cursor. WT tries to open the indices again, but the following check in __curtable_open_indices makes it conclude that the indices are already open (table->nindices is nonzero, and ctable->idx_cursors was allocated in the previous step):

          WT_RET(__wt_schema_open_indices(session, table));
          if (table->nindices == 0 || ctable->idx_cursors != NULL)
              return (0);
      

      4. When WT tries to access ctable->idx_cursors later in __apply_idx, it crashes because some of the cursors are NULL.
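
      The four steps above can be modelled without WiredTiger at all. The sketch below is only an illustration: toy_cursor, toy_table_cursor, toy_open_index and the fail_at parameter are hypothetical stand-ins, not WiredTiger types or functions. It copies the control flow of the quoted function (allocate the array, return on the first error) and counts the NULL slots that a later pass over idx_cursors would dereference.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for illustration; these are NOT WiredTiger types. */
typedef struct {
    int id;
} toy_cursor;

typedef struct {
    size_t nindices;          /* plays the role of table->nindices */
    toy_cursor **idx_cursors; /* plays the role of ctable->idx_cursors */
} toy_table_cursor;

/* Pretend that index number fail_at is corrupted and cannot be opened. */
int
toy_open_index(size_t i, size_t fail_at, toy_cursor **cp)
{
    if (i == fail_at)
        return (ENOENT);
    if ((*cp = malloc(sizeof(**cp))) == NULL)
        return (ENOMEM);
    (*cp)->id = (int)i;
    return (0);
}

/* Same shape as the quoted function: return as soon as one open fails. */
int
toy_open_indices(toy_table_cursor *ctable, size_t fail_at)
{
    size_t i;
    int ret;

    /* A non-NULL array is taken to mean "all indices are already open". */
    if (ctable->nindices == 0 || ctable->idx_cursors != NULL)
        return (0);

    ctable->idx_cursors = calloc(ctable->nindices, sizeof(toy_cursor *));
    if (ctable->idx_cursors == NULL)
        return (ENOMEM);
    for (i = 0; i < ctable->nindices; i++)
        if ((ret = toy_open_index(i, fail_at, &ctable->idx_cursors[i])) != 0)
            return (ret); /* the array stays allocated, partially filled */
    return (0);
}

/* Count the slots a later loop over idx_cursors would dereference as NULL. */
size_t
toy_count_null_slots(const toy_table_cursor *ctable)
{
    size_t i, n;

    assert(ctable->idx_cursors != NULL);
    for (i = 0, n = 0; i < ctable->nindices; i++)
        if (ctable->idx_cursors[i] == NULL)
            n++;
    return (n);
}
```

      In this model, with three indices and a failure at index 1, the first call returns the error, the second call returns 0 because the array is already non-NULL, and two slots are left NULL.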

      We'd expect cursor put either to return WT_PANIC on the first failed operation, or return the index-related error again on subsequent operations.
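
      One way to get the second behaviour (the same error again on every retry) is to publish the cursor array only after every index has opened. This is a minimal sketch under that assumption, not the actual WiredTiger change: all toy_* names and the fail_at parameter are hypothetical, and a real fix would close the opened cursors through the cursor API rather than call free().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for illustration; these are NOT WiredTiger types. */
typedef struct {
    int id;
} toy_cursor;

typedef struct {
    size_t nindices;          /* plays the role of table->nindices */
    toy_cursor **idx_cursors; /* plays the role of ctable->idx_cursors */
} toy_table_cursor;

/* Pretend that index number fail_at is corrupted and cannot be opened. */
int
toy_open_index_checked(size_t i, size_t fail_at, toy_cursor **cp)
{
    if (i == fail_at)
        return (ENOENT);
    if ((*cp = malloc(sizeof(**cp))) == NULL)
        return (ENOMEM);
    (*cp)->id = (int)i;
    return (0);
}

/* Build the array locally and attach it to ctable only on full success. */
int
toy_open_indices_checked(toy_table_cursor *ctable, size_t fail_at)
{
    toy_cursor **cursors;
    size_t i;
    int ret;

    assert(ctable != NULL);
    if (ctable->nindices == 0 || ctable->idx_cursors != NULL)
        return (0);

    if ((cursors = calloc(ctable->nindices, sizeof(*cursors))) == NULL)
        return (ENOMEM);

    ret = 0;
    for (i = 0; i < ctable->nindices; i++)
        if ((ret = toy_open_index_checked(i, fail_at, &cursors[i])) != 0)
            break;

    if (ret != 0) {
        /* Undo the partial work; free(NULL) is a no-op for unopened slots. */
        for (i = 0; i < ctable->nindices; i++)
            free(cursors[i]);
        free(cursors);
        return (ret); /* ctable->idx_cursors is still NULL, so a retry re-runs */
    }

    ctable->idx_cursors = cursors;
    return (0);
}
```

      With this shape, a retry on the same cursor goes through the open loop again and reports the index error instead of reaching the NULL slots.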

      I've attached an example program that reproduces the issue. Index corruption is simulated by removing the index file.

            Assignee:
            donald.anderson@mongodb.com Donald Anderson
            Reporter:
            alexander.babayants@itiviti.com Alexander Babayants