Sometimes our application crashes on cursor put with the following stack:
We tracked the issue down to corrupted index files. While the index file corruption was most likely caused by our own operational mistakes, we think the crash itself can be avoided.
The scenario that leads to the crash is this:
1. A new cursor is opened and put is called. WT tries to open the indices here:
2. Some of the indices fail to open. An error is returned to the caller; ctable->idx_cursors has been allocated but is only partially filled.
3. Since the returned error is not WT_PANIC, we reuse the cursor. WT tries to open the indices again, but because of this check in __curtable_open_indices, WT thinks the indices are already open (table->nindices is nonzero, and ctable->idx_cursors was allocated in the previous step):
4. When WT tries to access ctable->idx_cursors later in __apply_idx, it crashes because some of the cursors are NULL.
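To make the sequence above concrete, here is a minimal standalone model of the pattern as we understand it. It is not WiredTiger source: the types, the model_* function names, NINDICES, and the use of ENOENT for a missing index file are all hypothetical stand-ins chosen for illustration. It only counts the NULL slots rather than dereferencing them.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins; these are NOT WiredTiger's real types. */
#define NINDICES 3

typedef struct { int id; } model_cursor;

typedef struct {
    model_cursor **idx_cursors; /* NULL until the first open attempt */
} model_table_cursor;

/* Pretend index number `broken` is corrupted and cannot be opened. */
int
model_open_one(int i, int broken, model_cursor **out)
{
    if (i == broken)
        return (ENOENT); /* assumed error code for a missing index file */
    if ((*out = calloc(1, sizeof(**out))) == NULL)
        return (ENOMEM);
    (*out)->id = i;
    return (0);
}

/* Models the reported pattern: "array allocated" is read as "indices open". */
int
model_open_indices(model_table_cursor *ct, int broken)
{
    int i, ret;

    if (ct->idx_cursors != NULL)
        return (0); /* misleading early return on the second call */
    if ((ct->idx_cursors = calloc(NINDICES, sizeof(*ct->idx_cursors))) == NULL)
        return (ENOMEM);
    for (i = 0; i < NINDICES; i++)
        if ((ret = model_open_one(i, broken, &ct->idx_cursors[i])) != 0)
            return (ret); /* array stays allocated; slots i.. remain NULL */
    return (0);
}

/* Count the slots that a later loop over the indices would find NULL. */
int
model_count_null_slots(const model_table_cursor *ct)
{
    int i, n = 0;

    assert(ct->idx_cursors != NULL);
    for (i = 0; i < NINDICES; i++)
        if (ct->idx_cursors[i] == NULL)
            n++;
    return (n);
}
```

In this model, the first call with a broken index returns the error, the second call returns 0 without opening anything, and the array still contains NULL entries. That is the state in which a loop that dereferences every slot would crash.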
We'd expect cursor put either to return WT_PANIC on the first failed operation, or return the index-related error again on subsequent operations.
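As an illustration of the second option, here is a sketch of one possible way to get that behavior in a standalone model: release the partially filled array when an open fails, so the next attempt starts from scratch and reports the error again. This is not a proposed patch. The types, the model_* names, NINDICES, and ENOENT are hypothetical and do not come from WiredTiger.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins; these are NOT WiredTiger's real types. */
#define NINDICES 3

typedef struct { int id; } model_cursor;

typedef struct {
    model_cursor **idx_cursors; /* NULL whenever the indices are not open */
} model_table_cursor;

/* Pretend index number `broken` is corrupted; pass -1 for "none broken". */
int
model_open_one(int i, int broken, model_cursor **out)
{
    assert(out != NULL);
    if (i == broken)
        return (ENOENT); /* assumed error code for a missing index file */
    if ((*out = calloc(1, sizeof(**out))) == NULL)
        return (ENOMEM);
    (*out)->id = i;
    return (0);
}

/* Free every opened cursor and the array, and reset the "open" marker. */
void
model_close_indices(model_table_cursor *ct)
{
    int i;

    if (ct->idx_cursors == NULL)
        return;
    for (i = 0; i < NINDICES; i++)
        free(ct->idx_cursors[i]); /* free(NULL) is a no-op */
    free(ct->idx_cursors);
    ct->idx_cursors = NULL;
}

/* On any failure, undo the partial work so a retry sees a clean state. */
int
model_open_indices_retry_safe(model_table_cursor *ct, int broken)
{
    int i, ret;

    if (ct->idx_cursors != NULL)
        return (0); /* now only true when every index was opened */
    if ((ct->idx_cursors = calloc(NINDICES, sizeof(*ct->idx_cursors))) == NULL)
        return (ENOMEM);
    for (i = 0; i < NINDICES; i++)
        if ((ret = model_open_one(i, broken, &ct->idx_cursors[i])) != 0) {
            model_close_indices(ct);
            return (ret);
        }
    return (0);
}
```

With this cleanup in place, a non-NULL array in the model always means that every index cursor was opened, so the early return is safe and a repeated put on the same cursor fails with the same index error instead of crashing.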
I've attached an example program that reproduces the issue. Index corruption is simulated by removing the index file.