@michaelcahill, I think there's a bug in the current code for WT_CURSOR.update.
The problem is at the end, where we do this:
That code was added as part of the extensive cursor-key/value-scope changes in b2814ef9, we added that functionality to both the insert and update cursor methods.
We later removed the call from the insert method in 2bd115e, with the comment "Change WT_CURSOR::insert to not hold a position/resources in the tree on success."
- thread 1 updates an on-page row (so no insert structure), just a linked list of updates
- thread 1 goes to sleep before the above code
- thread 2 updates the same on-page row and commits the change
- thread 1 wakes up, calls into __wt_kv_return, since cbt->ins isn't set, we look at the on-page update list. The first visible update is the update from thread 2, so we update the cursor to reference the wrong value.
Is that correct, or is this safe for some other reason?
If I'm not missing anything, it seems to me that we need to either do an "in-place" copy of the value (that's likely an extra data copy), or figure out some way for the __wt_row_modify call to return the allocated WT_UPDATE structure to the cursor code.
You understand the buffer code better than I do, what would you suggest?