WT-3977. That ticket has test program debugging changes, so I opened this ticket for continued work on the actual problem. The important diagnostic information from that ticket:
Here's what I know from the original failure. http://build.wiredtiger.com:8080/job/wiredtiger-test-recovery-stress/27680/console
- It is a simple data loss. Checkpoints are not involved at the time this record is inserted.
- The missing timestamp is 467261, record value 47699.
- The stable timestamp at the time the previous checkpoint starts is 446699. The starting checkpoint LSN is [54,5170816].
- The stable timestamp at the time the checkpoint completes is 466873. The ending checkpoint LSN is [56,9690880].
- The LSN of the equivalent oplog-table record is [56,10160128].
- The next checkpoint doesn't start until LSN [70,6295424] so we're not in the middle of any checkpoint-related processing of any kind.
- The failing thread's first record after the checkpoint is record 47661, at timestamp 466882.
- The failing thread sometimes uses prepared transactions, but did NOT use prepare on the missing record's transaction.
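
As a sanity check on the ordering argument above, here is a minimal sketch that compares the numbers quoted in this ticket. It assumes LSNs order as (log file number, offset) pairs and that the oplog-table LSN in the list belongs to the missing record; the variable names are made up for illustration and are not WiredTiger identifiers.

```python
# Values copied from the ticket. LSNs are written as
# (log file number, byte offset) tuples; Python compares tuples
# element by element, which matches LSN ordering under the
# assumption that the file number is the major key.
ckpt_start_lsn = (54, 5170816)
ckpt_end_lsn = (56, 9690880)
oplog_record_lsn = (56, 10160128)   # assumed: the missing record's oplog entry
next_ckpt_start_lsn = (70, 6295424)

stable_ts_at_ckpt_end = 466873
first_ts_after_ckpt = 466882        # failing thread's first record after the checkpoint
missing_ts = 467261

# The oplog record is logged after the previous checkpoint ended
# and before the next checkpoint started.
lsn_outside_checkpoint = ckpt_end_lsn < oplog_record_lsn < next_ckpt_start_lsn

# The missing timestamp is later than the stable timestamp at checkpoint
# completion and later than the thread's first post-checkpoint record.
ts_after_checkpoint = stable_ts_at_ckpt_end < first_ts_after_ckpt < missing_ts

print("LSN falls between checkpoints:", lsn_outside_checkpoint)
print("Timestamp is after the checkpoint's stable timestamp:", ts_after_checkpoint)
```

Both checks hold for the values in the ticket, which is consistent with the statement that no checkpoint processing was active when the record was inserted.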