Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: WT2.7.0
    • Labels:
      None

      Description

      We're seeing occasional failures in test_txn11. The symptom is that, during recovery, the metadata LSN points into log file 1, but that file has already been archived:

      http://build.wiredtiger.com:8080/job/wiredtiger-spinlock-gcc/ws/build_posix/WT_TEST/test_txn11.test_txn11.test_ops.0/stderr.txt/*view*/

      There have been some changes to how we track the LSN for checkpoints recently: can you please take a look?

          Activity

          sue.loverso Sue LoVerso added a comment -

          I am able to reproduce this. It is not a new problem: I was also able to reproduce it in a pre-logging-changes tree, once I understood the circumstances and forced them to be the same on the old tree.

          What is happening is that we perform a full checkpoint at, say, LSN 1,N. The metadata and the turtle file both have 1,N in them. We write the actual checkpoint record containing 1,N at 1,N+1. The record at N+1 happens to be the last record that will go into log file 1. When the connection closes, we force a checkpoint record to be written, and that gets written at 2,128. That updates logging's internal checkpoint LSN to 2,128. Between writing that clean-close checkpoint record and shutting down the logging subsystem, the archive thread runs and removes log file 1. All metadata entries and the turtle file still have checkpoint LSNs in log file 1, so when we reopen and run recovery starting from 1,N, that log file no longer exists and we get the error.

          I can see several different ways to fix this and am considering which one is the best.
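
The sequence described above can be replayed with a small toy model. This is only a sketch under stated assumptions, not WiredTiger code: the class, method, and flag names are hypothetical, the offset 4096 is an arbitrary stand-in for N, and the archive rule (remove every log file older than the one holding the logging checkpoint LSN) is a simplification. The flag set to False corresponds to the change named in the later commit message, "Don't update the logging ckpt_lsn on clean shutdown".

```python
from dataclasses import dataclass, field


@dataclass
class ToyLog:
    """Toy model of the logging state; every name here is hypothetical."""
    files: set = field(default_factory=lambda: {1})
    ckpt_lsn: tuple = (1, 0)      # logging's internal checkpoint LSN (file, offset)
    metadata_lsn: tuple = (1, 0)  # checkpoint LSN stored in metadata and turtle file

    def full_checkpoint(self, lsn):
        # A full checkpoint records its LSN both in the metadata/turtle file
        # and in the logging subsystem.
        self.metadata_lsn = lsn
        self.ckpt_lsn = lsn

    def close_checkpoint_record(self, lsn, update_ckpt_lsn):
        # The clean-close checkpoint record lands in a new log file.
        self.files.add(lsn[0])
        if update_ckpt_lsn:
            # Only logging's view moves forward; metadata_lsn is not updated.
            self.ckpt_lsn = lsn

    def archive(self):
        # Assumed rule: drop every log file older than the one holding ckpt_lsn.
        self.files = {f for f in self.files if f >= self.ckpt_lsn[0]}

    def recovery_can_start(self):
        # Recovery starts from the metadata LSN, so that log file must exist.
        return self.metadata_lsn[0] in self.files


def run(update_ckpt_lsn_on_close):
    log = ToyLog()
    log.full_checkpoint((1, 4096))  # "1,N"
    log.close_checkpoint_record((2, 128), update_ckpt_lsn_on_close)
    log.archive()  # archive thread runs before logging shuts down
    return log.recovery_can_start()


print(run(update_ckpt_lsn_on_close=True))   # False: log file 1 was archived
print(run(update_ckpt_lsn_on_close=False))  # True: log file 1 is retained
```

In the model, the failure depends only on the ordering: the logging checkpoint LSN moves into file 2 while the metadata still points into file 1, and the archive pass runs in between.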

          xgen-internal-githook Githook User added a comment -

          Author: Susan LoVerso (sueloverso) <sue@wiredtiger.com>

          Message: WT-2101 Don't update the logging ckpt_lsn on clean shutdown. It can
          race with archive. Metadata LSNs may not be updated on clean shutdown.
          Branch: develop
          https://github.com/wiredtiger/wiredtiger/commit/3a431d10f6aadc20b36f8fc8fe98349980f57ea6

          xgen-internal-githook Githook User added a comment -

          Author: Michael Cahill (michaelcahill) <michael.cahill@mongodb.com>

          Message: Merge pull request #2190 from wiredtiger/wt-2101

          WT-2101 Don't update the logging ckpt_lsn on clean shutdown.
          Branch: develop
          https://github.com/wiredtiger/wiredtiger/commit/41db2ee37d11b0a885fc883dbcb2a92394e598d1

          xgen-internal-githook Githook User added a comment -

          Author: Michael Cahill (michaelcahill) <michael.cahill@mongodb.com>

          Message: WT-2101 Don't update the logging ckpt_lsn on clean shutdown.

          Merge pull request #2190 from wiredtiger/wt-2101

          (cherry picked from commit 41db2ee37d11b0a885fc883dbcb2a92394e598d1)
          Branch: mongodb-3.0
          https://github.com/wiredtiger/wiredtiger/commit/ffb29c7576ed162fe9ba119fd6d1936da6acf385


            People

            • Assignee:
              sue.loverso Sue LoVerso
              Reporter:
              michael.cahill Michael Cahill
