  1. WiredTiger
  2. WT-10858

Don't report errors from __verify_filefrag_chk if verify ends early due to other errors

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Minor - P4
    • None
    • Affects Version/s: None
    • Component/s: None
    • Labels:
    • 5
    • StorEng - Defined Pipeline

      It is possible to get spurious error messages during verify. For example, this sequence occurred in a MongoDB test failure (on 5.0):

      [1680189197:708570][28837:0x7f2e168fd700], file:index-83--1323361982416001585.wt, WT_SESSION.verify: __wt_btree_tree_open, 637: unable to read root page from file:index-83--1323361982416001585.wt: Invalid argument
      [1680189197:708627][28837:0x7f2e168fd700], file:index-83--1323361982416001585.wt, WT_SESSION.verify: __verify_filefrag_chk, 434: file ranges never verified: 3

      In this case, the second message (which would be quite serious if it were genuine) is not applicable.

      What is happening is that __wt_verify() starts by calling the block manager's verify_start method, then verifies the checkpoints in the file, and then calls the verify_end method.

      If there is a failure while walking and verifying the checkpoints, __wt_verify() still performs the full work of verify_end, including checking that every block in the file is either referenced or free. But because verify is terminating early, that check will almost certainly fail.

      Thus we see the sequence shown above. The first error is the real issue: there was a problem reading a root page for one of the checkpoints. The second message tells us that not all the blocks in the file were verified. The wording, "file ranges never verified", technically allows for this scenario, since we didn't verify those ranges because verify terminated early. But it is confusing.

      It would be better to divide the work of the block manager's verify_end method into two parts, cleanup and finishing verification, and to execute only the cleanup in this scenario.
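
The proposed split can be sketched in a few lines of C. This is a toy model and not WiredTiger code: struct verify_state, verify_finish, verify_cleanup, the finish_checks counter, and the walk_ret parameter are all hypothetical names invented for illustration. Only verify_start, verify_end, and the "file ranges never verified" message come from the description above. The idea is that the final fragment check runs only when the checkpoint walk succeeded, while cleanup always runs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define MAX_FRAGS 8

/* Hypothetical stand-in for the block manager's verify state. */
struct verify_state {
    size_t frag_count;        /* number of file fragments being tracked */
    bool verified[MAX_FRAGS]; /* set once a fragment is referenced or free */
    bool active;              /* true between verify_start and cleanup */
    int finish_checks;        /* how many times the final check has run */
};

static void
verify_start(struct verify_state *v, size_t frag_count)
{
    assert(v != NULL && frag_count <= MAX_FRAGS);
    memset(v, 0, sizeof(*v));
    v->frag_count = frag_count;
    v->active = true;
}

/* Finishing verification: complain about fragments nobody accounted for. */
static int
verify_finish(struct verify_state *v)
{
    size_t i, missed = 0;

    ++v->finish_checks;
    for (i = 0; i < v->frag_count; ++i)
        if (!v->verified[i])
            ++missed;
    if (missed != 0) {
        fprintf(stderr, "file ranges never verified: %zu\n", missed);
        return (-1);
    }
    return (0);
}

/* Cleanup: release verify state; must run on every path. */
static void
verify_cleanup(struct verify_state *v)
{
    v->active = false;
}

/*
 * walk_ret is the result of walking the checkpoints (an assumption of this
 * sketch). The final check is skipped when the walk has already failed, so
 * the earlier, real error is the only one reported.
 */
static int
verify_end(struct verify_state *v, int walk_ret)
{
    int ret = walk_ret;

    if (ret == 0)
        ret = verify_finish(v);
    verify_cleanup(v);
    return (ret);
}
```

With this shape, a failed checkpoint walk returns its own error without printing the "file ranges never verified" message, while a successful walk still gets the full check. How the real block manager would thread the walk result into verify_end is left open here.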

      Note that although the error cited above occurred in 5.0, it appears that the same thing is possible in the current code.

      Admittedly, this scenario is quite rare, so this isn't a critical fix.

            Assignee:
            backlog-server-storage-engines [DO NOT USE] Backlog - Storage Engines Team
            Reporter:
            keith.smith@mongodb.com Keith Smith
            Votes:
            0
            Watchers:
            2

              Created:
              Updated: