  1. Core Server
  2. SERVER-60204

Idea - collect the exact sequence of visited checkpoints in failed tests

    • Type: Improvement
    • Resolution: Won't Do
    • Priority: Major - P3
    • None
    • Affects Version/s: None
    • Component/s: None
    • Labels:
      None
    • Sharding NYC

      This comes from a discussion with lingzhi.deng about a possible relation between PM-2248 (thread liveness monitoring) and SERVER-42308, which proposes making it possible to trigger failpoints based on previously reached failpoints.

      This idea is slightly different: in debug mode, collect the exact sequence of visited checkpoints defined by the thread liveness monitoring instrumentation, and log that sequence if the test fails. This way we will have the exact order of what happened in the code before the test failed.
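
      A minimal sketch of the idea, written in Python for brevity rather than in the server's own language. The names `CheckpointRecorder`, `visit`, `dump`, and `run_test` are hypothetical and are not part of PM-2248 or of any existing instrumentation. The bounded buffer is an assumption that only the most recent checkpoints need to be kept.

```python
import threading
from collections import deque


class CheckpointRecorder:
    """Keeps the most recent checkpoints in the exact order they were visited."""

    def __init__(self, capacity=1024, enabled=True):
        self._enabled = enabled                # stand-in for "debug mode only"
        self._lock = threading.Lock()          # serializes visits into one total order
        self._next_seq = 0
        self._recent = deque(maxlen=capacity)  # oldest entries are dropped first

    def visit(self, name):
        """Record that the calling thread reached checkpoint `name`."""
        if not self._enabled:
            return
        with self._lock:
            thread = threading.current_thread().name
            self._recent.append((self._next_seq, thread, name))
            self._next_seq += 1

    def dump(self):
        """Return the recorded sequence as log lines, oldest first."""
        with self._lock:
            return [f"#{seq} [{thread}] {name}" for seq, thread, name in self._recent]


def run_test(test_fn, recorder, log=print):
    """Run `test_fn`; log the checkpoint sequence only if it fails."""
    try:
        test_fn()
    except AssertionError:
        for line in recorder.dump():
            log(line)
        raise
```

      In this sketch a passing test logs nothing, and a failing test logs the surviving tail of the sequence before the failure propagates.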

      The plan of PM-2248 was to dump stacks for instrumented threads in failed tests anyway, but with timestamps. Timestamps are not sufficiently accurate and cannot be relied upon to reason about extremely narrow races between multiple threads. This is just an incremental improvement to the planned feature and should not take much extra effort.
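
      To illustrate why a sequence number helps where timestamps do not, here is a small Python sketch. The `record` helper and the deliberately coarse `coarse_clock` are hypothetical, chosen only to make the timestamps tie. The point is that entries sharing one timestamp can still be put back in visit order by their sequence number.

```python
import random
import threading

_lock = threading.Lock()
_next_seq = 0


def record(log, event, clock):
    """Append `event` with a global sequence number and a timestamp from `clock`."""
    global _next_seq
    with _lock:  # the counter and the append are updated together
        log.append({"seq": _next_seq, "ts": clock(), "event": event})
        _next_seq += 1


def coarse_clock():
    # Hypothetical clock whose resolution is too low to separate the events.
    return 1000


log = []
record(log, "thread A: checkpoint X", coarse_clock)
record(log, "thread B: checkpoint Y", coarse_clock)
record(log, "thread A: checkpoint Z", coarse_clock)

# Simulate entries arriving out of order, e.g. merged from per-thread dumps.
shuffled = random.sample(log, len(log))

# All timestamps are equal, so only the sequence number recovers the order.
by_seq = sorted(shuffled, key=lambda entry: entry["seq"])
```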

      Lingzhi said: "Yes, that would be helpful. It is an idea similar to undoDB but we keep track of the last x sequences ourselves. But I still think it is valuable to be able to examine possible interleaving in unittests instead of integration tests. I know STM team has a project to integrate some thread/network fuzzing into our tests. So maybe that's good enough. In that case, having the checkpoint sequence logged will be helpful."

            Assignee:
            backlog-server-sharding-nyc [DO NOT USE] Backlog - Sharding NYC
            Reporter:
            andrew.shuvalov@mongodb.com Andrew Shuvalov (Inactive)
            Votes:
            0
            Watchers:
            4

              Created:
              Updated:
              Resolved: