  1. Core Server
  2. SERVER-79338

Expand metrics found in reports to provide more coverage signal to MongoDB Teams


Details

    • Type: Task
    • Resolution: Unresolved
    • Priority: Minor - P4
    • Correctness

    Description

      1. The number of times a node enters member state ROLLBACK after a leader election. The server also logs other metrics related to replication rollback, such as how many operations are rolled back: https://github.com/mongodb/mongo/blob/fb679bde06827e98f7c55272a83c754959a3ffd6/src/mongo/db/repl/rollback_impl.cpp#L1529
      2. The number of times a chunk migration succeeds: https://github.com/mongodb/mongo/blob/fb679bde06827e98f7c55272a83c754959a3ffd6/src/mongo/db/s/migration_source_manager.cpp#L634-L635
      3. The number of times a node has 0 read tickets or 0 write tickets available for operations. This kind of metric probably requires post-processing the contents of the diagnostic.data/ directory. We can defer it until we have explored deadlock scenarios further. https://jira.mongodb.org/browse/SERVER-75205 is the type of bug I have in mind when asking "would it be possible for Antithesis to hit this?"
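      As a rough illustration of how these three counts could be derived from a test run, here is a minimal Python sketch. It assumes the server writes structured JSON log lines with a "msg" field, and that the diagnostic.data/ contents have already been decoded by some other tool into per-sample dictionaries. The message substrings, the metric names, and the "read_available" / "write_available" keys are all hypothetical placeholders, not the server's actual log messages or field names.

```python
import json
from collections import Counter
from typing import Iterable, Mapping


def count_log_events(lines: Iterable[str], needles: Mapping[str, str]) -> Counter:
    """Count JSON log lines whose "msg" field contains each needle.

    `needles` maps a metric name to a message substring. The caller has to
    supply the real substrings; the ones used below are placeholders.
    """
    counts = Counter({name: 0 for name in needles})
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not JSON (e.g. harness output)
        if not isinstance(entry, dict):
            continue
        msg = entry.get("msg", "")
        if not isinstance(msg, str):
            continue
        for name, needle in needles.items():
            if needle in msg:
                counts[name] += 1
    return counts


def count_zero_ticket_samples(samples: Iterable[Mapping[str, int]]) -> int:
    """Count samples in which read or write tickets available is 0.

    Each sample is assumed (hypothetically) to look like
    {"read_available": int, "write_available": int}.
    """
    return sum(
        1
        for s in samples
        if s["read_available"] == 0 or s["write_available"] == 0
    )


# Made-up input, for illustration only.
sample_lines = [
    '{"s": "I", "c": "ROLLBACK", "msg": "placeholder rollback summary"}',
    "not a json line",
    '{"s": "I", "c": "SHARDING", "msg": "placeholder migration committed"}',
    '{"s": "I", "c": "ROLLBACK", "msg": "placeholder rollback summary"}',
]
placeholder_needles = {
    "rollbacks": "rollback summary",
    "migrations": "migration committed",
}
counts = count_log_events(sample_lines, placeholder_needles)

sample_tickets = [
    {"read_available": 128, "write_available": 128},
    {"read_available": 0, "write_available": 17},
    {"read_available": 5, "write_available": 0},
]
exhausted = count_zero_ticket_samples(sample_tickets)
```

      Keeping the message substrings as parameters means the sketch does not depend on the exact wording at the linked source lines, which would need to be checked before using anything like this in a report.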
       

          People

            devprod-correctness-team@mongodb.com [DO NOT ASSIGN] Backlog - DevProd Correctness
            javier.arguello@antithesis.com javi Arguello
            Votes: 0
            Watchers: 5
