Investigate the database size continuously decreasing by a constant amount.

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • WT12.0.0
    • Affects Version/s: None
    • Component/s: Checkpoints
    • None
    • Storage Engines - Persistence
    • 204.154
    • SE Persistence backlog
    • None

      This ticket is derived from WT-17563. Monitoring the size drop shows a pattern: once a nonzero drop_size appears, the same value is subtracted again in many consecutive checkpoints, which eventually causes the disagg database size to overflow. The suspicion is that the schema drop operation is retained, or continuously resubmitted, across multiple checkpoints, so the drop_size of the dropped schema table is subtracted from the database size repeatedly.

      The log trace (see WT-17563 for more details):

          '  ckpt #571: db_size 2523959500 -> 2436031421 (ckpt_size_delta=580381, drop_size=88508460); this ckpt: subtracted=174759 prevented_double_subtract=0; cumulative prevented=0 (count=0)\n' +
          '  ckpt #572: db_size 2436031421 -> 2348131104 (ckpt_size_delta=608143, drop_size=88508460); this ckpt: subtracted=165731 prevented_double_subtract=0; cumulative prevented=0 (count=0)\n' +
          '  ckpt #573: db_size 2348131104 -> 2260180405 (ckpt_size_delta=557761, drop_size=88508460); this ckpt: subtracted=124805 prevented_double_subtract=0; cumulative prevented=0 (count=0)\n' +
          '  ckpt #574: db_size 2260180405 -> 2172248635 (ckpt_size_delta=576690, drop_size=88508460); this ckpt: subtracted=138822 prevented_double_subtract=0; cumulative prevented=0 (count=0)\n' +
          '  ckpt #575: db_size 2172248635 -> 2084153877 (ckpt_size_delta=413702, drop_size=88508460); this ckpt: subtracted=230042 prevented_double_subtract=0; cumulative prevented=0 (count=0)\n' +
          '  ckpt #576: db_size 2084153877 -> 1996166920 (ckpt_size_delta=521503, drop_size=88508460); this ckpt: subtracted=131622 prevented_double_subtract=0; cumulative prevented=0 (count=0)\n' +
          '  ckpt #577: db_size 1996166920 -> 1908239504 (ckpt_size_delta=581044, drop_size=88508460); this ckpt: subtracted=131661 prevented_double_subtract=0; cumulative prevented=0 (count=0)\n' +
          '  ckpt #578: db_size 1908239504 -> 1820264332 (ckpt_size_delta=533288, drop_size=88508460); this ckpt: subtracted=124528 prevented_double_subtract=0; cumulative prevented=0 (count=0)\n' +
          '  ckpt #579: db_size 1820264332 -> 1732315660 (ckpt_size_delta=559788, drop_size=88508460); this ckpt: subtracted=106222 prevented_double_subtract=0; cumulative prevented=0 (count=0)\n' +
          '  ckpt #580: db_size 1732315660 -> 1628617408 (ckpt_size_delta=443554, drop_size=104141806); this ckpt: subtracted=188912 prevented_double_subtract=0; cumulative prevented=0 (count=0)\n' +
          '  ckpt #581: db_size 1628617408 -> 1462403312 (ckpt_size_delta=461094, drop_size=166675190); this ckpt: subtracted=127669 prevented_double_subtract=0; cumulative prevented=0 (count=0)\n' +
          '  ckpt #582: db_size 1462403312 -> 1295745844 (ckpt_size_delta=17722, drop_size=166675190); this ckpt: subtracted=86479 prevented_double_subtract=0; cumulative prevented=0 (count=0)\n' +
          '  ckpt #583: db_size 1295745844 -> 1129076959 (ckpt_size_delta=6305, drop_size=166675190); this ckpt: subtracted=42898 prevented_double_subtract=0; cumulative prevented=0 (count=0)\n' +
          '  ckpt #584: db_size 1129076959 -> 962407839 (ckpt_size_delta=6070, drop_size=166675190); this ckpt: subtracted=42900 prevented_double_subtract=0; cumulative prevented=0 (count=0)\n' +
          '  ckpt #585: db_size 962407839 -> 795738731 (ckpt_size_delta=6082, drop_size=166675190); this ckpt: subtracted=42887 prevented_double_subtract=0; cumulative prevented=0 (count=0)\n' +
          '  ckpt #586: db_size 795738731 -> 629069624 (ckpt_size_delta=6083, drop_size=166675190); this ckpt: subtracted=42886 prevented_double_subtract=0; cumulative prevented=0 (count=0)\n' +
          '  ckpt #587: db_size 629069624 -> 462400517 (ckpt_size_delta=6083, drop_size=166675190); this ckpt: subtracted=42886 prevented_double_subtract=0; cumulative prevented=0 (count=0)\n' +
          '  ckpt #588: db_size 462400517 
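
      The figures in the trace are consistent with each checkpoint applying db_size_after = db_size_before + ckpt_size_delta - drop_size, with the same drop_size subtracted on every checkpoint. The sketch below checks that balance on three entries copied from the trace. The regular expression and the helper name check_entry are illustrative only and are not part of WiredTiger; the subtracted field does not appear in this balance.

```python
import re

# Three entries copied from the trace above (trailing fields omitted).
TRACE = [
    "ckpt #571: db_size 2523959500 -> 2436031421 (ckpt_size_delta=580381, drop_size=88508460)",
    "ckpt #580: db_size 1732315660 -> 1628617408 (ckpt_size_delta=443554, drop_size=104141806)",
    "ckpt #587: db_size 629069624 -> 462400517 (ckpt_size_delta=6083, drop_size=166675190)",
]

# Illustrative pattern for the leading fields of one trace line.
PATTERN = re.compile(
    r"ckpt #(\d+): db_size (\d+) -> (\d+) "
    r"\(ckpt_size_delta=(\d+), drop_size=(\d+)\)"
)


def check_entry(line):
    """Return (ckpt, residual), where residual = after - (before + delta - drop)."""
    match = PATTERN.search(line)
    if match is None:
        raise ValueError("unrecognized trace line: " + line)
    ckpt, before, after, delta, drop = map(int, match.groups())
    return ckpt, after - (before + delta - drop)


# A residual of 0 means the entry matches the balance exactly.
residuals = [check_entry(line) for line in TRACE]
for ckpt, residual in residuals:
    print(f"ckpt #{ckpt}: residual={residual}")
```

      A zero residual on every entry means the full drop_size was deducted at each of those checkpoints, which matches the repeated-subtraction suspicion in the description.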
      

      Context

      • Cluster: sls-smoke-dev-aws-usw2
      • Metric details: Victoria Metrics dashboard
      • Related Jira: WT-16664
      • Discussion suggests an arithmetic bug during development, possibly an underflow where subtraction occurs without a corresponding addition, causing the metric to wrap around to the maximum int64 value.
      • The metric is not initialized to -1, so the issue is likely due to incorrect arithmetic logic.
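
      The underflow described above can be sketched as follows. This is a minimal model, not WiredTiger code: it assumes the size is held in an unsigned 64-bit counter (the ticket only says the metric wraps to a very large value), it ignores ckpt_size_delta, and the names sub_u64, checkpoints_until_wrap, and SizeTracker are hypothetical. The starting values are the last db_size and drop_size visible in the trace.

```python
U64_MASK = (1 << 64) - 1  # assumption: the size is an unsigned 64-bit counter


def sub_u64(value, amount):
    """Subtract with unsigned 64-bit wraparound, as a C uint64_t would."""
    return (value - amount) & U64_MASK


def checkpoints_until_wrap(db_size, drop_size, limit=1000):
    """Re-apply drop_size once per checkpoint; return (checkpoint count, wrapped value)."""
    for n in range(1, limit + 1):
        if drop_size > db_size:
            # This subtraction goes below zero and wraps around.
            return n, sub_u64(db_size, drop_size)
        db_size = sub_u64(db_size, drop_size)
    return None, db_size


class SizeTracker:
    """Hypothetical guard: apply each drop at most once, keyed by a drop id."""

    def __init__(self, db_size):
        self.db_size = db_size
        self.applied = set()
        self.prevented = 0

    def apply_drop(self, drop_id, drop_size):
        if drop_id in self.applied:
            # The same drop was resubmitted in a later checkpoint; skip it.
            self.prevented += 1
            return False
        self.applied.add(drop_id)
        self.db_size = sub_u64(self.db_size, drop_size)
        return True


steps, wrapped = checkpoints_until_wrap(462400517, 166675190)
print(f"wraps on checkpoint {steps} of re-applying the drop: {wrapped}")

tracker = SizeTracker(462400517)
for _ in range(5):
    tracker.apply_drop("table-A", 166675190)  # "table-A" is a made-up id
print(f"guarded size: {tracker.db_size}, prevented: {tracker.prevented}")
```

      With the guard, the drop is deducted once and later resubmissions are only counted, so the size never goes below zero in this model. Whether the real fix deduplicates drops or stops them from being resubmitted is not stated in the ticket.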

            Assignee:
            Albert Song
            Reporter:
            Albert Song
