  Core Server / SERVER-56307

The chunk migration "convergence algorithm" is very primitive


    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.2.15, 4.4.7, 4.0.26, 5.0.0-rc3, 5.1.0-rc0
    • Component/s: None
    • Labels:
      None
    • Backwards Compatibility:
      Fully Compatible
    • Backport Requested:
      v5.0, v4.4, v4.2, v4.0
    • Sprint:
      Sharding EMEA 2021-05-17
    • Linked BF Score:
      172

      Description

      The chunk migration "convergence algorithm" is the logic that the donor and recipient implement so that the donor can decide when to enter the critical section and block writes.

      The current implementation requires the recipient to catch up on every modification that occurred during the migration and to reach the STEADY state before the donor is allowed to enter the critical section.

      It has been observed that under heavy load this condition may never be met, because modifications arrive on the donor faster than they can be transferred to the recipient.
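      A toy model makes this failure mode concrete. The sketch below is a simplification, not the server's code. It assumes the migration proceeds in discrete rounds, that `incoming_per_round` new mods accumulate on the donor during each round, and that the recipient can fetch at most `transfer_per_round` mods per round. The function `rounds_to_steady` and its parameters are hypothetical. STEADY is reached only when the backlog hits exactly zero, which never happens once writes arrive at least as fast as they can be transferred.

```python
from typing import Optional


def rounds_to_steady(backlog: int, incoming_per_round: int,
                     transfer_per_round: int,
                     max_rounds: int = 1000) -> Optional[int]:
    """Hypothetical toy model of the current rule: STEADY only at zero mods left.

    Returns the round in which the backlog first drains to zero, or None if
    it has not done so after max_rounds rounds.
    """
    for round_no in range(1, max_rounds + 1):
        # The recipient fetches and applies at most transfer_per_round mods.
        backlog -= min(backlog, transfer_per_round)
        if backlog == 0:
            return round_no  # caught up: recipient may report STEADY
        # Writes keep arriving on the donor while the batch is being applied.
        backlog += incoming_per_round
    return None  # never caught up within the simulated horizon


print(rounds_to_steady(100, 10, 50))  # 3
print(rounds_to_steady(100, 50, 50))  # None
print(rounds_to_steady(100, 60, 50))  # None
```

      With light write load the backlog drains in three rounds. When incoming writes match or exceed the transfer rate, the backlog oscillates or grows and never reaches zero, so under this rule the donor would never enter the critical section.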

      Potential fixes are:

      (1) The donor sends the recipient some measure of how many mods are left, so that the recipient can enter the steady state once the remaining amount falls below some delta, rather than only when it reaches zero.
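      Option (1) can be sketched in the same kind of toy model. Assumptions, all hypothetical: the donor reports its remaining backlog alongside every batch, and the recipient declares the steady state once that reported value is at or below a `threshold`. Presumably the leftover mods would then be transferred after the donor enters the critical section and stops accepting writes, so the threshold bounds how much work happens while writes are blocked. The function name and parameters are illustrative only.

```python
from typing import Optional


def round_recipient_goes_steady(backlog: int, incoming_per_round: int,
                                transfer_per_round: int, threshold: int,
                                max_rounds: int = 1000) -> Optional[int]:
    """Hypothetical model of option (1): STEADY once remaining <= threshold.

    threshold=0 reproduces the current "must reach zero" rule.
    """
    for round_no in range(1, max_rounds + 1):
        # The recipient applies one batch of at most transfer_per_round mods.
        backlog -= min(backlog, transfer_per_round)
        # Assumed: the donor piggybacks its remaining count on the batch.
        remaining_reported_by_donor = backlog
        if remaining_reported_by_donor <= threshold:
            return round_no  # close enough: let the donor block writes
        # New writes accumulate on the donor before the next batch.
        backlog += incoming_per_round
    return None


print(round_recipient_goes_steady(100, 50, 50, threshold=0))     # None
print(round_recipient_goes_steady(100, 50, 50, threshold=50))    # 1
print(round_recipient_goes_steady(1000, 60, 50, threshold=100))  # None
```

      Under a balanced load the delta-based rule converges where the zero rule does not. The model also exposes a limit of this approach: when writes consistently outpace transfers, the backlog keeps growing and no fixed threshold is ever reached.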

      (2) The donor decides when to enter the critical section without waiting for the recipient to decide to enter the steady state (which includes a wait for majority).
      (Option 2 might not be appropriate, though, because it would move the recipient's majority wait inside the critical section.)


              People

              Assignee:
              paolo.polato Paolo Polato
              Reporter:
              kaloian.manassiev Kaloian Manassiev
              Participants:
              Votes:
              0 Vote for this issue
              Watchers:
              19 Start watching this issue
