  1. Drivers
  2. DRIVERS-1992

Set 0 for batchsize for initial change stream

    • Type: Improvement
    • Resolution: Won't Do
    • Priority: Major - P3
    • None
    • Component/s: Change Streams
    • Labels:
      None
    • Needed
      Key Status/Resolution FixVersion
      CDRIVER-4357 Won't Do
      CXX-2489 Won't Do
      CSHARP-4137 Won't Do
      GODRIVER-2379 Won't Do
      JAVA-4573 Won't Do
      NODE-4183 Won't Do
      MOTOR-931 Won't Do
      PYTHON-3223 Won't Do
      PHPLIB-840 Won't Do
      RUBY-2955 Won't Do
      RUST-1266 Won't Do
      SWIFT-1544 Won't Do

      Summary

      What is the problem or use case, what are we trying to achieve?

      Add an option to set independent batch sizes for change streams. This would be a new option, in addition to the BatchSize currently on the change stream options, to set the cursor batch size on subsequent getMores.

      This will allow us to set the initial aggregate batch size to 0 in cases where the aggregation takes a long time. We still need to be able to set a batch size for the subsequent getMores, however; with batchSize:0 for both, we would never return anything.
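      As a rough illustration of the two independent batch sizes, the sketch below builds the aggregate and getMore command documents by hand. The changeStreamBatchSizes type, its field names, the "events" collection, and the cursor id are hypothetical and are not part of any driver API. A real driver would send ordered BSON documents rather than Go maps.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// changeStreamBatchSizes is a hypothetical option pair used only for this
// sketch; it is not an existing driver type.
type changeStreamBatchSizes struct {
	Initial int32 // batchSize sent with the initial aggregate
	GetMore int32 // batchSize sent with each subsequent getMore
}

// aggregateCommand builds the command that opens the change stream.
// With Initial set to 0 the server can return a cursor without a first batch.
func aggregateCommand(coll string, sizes changeStreamBatchSizes) map[string]any {
	return map[string]any{
		"aggregate": coll,
		"pipeline":  []any{map[string]any{"$changeStream": map[string]any{}}},
		"cursor":    map[string]any{"batchSize": sizes.Initial},
	}
}

// getMoreCommand builds the follow-up command that actually fetches events.
// It uses the second, independent batch size.
func getMoreCommand(coll string, cursorID int64, sizes changeStreamBatchSizes) map[string]any {
	return map[string]any{
		"getMore":    cursorID,
		"collection": coll,
		"batchSize":  sizes.GetMore,
	}
}

func main() {
	sizes := changeStreamBatchSizes{Initial: 0, GetMore: 100}
	cmds := []map[string]any{
		aggregateCommand("events", sizes),
		getMoreCommand("events", 12345, sizes), // 12345 is a made-up cursor id
	}
	for _, cmd := range cmds {
		out, err := json.Marshal(cmd)
		if err != nil {
			panic(err)
		}
		fmt.Println(string(out))
	}
}
```

      Keeping the two values separate is what lets the initial batch size be 0 while the getMores still request a non-zero batch.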

       

      The drivers don’t have a good way to auto-terminate a query until a cursor has been established, which only happens after the initial aggregate completes.

       

      So in a case where we abandon the initial aggregate, or where the client crashes, it just keeps running to completion on the server. If we establish a cursor with batchSize:0, then when the driver abandons the cursor it will automatically issue an explicit killCursors to the server.
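      The cleanup rule described above can be modeled with a small sketch: a client can only build a killCursors command once it holds a cursor id. The openResult type and cleanupCommand function are hypothetical names for illustration, not driver code.

```go
package main

import "fmt"

// openResult models what the client knows after trying to open a change
// stream. CursorID == 0 means no cursor was returned (for example, the
// initial aggregate was abandoned before it completed).
type openResult struct {
	Collection string
	CursorID   int64
}

// cleanupCommand returns the killCursors command the client could send when
// abandoning the stream, or false when there is no cursor to kill.
func cleanupCommand(r openResult) (map[string]any, bool) {
	if r.CursorID == 0 {
		// Nothing to reference: the server-side operation keeps running.
		return nil, false
	}
	return map[string]any{
		"killCursors": r.Collection,
		"cursors":     []int64{r.CursorID},
	}, true
}

func main() {
	// Case 1: the aggregate timed out on the client before a cursor came back.
	if _, ok := cleanupCommand(openResult{Collection: "events"}); !ok {
		fmt.Println("no cursor established: nothing to kill")
	}

	// Case 2: a cursor was established (as with batchSize:0), so the client
	// can issue killCursors. The id 42 is made up.
	if cmd, ok := cleanupCommand(openResult{Collection: "events", CursorID: 42}); ok {
		fmt.Println("can send:", cmd)
	}
}
```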

      For an example on the go driver, here's what we implemented on Realm's forked go driver repo: https://github.com/mongodb-forks/mongo-go-driver/pull/8/files

      Motivation

      Who is the affected end user?

      Who are the stakeholders?

      Users who use changestreams. In this specific case, users who use Realm triggers and sync.

      How does this affect the end user?

      Are they blocked? Are they annoyed? Are they confused?

      I'll speak to how end users are impacted in terms of triggers, but I imagine there's a larger use case here. For users who have a fairly large oplog and a very selective match expression, the trigger will fail while opening the changestream because of a timeout limit we enforce outside of the driver. We use the collection.Watch functionality, which attempts to open the changestream and return a cursor to us.

      In the case of a timeout on the initial opening, the driver can't kill the aggregation because a cursor wasn't established. So the trigger will suspend, and the aggregation will continue to run on the user's cluster. The user can then restart the trigger, which repeats the scenario above: a new aggregation is kicked off and ultimately times out. This hurts the user's cluster because each abandoned aggregation runs to completion and consumes resources unnecessarily.

      How likely is it that this problem or use case will occur?

      Main path? Edge case?

      This is an edge case for customers who have very large oplogs.

      If the problem does occur, what are the consequences and how severe are they?

      Minor annoyance at a log message? Performance concern? Outage/unavailability? Failover can't complete?

      Since the driver can't kill the old operations that were kicked off, because no cursor was ever returned, the severity largely depends on how many times a changestream is retried and fails. In the triggers case, a user can restart their trigger many times, which can lead to a pretty severe consequence.

      Is this issue urgent?

      Does this ticket have a required timeline? What is it?

      This isn't urgent since Realm has made this change in our forked go driver repo.

      Is this ticket required by a downstream team?

      Needed by e.g. Atlas, Shell, Compass?

      Realm

      Is this ticket only for tests?

      Does this ticket have any functional impact, or is it just test improvements?

      This ticket is not only for tests.

            Assignee:
            boris.dogadov@mongodb.com Boris Dogadov
            Reporter:
            tim.sedgwick@mongodb.com Tim Sedgwick
            Votes:
            0
            Watchers:
            7

              Created:
              Updated:
              Resolved: