Node.js Driver / NODE-6858

Change Stream stops working after failover

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: 6.14.2
    • Component/s: Change Streams

      Problem

      Change streams stop delivering events after a failover, i.e. when the primary member of a replica set becomes unavailable.

      Setup

      1. Node.js 22
      2. npm install mongodb@6.14.2
      3. MongoDB replica set with three members: A (primary), B (secondary), C (secondary)

      Reproduction

      1. Start the attached script with node main.mjs
      2. Observe how change stream events are printed to the console every second
      3. Stop the mongod process of member A (an rs.stepDown() alone does not trigger the error, but it can precede stopping the process)
      4. Almost immediately, change stream events stop printing
      5. The application crashes after 60 seconds with a "MongoServerSelectionError" and "ECONNREFUSED 127.0.0.1:27017" (member A)

       

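      import { MongoClient } from 'mongodb';
      
      // Seed list with all three replica set members; member A listens on 27017.
      const uri = 'mongodb://127.0.0.1:27017,127.0.0.1:27018,127.0.0.1:27019/test?replicaSet=rs0';
      const client = await MongoClient.connect(uri);
      const testCollection = client.db().collection('Test');
      
      // Insert a document every second so the change stream always has events to report.
      let iteration = 0;
      setInterval(() => testCollection.insertOne({ i: iteration++ }), 1000);
      
      // Print every change event; after member A is stopped, this loop stops printing.
      for await (const change of testCollection.watch()) {
        console.log(`${change.operationType}: ${change.fullDocument.i}`);
      }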
      

      Expectation

      • In the case of a failover, node-mongodb-native should keep the change stream going without interruption.
      • There should not be a 60-second blackout before the failover is noticed.

      Who is impacted

      • This affects all customers using Change Streams.
      • It disrupts their users in case of a failover, e.g. when upgrading MongoDB.

      Ruling out other problem sources

      This is a problem with the node-mongodb-native driver because:

      1. MongoDB itself correctly elects a new primary, as can be observed with rs.status() in mongosh.
      2. A reproduction with PyMongo in Python does not show this problem: the change events keep being printed even after member A has been stopped, and even after 60 seconds.
      3. Furthermore, with PyMongo we can restart member A and then stop member B and the events keep being printed.

      Discussion

      Our application keeps running normally for 60 seconds, except that no change stream events are published during that time; after that, our server crashes and restarts.
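      To put a number on the silent period described above, the repro script could be instrumented with a small watchdog. The sketch below is hypothetical (createStallDetector is not part of the mongodb driver) and assumes, as in main.mjs, that a new change event is expected roughly once per second. It only measures the blackout; it does not fix it.

```javascript
// Hypothetical helper, not part of the mongodb driver.
// Tracks the time since the last change event so a silent stream can be noticed.
function createStallDetector({ thresholdMs, now = () => Date.now() }) {
  // Start the clock at creation, so "no events yet" also counts as silence.
  let lastEventAt = now();

  return {
    // Call this for every change event received.
    recordEvent() {
      lastEventAt = now();
    },
    // Milliseconds since the last event (or since creation, if none yet).
    silenceMs() {
      return now() - lastEventAt;
    },
    // True once the stream has been silent for longer than thresholdMs.
    isStalled() {
      return now() - lastEventAt > thresholdMs;
    },
  };
}
```

      In main.mjs, recordEvent() would be called inside the for await loop and isStalled() polled from a setInterval; since the script inserts a document every second, a threshold of a few seconds separates normal operation from the stall. The injectable now parameter exists only so the logic can be tested without real timers.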

      We considered using resumeAfter, but the 60-second blackout means this is not really a viable option. Neither the maxAwaitTimeMS nor the serverSelectionTimeoutMS option had any effect on this timeout.
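      For reference, a manual resume loop based on resumeAfter might look like the following sketch. watchWithResume and its options are hypothetical, not driver API; the sketch relies only on Collection.watch(pipeline, options), the resumeAfter option, and the fact that each change event's _id is its resume token. Because it restarts only after the stream throws, it would still sit through the 60-second gap reported here. The error names checked by the default predicate (MongoNetworkError, MongoServerSelectionError) are an assumption about which errors are worth retrying.

```javascript
// Hypothetical helper, not part of the mongodb driver.
// `collection` is anything whose watch(pipeline, options) returns an async-iterable
// change stream with a close() method, such as a mongodb Collection.
const RETRYABLE_NAMES = new Set(['MongoNetworkError', 'MongoServerSelectionError']);

async function watchWithResume(collection, onChange, options = {}) {
  const {
    maxRestarts = 5,
    // Assumption: only network and server selection errors are worth a restart.
    isRetryable = (err) => Boolean(err) && RETRYABLE_NAMES.has(err.name),
  } = options;

  let resumeToken; // _id of the last change event seen
  let restarts = 0;

  for (;;) {
    // Without a token (error before the first event) the new stream starts "now".
    const watchOptions = resumeToken ? { resumeAfter: resumeToken } : {};
    const stream = collection.watch([], watchOptions);
    try {
      for await (const change of stream) {
        resumeToken = change._id; // each change event's _id is its resume token
        restarts = 0;             // progress was made, reset the restart budget
        await onChange(change);
      }
      return; // the stream ended without an error (e.g. it was closed)
    } catch (err) {
      if (!isRetryable(err) || ++restarts > maxRestarts) throw err;
      // Otherwise loop and reopen the stream after the last seen token.
    } finally {
      await stream.close();
    }
  }
}
```

      Errors thrown by onChange itself have the generic name Error under this default predicate, so they propagate instead of triggering a restart.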

        Attachments:
        1. error-log.txt (7 kB)
        2. main.mjs (0.5 kB)
        3. main.py (0.8 kB)

            Assignee: Neal Beeken (neal.beeken@mongodb.com)
            Reporter: Peter Gassner (peter.gassner@inf.ethz.ch)
            Votes: 1
            Watchers: 4
