Type: Bug
Resolution: Unresolved
Priority: Major - P3
Fix Version/s: None
Affects Version/s: 6.14.2
Component/s: Change Streams
Labels: alex+, external-user
Quarter: FY26Q3
Investigation Story Points: 3
Confidence Status: None
Aha! Reference: None
Tracking Level: None
Risk Status: None
Exec Notes: None
Goal Link: None
Goal Name(s): None

Problem

Change Streams stop working after a failover, i.e. when a replica set's primary member becomes unavailable.

Setup

Node.js 22
npm install mongodb@6.14.2
MongoDB replica set with three members: A (primary), B (secondary), C (secondary)

Reproduction

1. Start the attached script with node main.mjs.
2. Observe that change stream events are printed to the console every second.
3. Stop the process of member A. (An rs.stepDown() alone does not trigger the error, but it can precede stopping the process.)
4. Almost immediately, change stream events stop printing.
5. After 60 seconds, the application crashes with a "MongoServerSelectionError" and "ECONNREFUSED 127.0.0.1:27017" (member A).

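import { MongoClient } from 'mongodb';

// All three replica set members (A, B, C) are listed in the seed list.
const uri = 'mongodb://127.0.0.1:27017,127.0.0.1:27018,127.0.0.1:27019/test?replicaSet=rs0';
const client = await MongoClient.connect(uri);
const testCollection = client.db().collection('Test');

// Insert one document per second so the change stream always has events to deliver.
let iteration = 0;
setInterval(() => testCollection.insertOne({ i: iteration++ }), 1000);

// Print every change event; this loop goes silent once member A is stopped.
for await (const change of testCollection.watch()) {
  console.log(`${change.operationType}: ${change.fullDocument.i}`);
}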

Expectation

In the case of a failover, node-mongodb-native should keep the change stream going without interruption.
There should not be a 60-second blackout before the failover is noticed.

Who is impacted

This affects all customers using Change Streams.
It disrupts their users in case of a failover, e.g. when upgrading MongoDB.

Ruling out other problem sources

This is a problem with the node-mongodb-native driver because:

MongoDB itself correctly re-elects a new primary node as can be observed by rs.status() in mongosh.
A reproduction with PyMongo in Python does not show this problem: the change events keep being printed even after member A has been stopped, and even after 60 seconds.
Furthermore, with PyMongo we can restart member A and then stop member B and the events keep being printed.
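
To compare the server's view (rs.status()) with the driver's view of the same failover, one option is to log the driver's monitoring events with timestamps. The following is only a diagnostic sketch, not part of the original report: logTopologyEvents is a hypothetical helper name, and it assumes the client emits the 'serverHeartbeatFailed' and 'serverDescriptionChanged' monitoring events with the fields used below.

```javascript
// Hypothetical diagnostic helper (not part of the driver or the attached script).
// `client` is expected to be a MongoClient, or any object with an
// `on(eventName, listener)` method; `log` defaults to console.log.
function logTopologyEvents(client, log = console.log) {
  const stamp = () => new Date().toISOString();

  // Emitted when a monitoring heartbeat to a member fails (e.g. member A stopped).
  client.on('serverHeartbeatFailed', (event) => {
    log(`${stamp()} heartbeat failed: ${event.connectionId}`);
  });

  // Emitted when the driver's description of a member changes type,
  // e.g. from primary to unknown, or from secondary to primary.
  client.on('serverDescriptionChanged', (event) => {
    const before = event.previousDescription.type;
    const after = event.newDescription.type;
    log(`${stamp()} ${event.address}: ${before} -> ${after}`);
  });
}
```

In main.mjs this could be called as logTopologyEvents(client) right after MongoClient.connect(uri). If the driver logs the new primary within seconds while the change stream stays silent, that would point at the change stream's handling of the failover rather than at server monitoring.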

Discussion

Our application keeps running normally for 60 seconds, except that no change stream events are delivered during that time. After those 60 seconds the server process crashes and restarts.

We considered using resumeAfter, but the 60-second blackout does not make this a viable option. Neither the maxAwaitTimeMS nor the serverSelectionTimeoutMS option had any effect on this timeout.
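
For reference, the resumeAfter approach mentioned above could look roughly like the sketch below. watchWithResume is a hypothetical helper, not a driver API, and its maxRestarts and delayMs options are illustrative assumptions. It re-opens the change stream from the token of the last handled event whenever iteration throws. Note that it only reacts once the driver surfaces the error, so on its own it would not shorten the 60-second blackout described in this issue.

```javascript
// Hypothetical helper (not part of the driver): re-open the change stream
// after an error, resuming from the last event that was handled.
// `collection` is expected to behave like a driver Collection, i.e. to have
// watch(pipeline, options) returning an async-iterable stream with close().
async function watchWithResume(collection, onChange, { maxRestarts = 5, delayMs = 1000 } = {}) {
  let resumeToken; // _id of the last change event that was handled successfully
  let restarts = 0;

  while (true) {
    const options = resumeToken ? { resumeAfter: resumeToken } : {};
    const stream = collection.watch([], options);
    try {
      for await (const change of stream) {
        await onChange(change);
        resumeToken = change._id; // the event's _id is its resume token
        restarts = 0;             // progress was made, reset the counter
      }
      return; // the stream ended without an error (e.g. it was closed)
    } catch (err) {
      restarts += 1;
      if (restarts > maxRestarts) throw err;
      console.error(`change stream failed (${err.message}); restart ${restarts}/${maxRestarts}`);
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    } finally {
      // Best-effort cleanup of the old stream before opening a new one.
      try { await stream.close(); } catch { /* ignore close errors */ }
    }
  }
}
```

In main.mjs the for await loop would be replaced by a call such as await watchWithResume(testCollection, (change) => console.log(change.operationType)). Because the token is stored only after onChange returns, an event whose handler throws is delivered again after the restart.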

Attachments

- error-log.txt (7 kB, Mar 14 2025 09:39:55 AM UTC)
- main.mjs (0.5 kB, Mar 14 2025 09:40:39 AM UTC)
- main.py (0.8 kB, Mar 14 2025 09:40:44 AM UTC)

is related to

DRIVERS-3138 Test more resumable non-server error cases for change streams

Backlog

Assignee: Unassigned
Reporter: Peter Gassner
Reviewers: None
Votes: 1
Watchers: 5

Created: Mar 14 2025 09:55:43 AM UTC
Updated: Aug 01 2025 08:03:26 PM UTC
