Node.js Driver / NODE-6179

Intermittent MongoNetworkTimeoutError when accessing MongoDB Atlas from Vercel Serverless Functions

    • Type: Bug
    • Resolution: Gone away
    • Priority: Unknown
    • None
    • Affects Version/s: 6.3.0
    • Component/s: None
    • Labels:
    • 2
    • Not Needed
    • Not Needed

      Vercel users have reported intermittent 500 errors when connecting to their MongoDB Atlas clusters from production environments. The error behind them is:

      MongoServerSelectionError: connection timed out

      Context

      Hi MongoDB team.

      I'm a Vercel employee raising this on behalf of multiple customers who have been affected by this issue for 3+ months. Vercel's engineering team has investigated to no avail.

      This issue relates to the following GitHub issues: 10671 / 5708 / 4297

      There are similar reports within the Mongo community too.

      Additional discussion and a video can be found in our #shared-mongodb slack channel.

      Issue overview

      Developers integrated MongoDB Atlas with Vercel Serverless Functions following the nextjs-with-mongodb example, particularly its mongodb.js file.

      Was connectTimeoutMS hit?

      Normally this error means that the `connectTimeoutMS` limit has been reached. However, every request that hit this error completed in 200ms or less, far below the 30s default. Customers have also tried adjusting their timeouts, but this changed neither the behaviour nor the frequency of the errors.
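
      One way to make this mismatch visible in logs is to record how long each operation actually ran before it failed. The sketch below is only an illustration: `classifyTimeout` and `runWithTiming` are hypothetical helper names, not part of node-mongodb-native, and the operation passed in stands for whatever database call the function makes.

```typescript
// Hypothetical diagnostic helpers; not part of the MongoDB driver API.
type TimeoutVerdict = "plausible" | "premature";

/**
 * A timeout reported before the configured limit has elapsed cannot be
 * explained by this invocation having waited that long.
 */
function classifyTimeout(elapsedMs: number, configuredTimeoutMs: number): TimeoutVerdict {
  return elapsedMs >= configuredTimeoutMs ? "plausible" : "premature";
}

/** Runs an async operation and, if it fails, logs how long it really ran. */
async function runWithTiming<T>(
  operation: () => Promise<T>,
  configuredTimeoutMs: number,
  log: (message: string) => void = console.error,
): Promise<T> {
  const startedAt = Date.now();
  try {
    return await operation();
  } catch (error) {
    const elapsedMs = Date.now() - startedAt;
    const verdict = classifyTimeout(elapsedMs, configuredTimeoutMs);
    log(`operation failed after ${elapsedMs}ms (limit ${configuredTimeoutMs}ms): ${verdict}`);
    throw error; // rethrow so the caller's error handling is unchanged
  }
}
```

      With the numbers reported here, `classifyTimeout(200, 30000)` yields "premature", which is the case this ticket is about.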

      Here is a related comment on this topic.

      Do connection pool settings need to be adjusted?

      All affected customers attempted to change their connection pool settings. This did not seem to make a difference.

      How often does it happen?

      There does not seem to be a specific pattern for when the errors occur. My theory is that they happen when there have been multiple requests within the same 30s period; however, we have limited evidence to support this.

      What is the full error message?

      I've attached some sample runtime logs here: logs.zip

      Does updating node-mongodb-native help?

      We tested using v6.3.0 on 2024-03-01 and unfortunately the issue was not resolved. Other users tried upgrading too.

      This investigation has been ongoing for several months, so there may be newer releases that address the issue.

      Why do the timeouts happen almost instantly?

      While we don't have a definitive answer, here are some notes from one of our engineers:

      The running theory is that this is due to the way lambda handles timers. Floating promises and timers can schedule their callback after the function invocation is done but before the execution context is frozen (regardless of callbackWaitsForEmptyEventLoop). If the execution context is re-used, the next invocation runs those callbacks immediately. That's how a 30s timeout error can occur in a <200ms function invocation.
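
      The theory above can be illustrated with a toy model. This is a sketch under stated assumptions, not real Lambda or Node.js internals: `SimulatedContext` is a hypothetical class with a fake wall clock, where `run` lets due timers fire and `freeze` advances the clock without firing anything.

```typescript
// Toy model only: a fake clock and timer queue, not Lambda or Node.js internals.
interface PendingTimer {
  dueAt: number;
  callback: () => void;
}

class SimulatedContext {
  private now = 0;
  private timers: PendingTimer[] = [];

  setTimer(delayMs: number, callback: () => void): void {
    this.timers.push({ dueAt: this.now + delayMs, callback });
  }

  /** Time passes while code is running: timers that are due fire. */
  run(durationMs: number): void {
    this.now += durationMs;
    const due = this.timers.filter((t) => t.dueAt <= this.now);
    this.timers = this.timers.filter((t) => t.dueAt > this.now);
    due.forEach((t) => t.callback());
  }

  /** Time passes while the context is frozen: nothing can fire. */
  freeze(durationMs: number): void {
    this.now += durationMs;
  }
}

const events: string[] = [];
const context = new SimulatedContext();

// Invocation 1 leaves a 30s timer floating and finishes after 150ms.
context.setTimer(30000, () => events.push("timeout fired"));
context.run(150);
events.push("invocation 1 done");

// The execution context is frozen for a minute, then re-used.
context.freeze(60000);
events.push("invocation 2 start");
context.run(1); // first tick after the thaw: the stale timer is already due
events.push("invocation 2 after first tick");

console.log(events);
```

      In this model the 30s timer fires within the first millisecond of the second invocation, which matches the shape of the reports: a timeout error inside a request that took far less than 30s.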

      One way to fix this would be to make sure that all timers are cleared and all promises are awaited by the end of a function invocation. Is there a way to ensure this in the Node.js driver?

      Unfortunately, because this behaviour is so timing-dependent, it is very hard to reproduce consistently.
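
      At the application level, the idea could look roughly like the sketch below. `InvocationScope` and `withInvocationScope` are hypothetical names, and this only covers timers and promises that are routed through the helper; it does not reach timers created inside the driver, which is the open question here.

```typescript
// Hypothetical helper: tracks timers and promises created during one invocation.
class InvocationScope {
  private timers = new Set<ReturnType<typeof setTimeout>>();
  private pending = new Set<Promise<void>>();

  /** Like setTimeout, but remembered so it can be cleared at the end. */
  setTimeout(callback: () => void, delayMs: number): void {
    const handle = setTimeout(() => {
      this.timers.delete(handle);
      callback();
    }, delayMs);
    this.timers.add(handle);
  }

  /** Registers a promise that would otherwise be left floating. */
  track<T>(promise: Promise<T>): Promise<T> {
    const settled: Promise<void> = promise.then(
      () => { this.pending.delete(settled); },
      () => { this.pending.delete(settled); }, // rejections still reach the caller via `promise`
    );
    this.pending.add(settled);
    return promise;
  }

  pendingTimerCount(): number {
    return this.timers.size;
  }

  pendingPromiseCount(): number {
    return this.pending.size;
  }

  /** Clears every tracked timer, then waits for every tracked promise to settle. */
  async close(): Promise<void> {
    this.timers.forEach((handle) => clearTimeout(handle));
    this.timers.clear();
    await Promise.all(Array.from(this.pending));
  }
}

/** Runs a handler and leaves no tracked timers or promises behind. */
async function withInvocationScope<T>(
  handler: (scope: InvocationScope) => Promise<T>,
): Promise<T> {
  const scope = new InvocationScope();
  try {
    return await handler(scope);
  } finally {
    await scope.close();
  }
}
```

      A handler wrapped in `withInvocationScope` would call `scope.setTimeout(...)` and `scope.track(...)` instead of leaving timers and promises floating, so nothing it scheduled survives into the next invocation of a re-used context.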

      And here are some replies on this topic:

      Timeouts always run first in Node.js, but we explicitly postpone handling the notification of a timeout expiry on sockets by one event loop tick. This should allow the driver to read from the socket that has hellos waiting for it, avoiding a timeout error.

      We also introduced a flag to control the monitor switching between streaming and polling. In polling mode, when the Lambda wakes up it should have a pending expired timeout that triggers a cluster check, rather than a "cluster unhealthy" interpretation of not getting a hello within its timeout.

      The flag mentioned above is the serverMonitoringMode URI option (see SDAM spec) which should be automatically set to "poll" if Lambda is detected (or if Vercel is detected).
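
      For anyone who wants to rule out the automatic detection, the option can also be set explicitly in the connection string. The helper below is a minimal sketch: `withServerMonitoringMode` is a hypothetical function name, the example URI is a placeholder, and it assumes the URI already contains the "/" that separates the hosts from the options.

```typescript
// Values defined for the serverMonitoringMode URI option in the SDAM spec.
type ServerMonitoringMode = "auto" | "stream" | "poll";

/**
 * Returns the connection string with serverMonitoringMode set to `mode`,
 * replacing any existing value (option names are matched case-insensitively).
 * Hypothetical helper; assumes the URI already has a "/" before any options.
 */
function withServerMonitoringMode(uri: string, mode: ServerMonitoringMode): string {
  const queryStart = uri.indexOf("?");
  const base = queryStart === -1 ? uri : uri.slice(0, queryStart);
  const query = queryStart === -1 ? "" : uri.slice(queryStart + 1);

  const kept = query
    .split("&")
    .filter((pair) => pair !== "" && pair.toLowerCase().indexOf("servermonitoringmode=") !== 0);
  kept.push("serverMonitoringMode=" + mode);

  return base + "?" + kept.join("&");
}

// Placeholder URI for illustration only.
const exampleUri = withServerMonitoringMode(
  "mongodb+srv://user:pass@cluster0.example.mongodb.net/app?retryWrites=true",
  "poll",
);
console.log(exampleUri);
```

      The resulting string can be passed to the driver wherever the original connection string was used.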

      Ultimately, the challenge is likely going to be that things are technically working as designed.

      Does it just affect apps using nextjs-with-mongodb example syntax?

      While most customer reports contained similar syntax, there are reports in the community where developers are instead using the Vercel/MongoDB Atlas integration.

      Does the "How to Integrate MongoDB Into Your Next.js App" guide help?

      Some users reported following this guide. Some mentioned that the error rate dropped but the issue was not fully resolved. Related discussion.

      User Impact

      We have ~7 support cases from customers. Many more affected developers have been discussing this on GitHub and the MongoDB forums.

      Dependencies

      Vercel Serverless Functions

      node-mongodb-native v6.3.0 (perhaps other versions too)

            Assignee:
            bailey.pearson@mongodb.com Bailey Pearson
            Reporter:
            aldo.schumann@vercel.com Aldo Schumann
            Votes:
            3
            Watchers:
            10
