C# Driver / CSHARP-1964

Connect on multiple threads to downed MongoDB waits for ALL threads to timeout 1 by 1

    • Type: Bug
    • Resolution: Won't Fix
    • Priority: Major - P3
    • None
    • Affects Version/s: 1.10
    • Component/s: Connectivity
    • Labels: None

      (A partner tech team has informed me of the following. Note: They acknowledge that the bug is FIXED in v2 drivers but they cannot move to that driver rev now)

      There is a bug in Mongo .NET Driver 1.10: when MongoDB is not available, an attempt to connect to the database can block the calling thread for a very long time.

      The root cause appears to be in the ChooseServerInstance and Connect methods of the MongoDB.Driver.Internal.DirectMongoServerProxy class. Both methods take a lock while establishing a connection to the database:
      lock (this._stateLock)
      { … }

      This code works fine if the connection can be established immediately. But in real-world scenarios the MongoDB server can become unavailable. The driver then fails by timeout, which seems to be 20 seconds by default. In other words, whichever thread calls the ChooseServerInstance method first holds the lock and blocks the other threads until it times out. Then another thread takes the lock and tries to establish a connection. After it fails as well, the next thread tries the same thing. Whichever thread comes last in this line waits until all the other threads have failed to establish a connection. So if N threads are trying to establish a connection, the last thread waits for 20 * N seconds.
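
      The queuing effect described above can be reproduced without a driver or a server. The following is a minimal simulation, not driver code: LockSerializationDemo, Run, and the holdLock switch are hypothetical names, Thread.Sleep stands in for a connect attempt that runs into its timeout, and 200 ms stands in for the 20-second default so the demo finishes quickly.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Simulation only. Nothing here comes from the MongoDB driver; every name
// is a hypothetical stand-in for the locking pattern described in the report.
public static class LockSerializationDemo
{
    // Starts `callers` threads that each "fail to connect" after timeoutMs.
    // Returns the time (ms since start) at which each caller gave up, ascending.
    public static long[] Run(int callers, int timeoutMs, bool holdLock)
    {
        var stateLock = new object();
        var finishedAtMs = new long[callers];
        var threads = new Thread[callers];
        long start = Stopwatch.GetTimestamp();

        for (int i = 0; i < callers; i++)
        {
            int slot = i; // per-iteration copy so each closure writes its own slot
            threads[i] = new Thread(() =>
            {
                if (holdLock)
                {
                    // The whole simulated timeout elapses while the lock is
                    // held, so waiting callers queue up one behind another.
                    lock (stateLock)
                    {
                        Thread.Sleep(timeoutMs);
                    }
                }
                else
                {
                    // No shared lock: all callers time out concurrently.
                    Thread.Sleep(timeoutMs);
                }
                long ticks = Stopwatch.GetTimestamp() - start;
                finishedAtMs[slot] = ticks * 1000 / Stopwatch.Frequency;
            });
            threads[i].Start();
        }

        foreach (var thread in threads)
        {
            thread.Join();
        }
        Array.Sort(finishedAtMs);
        return finishedAtMs;
    }

    public static void Main()
    {
        Console.WriteLine("with lock:    " + string.Join(", ", Run(4, 200, true)));
        Console.WriteLine("without lock: " + string.Join(", ", Run(4, 200, false)));
    }
}
```

      With the lock held, the give-up times should grow by roughly one timeout per caller, so the last of N callers waits about N timeouts. Without the lock, all callers should give up after about one timeout. Exact figures depend on thread scheduling.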

      There is no way to avoid this problem with Mongo .NET Driver 1.10. The Connect method in the MongoDB.Driver.Internal.DirectMongoServerProxy class has a timeout parameter, so one would expect that reducing the timeout would mitigate the problem. But after examining the code of the Connect method, I found that the timeout parameter is not used, so the default connection timeout (20 seconds) cannot be changed. There is a bug report for this problem in the MongoDB JIRA. That report is marked as closed, but the issue was fixed only in driver 2.x, which we cannot use right now because of its numerous breaking changes.

      The issues described above remain unfixed in the latest 1.x release, Mongo .NET Driver 1.11.

      Steps to reproduce the problem:
      1) Simulate an unreliable connection to the MongoDB instance using the clumsy tool with the following settings:

      · Filter: tcp and (tcp.DstPort == 27017 or tcp.SrcPort == 27017)

      · Lag: enabled, delay = 3000ms and Drop: enabled, Chance = 100%

      2) Run tests from the attached solution.

      3) Check the test output. Tasks created by the tests in the MongoDriverLockTestv110 and MongoDriverLockTestV111 projects are queued because of the lock: the first task ends after 20 seconds, the next after 40 seconds, and so on. Tasks created by the tests in the MongoDriverLockTestV223 project are not queued and end almost simultaneously (the bug is fixed in version 2.x of the Mongo .NET Driver, but unfortunately we cannot upgrade to it right now).

            Assignee: Unassigned
            Reporter: Buzz Moschetti (buzz.moschetti)
            Votes: 0
            Watchers: 3
