Node.js Driver / NODE-120

Investigate possible race condition

    • Type: Task
    • Resolution: Done
    • Priority: Major - P3
    • Planning
    • Affects Version/s: None
    • Component/s: None
    • Labels:

      Here is some additional debugging information I've gathered.
      I suspect that raising some timeout to a very large value would make the issue go away, but there is still a bug that causes the service to become unresponsive in these cases. If I disconnect all 4 replica set servers, the code throws an error, which our code then handles by retrying. However, if this._state.secondaries and this._state.master become null/empty at the same time, no error is thrown.
      -The problem seems to occur when, within a single round of requests, all servers take too long to respond.
      -Norman showed us some authentication queries that took 75 seconds to respond, so our 30-second timeout is most likely too low. It is unclear why authentication queries would take this long; perhaps that is related.
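      As a stopgap for the hang described above, application code can put its own deadline on each attempt so that a stalled request becomes an error that the existing retry logic handles. This is a minimal sketch, assuming promise-returning operations; `withTimeout`, `retry`, `attempts` and `timeoutMS` are hypothetical names, not driver API, and the 30000 ms default only mirrors the 30-second timeout mentioned above.

```javascript
// Hypothetical sketch (not driver API): reject if `promise` has not settled
// within `timeoutMS`, and always clear the timer afterwards.
function withTimeout(promise, timeoutMS) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`timed out after ${timeoutMS} ms`)),
      timeoutMS
    );
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Hypothetical sketch: run `operation` up to `attempts` times, giving each
// attempt its own deadline, and rethrow the last error if every attempt fails.
async function retry(operation, { attempts = 3, timeoutMS = 30000 } = {}) {
  let lastError;
  for (let i = 0; i < attempts; i++) {
    try {
      return await withTimeout(operation(), timeoutMS);
    } catch (err) {
      lastError = err; // a stalled or failed attempt; try again
    }
  }
  throw lastError;
}
```

      Note that `Promise.race` only stops waiting; the underlying request keeps running in the background. Given the 75-second authentication queries, the per-attempt deadline would also need to exceed that, or attempts would be abandoned while still in progress.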
      There are 2 places in repl_set.js where entries are removed from this._state.secondaries:
      var _repl_set_handler = function(event, self, server)
      and
      var _handler = function(event, self, server)
      In both cases, if I set a breakpoint at the statement where entries are deleted, I cannot reproduce the bug. This suggests to me that there is a race condition in which something tries to reconnect, or gets re-added, through a different path.
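      One way to rule out an ordering problem between the two removal paths is to route both through a single function that re-checks the state after every deletion and reports an error whenever neither a master nor any secondary remains. The sketch below is hypothetical: `ReplSetState`, `addServer` and `removeServer` are invented for illustration and only loosely mirror the `this._state.master` / `this._state.secondaries` fields; it is not the driver's repl_set.js.

```javascript
const { EventEmitter } = require('events');

// Hypothetical sketch, NOT the driver's repl_set.js: a tiny replica set state
// holder whose only removal path checks for the "nothing left" case.
class ReplSetState extends EventEmitter {
  constructor() {
    super();
    this._state = { master: null, secondaries: {} };
  }

  addServer(name, isMaster) {
    if (isMaster) this._state.master = name;
    else this._state.secondaries[name] = true;
  }

  // Both disconnect handlers would call this, so the emptiness check
  // runs after every deletion regardless of which handler fires last.
  removeServer(name, reason) {
    if (this._state.master === name) this._state.master = null;
    delete this._state.secondaries[name];

    const noSecondaries = Object.keys(this._state.secondaries).length === 0;
    if (this._state.master === null && noSecondaries) {
      // Surface an error instead of leaving callers waiting forever.
      this.emit('error', new Error(`no servers available (last: ${name}, ${reason})`));
    }
  }
}
```

      Because the check runs after every mutation, it does not matter which handler removes the last server; whichever one empties the state emits the error that the caller can then retry on.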
      My best guess is that there is an underlying bug somewhere, but that we can work around it in the meantime with the right connection options.
      -Dan
      Edit: one additional piece of information I forgot to mention: when I ran this in node-inspector, the low-level error message passed to the handlers is
      "connection to [localhost:27021] timed out"
      It might actually be related to the this.options.secondaryAcceptableLatencyMS setting, but again, I am not sure what each setting controls here.
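      For reference, a latency window such as secondaryAcceptableLatencyMS is typically applied by measuring each secondary's ping time and keeping only those within the window of the fastest one. The sketch below assumes that behavior; `eligibleSecondaries`, `pingTimesMS` and the 15 ms value are illustrative, not taken from the driver source.

```javascript
// Hypothetical sketch: keep only secondaries whose measured ping time is
// within `acceptableLatencyMS` of the fastest secondary.
function eligibleSecondaries(pingTimesMS, acceptableLatencyMS = 15) {
  const names = Object.keys(pingTimesMS);
  if (names.length === 0) return [];
  const fastest = Math.min(...names.map((n) => pingTimesMS[n]));
  return names.filter((n) => pingTimesMS[n] <= fastest + acceptableLatencyMS);
}
```

      If the setting works this way, it narrows which secondaries are picked rather than bounding how long a connection waits, so on its own it would not obviously produce a "timed out" error.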

            Assignee:
            christkv Christian Amor Kvalheim
            Reporter:
            christkv Christian Amor Kvalheim
            Votes:
            0
            Watchers:
            1
