Go Driver / GODRIVER-2787: Arbiter connection permanently broken after initiating the replica set


      Summary

      In my scenario, I have an arbiter from which I collect metrics using the Percona MongoDB exporter (https://github.com/percona/mongodb_exporter). The exporter uses the Go driver to establish a direct connection to the arbiter and execute commands like getDiagnosticData and replSetGetStatus every 10 seconds.

      The exporter connects to the arbiter as soon as it starts and reuses the same connection for its entire lifetime. When the mongod instance and the MongoDB exporter start at the same time (typically the case, as they reside in the same Kubernetes pod), the connection becomes permanently broken. The Go driver then returns the following error on every attempt to execute any command:

      server selection error: context canceled, current topology: { Type: Single, Servers: [{ Addr: mongo-arbiter:27017, Type: Unknown, Last error: (NotPrimaryOrSecondary) node is not in primary or recovering state }, ] }
      

      Restarting only the MongoDB exporter or disabling the global connection pool (via the --no-mongodb.global-conn-pool flag) fixes the problem.

      This is particularly easy to reproduce when the exporter connects to the soon-to-be arbiter instance before the replica set is initiated. In that case, the exporter is initially able to execute commands but starts failing once the replica set is initiated.

      MongoDB driver version: v1.10.3 (the issue doesn't reproduce with v1.3.2 used by older versions of the MongoDB exporter)

      MongoDB server version: v4.4 or later

      Topology: replica set or sharded cluster + arbiter

      How to Reproduce

      Reproducibility: 100%

      1. Create the following prometheus.yml file:

      global: 
        scrape_interval: 10s
        scrape_timeout: 9s
      
      scrape_configs: 
        - job_name: mongo-0
          static_configs: 
            - targets: 
                - exporter-0:9216
        - job_name: mongo-1
          static_configs: 
            - targets: 
                - exporter-1:9216
        - job_name: mongo-arbiter
          static_configs: 
            - targets: 
                - exporter-arbiter:9216
      

      2. Create the following docker-compose.yml file:

      version: '3.8'
      
      services: 
        mongo-0:
          container_name: mongo-0
          image: mongo:4.4
          networks: 
            - mongo
          entrypoint: ["/usr/bin/mongod", "--bind_ip_all", "--replSet", "rs"]
        mongo-1:
          container_name: mongo-1
          image: mongo:4.4
          networks: 
            - mongo
          entrypoint: ["/usr/bin/mongod", "--bind_ip_all", "--replSet", "rs"]
        mongo-arbiter: 
          container_name: mongo-arbiter
          image: mongo:4.4
          networks: 
            - mongo
          entrypoint: ["/usr/bin/mongod", "--bind_ip_all", "--replSet", "rs"]
        exporter-0:
          container_name: exporter-0
          image: percona/mongodb_exporter:0.37.0
          networks: 
            - mongo
          command: ["--mongodb.uri=mongodb://mongo-0:27017/", "--mongodb.direct-connect", "--collector.diagnosticdata", "--mongodb.global-conn-pool", "--log.level=debug"]
        exporter-1:
          container_name: exporter-1
          image: percona/mongodb_exporter:0.37.0
          networks: 
            - mongo
          command: ["--mongodb.uri=mongodb://mongo-1:27017/", "--mongodb.direct-connect", "--collector.diagnosticdata", "--mongodb.global-conn-pool", "--log.level=debug"]
        exporter-arbiter: 
          container_name: exporter-arbiter
          image: percona/mongodb_exporter:0.37.0
          networks: 
            - mongo
          command: ["--mongodb.uri=mongodb://mongo-arbiter:27017/", "--mongodb.direct-connect", "--collector.diagnosticdata", "--mongodb.global-conn-pool", "--log.level=debug"]
        victoriametrics: 
          container_name: victoriametrics
          image: victoriametrics/victoria-metrics:latest
          networks: 
            - mongo
          ports: 
            - 8428:8428
          volumes: 
            - ./prometheus.yml:/etc/prometheus/prometheus.yml
          command: ["-promscrape.config", "/etc/prometheus/prometheus.yml"]
      
      networks: 
        mongo: 
          driver: bridge
      

      3. Start the containers in the background:

      docker compose up -d
      

      4. Monitor the exporter-arbiter container logs for 30-40 seconds:

      docker logs -f exporter-arbiter
      

      Note that the getDiagnosticData command result is emitted every 10 seconds.

      5. Initiate the replica set:

      cat <<EOF | docker exec -i mongo-0 mongo
      rs.initiate({
          "_id": "rs",
          "members": [
              {"_id": 0, "host": "mongo-0:27017"},
              {"_id": 1, "host": "mongo-1:27017"},
              {"_id": 2, "host": "mongo-arbiter:27017", "arbiterOnly": true}
          ]
      });
      EOF
      

      6. Monitor the exporter-arbiter container logs for another 30-40 seconds.

      docker logs -f exporter-arbiter
      

      Note that the following entries are now emitted instead of the command results:

      time="2023-03-30T18:49:31Z" level=warning msg="cannot load topology labels: cannot check if the instance is an arbiter: server selection error: context canceled, current topology: { Type: Single, Servers: [{ Addr: mongo-arbiter:27017, Type: Unknown, Last error: (NotPrimaryOrSecondary) node is not in primary or recovering state }, ] }"
      time="2023-03-30T18:49:31Z" level=error msg="Registry - Cannot get node type to check if this is a mongos : server selection error: context canceled, current topology: { Type: Single, Servers: [{ Addr: mongo-arbiter:27017, Type: Unknown, Last error: (NotPrimaryOrSecondary) node is not in primary or recovering state }, ] }"
      

      7. Restart the exporter-arbiter container:

      docker restart exporter-arbiter
      

      8. Monitor the exporter-arbiter container logs for another 30-40 seconds:

      docker logs -f exporter-arbiter
      

      Note that the command results are emitted again.

      9. Stop and remove the containers:

      docker compose down
      

      Additional Background

      Use the --no-mongodb.global-conn-pool flag instead of the --mongodb.global-conn-pool flag in the example above to disable the global connection pool. The issue doesn't reproduce when a new connection is used each time.
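      For example, the exporter-arbiter service from the docker-compose.yml above would look like this (only the pool flag changes):

```yaml
  exporter-arbiter:
    container_name: exporter-arbiter
    image: percona/mongodb_exporter:0.37.0
    networks:
      - mongo
    command: ["--mongodb.uri=mongodb://mongo-arbiter:27017/", "--mongodb.direct-connect", "--collector.diagnosticdata", "--no-mongodb.global-conn-pool", "--log.level=debug"]
```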

      I forked the v0.11.2 branch of the MongoDB exporter, which uses Go driver v1.3.2 and doesn't have this issue. After upgrading the driver to v1.11.3, the issue started to appear.

      Assignee: matt.dale@mongodb.com (Matt Dale)
      Reporter: aleksander.polak@tietoevry.com (Aleksander Polak)
      Votes: 0
      Watchers: 3