Core Server / SERVER-92749

MongoDB infinite loop during TLS connection read

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: 7.0.12
    • Component/s: None
    • Assigned Teams: Networking & Observability
    • Operating System: ALL
    • Sprint: Security 2024-08-05, Security 2024-08-19, Networking & Obs 2024-09-02, Networking & Obs 2024-09-16, Networking & Obs 2024-09-30, Networking & Obs 2024-10-14, Networking & Obs 2024-10-28

      OS: CentOS Stream release 9 (Linux xxx-test-db-2.azr.etn 5.14.0-472.el9.x86_64 #1 SMP PREEMPT_DYNAMIC Thu Jun 27 20:15:53 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux)

      MongoDB version: 7.0.12 (the behaviour is the same with the build from your yum repository, mongodb-org-server-7.0.12-1.el9.x86_64, and with our custom build with debug info)

      HW: Azure VM Standard F16s v2 (16 vcpus, 32 GiB memory)

      Clients:

      • Mongodb-exporter-0.39.0-0.el9.x86_64
      • mongo-java-client-4.6.1

      Current configuration:

      systemLog:
        destination: file
        logAppend: true
        logRotate: reopen
        path: /var/log/mongodb/mongod.log
       
      storage:
        dbPath: /var/lib/mongo
        engine: wiredTiger
        directoryPerDB: false
       
      processManagement:
        fork: false
        pidFilePath: /var/run/mongodb/mongod.pid
        timeZoneInfo: /usr/share/zoneinfo
       
      # network interfaces
      net:
        bindIp: 0.0.0.0  # Listens on all IPv4 interfaces.
        tls:
          mode: requireTLS
          certificateKeyFile: /etc/ssl/xxx-test-db-2.azr.etn.pem
          CAFile: /etc/ssl/xxx-test-db-2.azr.etn.CA.pem
          allowConnectionsWithoutCertificates: true
          allowInvalidCertificates: true
          logVersions: TLS1_0,TLS1_1,TLS1_2,TLS1_3
       
        ipv6: false
        maxIncomingConnections: 500
        port: 27017
       
      replication:
        oplogSizeMB: 1024
        replSetName: repl-xxx-test
       
      security:
        authorization: enabled
        keyFile: /etc/mongo.key
      

       

      We have identified a bug in reading from a TLS stream. When MongoDB tries to read from a malfunctioning TLS stream (error SSL_ERROR_SYSCALL), the connection thread enters an infinite loop. The bug occurs both when MongoDB runs in a replica set (on primary and secondary nodes alike) and when it runs as a single instance.

      The system shows increased load but no significant I/O activity. The load is generated by connection threads, and those threads do not appear in the db.currentOp() output.

      The stack trace (pmp_strace.log) shows that these threads are mostly found in the SSL handling parts of the code (i.e. functions such as ERR_clear_error and SSL_read) or in the memory-free function (tc_free, called from ERR_clear_error).

      Our investigation started in the function engine::perform. We found that the SSL error status is 0x5 (SSL_ERROR_SYSCALL). As described at https://www.openssl.org/docs/man3.0/man3/SSL_get_error.html, this means "Some non-recoverable, fatal I/O error occurred."

      The asio::error_code we get varies between connection threads (the examples were provided as screenshots; see the attachments).

      Once the condition ssl_error == SSL_ERROR_SYSCALL matches, the function returns 0 (want_nothing).
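
The ambiguity described above can be illustrated with a minimal, self-contained sketch. It is not the asio or MongoDB source: `classify`, `StepResult` and the `Want` enum are hypothetical names, and the error constants are redefined locally (with the same numeric values as OpenSSL's SSL_ERROR_* macros) only so that the sketch compiles without OpenSSL headers. The point it tries to show is that a "want nothing" result is returned both on success and on a fatal SSL_ERROR_SYSCALL, so a caller can tell the two apart only by inspecting the error flag.

```cpp
#include <cassert>
#include <string>

// Same numeric values as OpenSSL's SSL_ERROR_* macros, redefined here
// (assumption: for illustration only) to avoid depending on OpenSSL headers.
enum SslError {
    kSslErrorNone = 0,
    kSslErrorSsl = 1,
    kSslErrorWantRead = 2,
    kSslErrorWantWrite = 3,
    kSslErrorSyscall = 5,
    kSslErrorZeroReturn = 6,
};

// Hypothetical stand-in for the engine's "what is needed next" result.
enum class Want { InputAndRetry, OutputAndRetry, Nothing };

struct StepResult {
    Want want;           // what the caller should do next
    bool failed;         // true when an error code would have been set
    std::string reason;  // human-readable description, empty on success
};

// Maps an SSL error status to the next action plus an error flag.
StepResult classify(int sslError) {
    switch (sslError) {
        case kSslErrorNone:
            return {Want::Nothing, false, ""};
        case kSslErrorWantRead:
            return {Want::InputAndRetry, false, ""};
        case kSslErrorWantWrite:
            return {Want::OutputAndRetry, false, ""};
        case kSslErrorSyscall:
            return {Want::Nothing, true, "non-recoverable, fatal I/O error"};
        case kSslErrorZeroReturn:
            return {Want::Nothing, true, "peer closed the TLS session"};
        case kSslErrorSsl:
            return {Want::Nothing, true, "TLS protocol failure"};
        default:
            return {Want::Nothing, true, "unexpected SSL error status"};
    }
}

// Usage: success and SSL_ERROR_SYSCALL yield the same `want` value;
// only `failed` distinguishes them.
inline void demo() {
    assert(classify(kSslErrorNone).want == classify(kSslErrorSyscall).want);
    assert(!classify(kSslErrorNone).failed);
    assert(classify(kSslErrorSyscall).failed);
}
```

Under these assumptions, any caller that branches only on `want` treats a fatal failure exactly like a completed step.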

      The next interesting part is in the asio::detail::read_buffer_sequence function, where MongoDB detects that the buffer is not empty.

      It then calls the read_some function.

      That function brings us back to engine::perform again.
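
The cycle described above (read loop, read_some, engine::perform, back to the read loop) can be modelled with a small, self-contained sketch. This is a simulation under stated assumptions, not the asio or MongoDB code: `BrokenStream`, `readSome`, `readFully` and the `maxIterations` cap are all hypothetical, and the cap exists only so that the example terminates. It shows that a loop which looks only at the number of bytes still wanted, and ignores the error reported by a read that transferred nothing, never makes progress.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a TLS stream whose every read fails fatally:
// it transfers no bytes and reports the failure only through `failed`.
struct BrokenStream {
    int calls = 0;

    std::size_t readSome(std::size_t /*wanted*/, bool& failed) {
        ++calls;
        failed = true;  // models SSL_ERROR_SYSCALL
        return 0;       // no bytes transferred, so no progress
    }
};

// Tries to read `wanted` bytes and returns the number of loop iterations.
// With checkError == false the loop condition depends only on the bytes
// still missing, which is the pattern suspected in this report.
// maxIterations is an artificial cap so that the sketch terminates.
int readFully(BrokenStream& stream, std::size_t wanted, bool checkError,
              int maxIterations) {
    assert(maxIterations > 0);
    std::size_t got = 0;
    int iterations = 0;
    while (got < wanted && iterations < maxIterations) {
        bool failed = false;
        got += stream.readSome(wanted - got, failed);
        ++iterations;
        if (checkError && failed) {
            break;  // stop as soon as the stream reports a fatal error
        }
    }
    return iterations;
}
```

For example, with a cap of 1000, `readFully(stream, 16, true, 1000)` stops after a single iteration, while `readFully(stream, 16, false, 1000)` runs until the cap is reached; without the cap the second call would spin forever, which matches the CPU-bound connection threads observed here.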

       

      The buffer contains the following data (provided as a screenshot; see the attachments).

      Attachments:

        1. image-2024-07-23-14-45-08-540.png (53 kB)
        2. image-2024-07-23-14-45-55-346.png (199 kB)
        3. image-2024-07-23-14-46-19-211.png (447 kB)
        4. image-2024-07-23-14-46-31-313.png (90 kB)
        5. image-2024-07-23-14-46-54-100.png (177 kB)
        6. image-2024-07-23-14-47-10-146.png (186 kB)
        7. image-2024-07-23-14-47-34-257.png (169 kB)
        8. image-2024-07-23-14-48-08-635.png (258 kB)
        9. pmp_strace.log (2.61 MB)
        10. screenshot-1.png (77 kB)
        11. stack.html (60 kB)
        12. stack.txt (60 kB)
        13. image-2024-07-24-17-49-03-416.png (111 kB)

            Assignee: Amirsaman Memaripour (amirsaman.memaripour@mongodb.com)
            Reporter: Petr Medonos (petr.medonos@etnetera.cz)
            Votes: 0
            Watchers: 9