  Core Server: SERVER-48396

SIGUSR2 stack collection closes connections


Details

    • Fully Compatible
    • ALL
    • v4.4

    Description

      I ran the repro script from SERVER-48395 while sending mongod a SIGUSR2 every 10 seconds to collect stack traces. This caused network errors and failed operations in the application, for example:

      Error: error doing query: failed: network error while attempting to run command 'insert' on host '127.0.0.1:27017'  :
      

      accompanied by mongo shell messages such as:

      {"t":{"$date":"2020-05-25T15:07:29.723Z"},"s":"I",  "c":"NETWORK",  "id":20120,   "ctx":"js","msg":"Trying to reconnnect","attr":{"connString":"127.0.0.1:27017 failed"}}
      {"t":{"$date":"2020-05-25T15:07:29.724Z"},"s":"I",  "c":"NETWORK",  "id":20125,   "ctx":"js","msg":"DBClientConnection failed to receive message","attr":{"connString":"127.0.0.1:27017","error":"HostUnreachable: Connection closed by peer"}}
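      A minimal sketch of the signal schedule used in this experiment (one SIGUSR2 every 10 seconds), written in C against POSIX kill() and sleep(). The helper name send_periodic_signal is hypothetical, as is the mongod_pid in the usage note; a shell loop around kill -USR2 does the same job. It runs as a separate process alongside the workload.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: send `sig` to process `pid` `count` times,
 * sleeping `interval_secs` seconds between sends (the run above used 10).
 * Returns how many signals were sent; stops early if kill() fails,
 * e.g. because the target process has exited. */
unsigned send_periodic_signal(pid_t pid, int sig, unsigned count,
                              unsigned interval_secs) {
    /* pid <= 0 would signal a whole process group; never do that here. */
    assert(pid > 0);
    unsigned sent = 0;
    for (unsigned i = 0; i < count; i++) {
        if (kill(pid, sig) != 0)
            break;
        sent++;
        if (i + 1 < count)
            sleep(interval_secs); /* no sleep after the final signal */
    }
    return sent;
}
```

      For the run described here this would be called as send_periodic_signal(mongod_pid, SIGUSR2, n, 10), where n is the number of stack collections wanted.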
      

      The mongod logs tell a similar story:

      The blue markers on the timeline (attached as sigusr2.png) mark the points at which SIGUSR2 was received. Each one coincides with a burst of connections ended (red curve) and a smaller number of connections accepted (blue curve), so the number of connections drops each time (green curve).
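      The connection counts behind the chart can be tallied directly from the log. A minimal sketch, assuming mongod 4.4 structured log lines carry "msg":"Connection accepted" and "msg":"Connection ended" (worth checking against db.log); struct conn_tally, tally_line and tally_stream are hypothetical names. It uses plain substring matching rather than a JSON parser.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Running totals of connection events seen in a mongod log. */
struct conn_tally {
    long accepted;
    long ended;
};

/* Classify one log line. ASSUMPTION: the "msg" values below match the
 * mongod 4.4 JSON log; substring matching avoids a JSON parser. */
void tally_line(const char *line, struct conn_tally *t) {
    assert(line != NULL && t != NULL);
    if (strstr(line, "\"msg\":\"Connection accepted\""))
        t->accepted++;
    else if (strstr(line, "\"msg\":\"Connection ended\""))
        t->ended++;
}

/* Tally every line of a log stream (e.g. db.log opened with fopen).
 * Returns 0 on success, -1 if a read error occurred. */
int tally_stream(FILE *f, struct conn_tally *t) {
    char *line = NULL;
    size_t cap = 0;
    while (getline(&line, &cap, f) != -1)
        tally_line(line, t);
    int err = ferror(f) ? -1 : 0;
    free(line);
    return err;
}
```

      A negative accepted - ended corresponds to the net loss shown by the green curve. These are whole-log totals; reproducing the per-signal view would additionally require bucketing lines by their $date field.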

      I suspect we may not be retrying network operations when they fail with EINTR, which a SIGUSR2 arriving during a blocking system call would cause.
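      The pattern this hypothesis refers to is the usual POSIX retry loop: a blocking call interrupted by a signal handler before it transfers any data fails with errno set to EINTR, and the caller is expected to reissue it. The sketch below is generic C, not mongod's networking code; read_retry and demo_interrupted_read are hypothetical names, and the demo uses a pipe rather than a socket, with its own SIGUSR2 handler installed without SA_RESTART.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Read up to `len` bytes, reissuing read() whenever a signal handler
 * interrupts it before any data arrives (-1 with errno == EINTR).
 * Any other result, including other errors, goes back to the caller. */
ssize_t read_retry(int fd, void *buf, size_t len) {
    assert(fd >= 0);
    ssize_t n;
    do {
        n = read(fd, buf, len);
    } while (n == -1 && errno == EINTR);
    return n;
}

static volatile sig_atomic_t usr2_seen = 0;
static void on_usr2(int sig) { (void)sig; usr2_seen = 1; }

/* Demo: a child sends this process SIGUSR2 while it is (most likely)
 * blocked in read() on a pipe, then writes one byte. The handler has
 * no SA_RESTART, so the kernel will not restart an interrupted read.
 * Returns the byte read, or -1 on failure. */
int demo_interrupted_read(void) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_usr2;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0; /* deliberately no SA_RESTART */
    if (sigaction(SIGUSR2, &sa, NULL) != 0) return -1;

    int fds[2];
    if (pipe(fds) != 0) return -1;
    pid_t parent = getpid();
    pid_t child = fork();
    if (child == -1) return -1;
    if (child == 0) {
        struct timespec delay = {0, 100L * 1000 * 1000}; /* 100 ms */
        close(fds[0]);
        nanosleep(&delay, NULL);
        kill(parent, SIGUSR2); /* timing-dependent: usually mid-read() */
        nanosleep(&delay, NULL);
        if (write(fds[1], "x", 1) != 1) _exit(1);
        _exit(0);
    }
    close(fds[1]);
    char c = 0;
    ssize_t n = read_retry(fds[0], &c, 1);
    close(fds[0]);
    waitpid(child, NULL, 0);
    return n == 1 ? (unsigned char)c : -1;
}
```

      Calling a plain read() in place of read_retry() in the demo would typically return -1 with EINTR instead of the byte. On Linux, SA_RESTART makes the kernel restart many such calls automatically, but not all: socket receives on a socket with an SO_RCVTIMEO timeout still fail with EINTR. Code that treats EINTR as a fatal network error could therefore end up closing the connection, which would be consistent with the symptoms above.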

      Attachments

        1. db.log
          1.85 MB
          Bruce Lucas
        2. sigusr2.png
          79 kB
          Bruce Lucas


            People

              billy.donahue@mongodb.com Billy Donahue
              bruce.lucas@mongodb.com Bruce Lucas
