  1. Core Server
  2. SERVER-88745

Investigate common "No suitable servers found" error in prod and add a troubleshooting guide

    • Type: Task
    • Resolution: Won't Fix
    • Priority: Major - P3
    • None
    • Affects Version/s: None
    • Component/s: None
    • Atlas Streams
    • Sprint 46

      A few stream processors in prod are hitting errors like: "No suitable servers found (`serverSelectionTryOnce` set): [socket timeout calling hello on 'mycluster-shard-00-01.zywgx.mesh.mongodb.net:30460'] [socket timeout calling hello on 'mycluster-shard-00-00.zywgx.mesh.mongodb.net:30460'] [socket timeout calling hello on 'mycluster-shard-00-02.zywgx.mesh.mongodb.net:30460']: generic server error"
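To triage many occurrences of this error at once, it may help to pull the per-host failures out of the message text. The following is a minimal sketch, assuming every entry follows the `[<reason> on '<host>:<port>']` shape seen in the example above; other failure reasons may be formatted differently. The function names `parse_server_selection_error` and `summarize` are hypothetical helpers, not part of any driver.

```python
import re

# Matches one bracketed per-host entry, for example:
#   [socket timeout calling hello on 'host.example.net:30460']
# Assumption: all entries share this "<reason> on '<host>:<port>'" shape.
HOST_FAILURE = re.compile(
    r"\[(?P<reason>[^\[\]]*?) on '(?P<host>[^':]+):(?P<port>\d+)'\]"
)


def parse_server_selection_error(message):
    """Return a list of (host, port, reason) tuples found in the error text."""
    return [
        (m.group("host"), int(m.group("port")), m.group("reason").strip())
        for m in HOST_FAILURE.finditer(message)
    ]


def summarize(message):
    """Group the failing 'host:port' strings by failure reason."""
    by_reason = {}
    for host, port, reason in parse_server_selection_error(message):
        by_reason.setdefault(reason, []).append(f"{host}:{port}")
    return by_reason
```

If every host in the message fails with the same reason, as in the example above where all three shard hosts time out on `hello`, the cause is more likely to be cluster-wide (network path, cluster state) than a single bad node.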

      1. Investigate the root cause of this issue. Hopefully, it is a user-side misconfiguration, such as incorrect auth supplied by the user or a cluster that no longer exists.
      2. Some starting points for investigation:
        1. https://wiki.corp.mongodb.com/display/RI/Splunk#Splunk-Tracingarequestfromstreamstoanatlascluster(note:justreplacebaaswithstreams)
        2. https://cloud.mongodb.com/admin/nds/groups
        3. This log that Erik added: https://github.com/10gen/mongohouse/pull/9217/files#diff-58a98c05fdce39bf649f3b88379ab600f499ae61c3303673e66839bdc77b3355R329
      3. Add a troubleshooting guide for investigating this error
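One possible first step for such a guide is a basic reachability check against each host named in the error. The sketch below uses only the Python standard library and separates DNS failure from TCP-level failure. The function name `probe` and its result labels are hypothetical. A probe run from a machine outside the stream processor's network only hints at what the processor itself sees, and a successful TCP connect says nothing about auth.

```python
import socket


def probe(host, port, timeout=5.0):
    """Classify basic reachability of host:port as seen from this machine.

    Returns a (label, detail) pair. The labels are illustrative only.
    """
    try:
        addrs = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        # The name does not resolve from here, which would be consistent
        # with (but does not prove) a cluster that no longer exists.
        return "dns_failure", str(exc)

    # Only the first resolved address is tried in this sketch.
    family, socktype, proto, _, sockaddr = addrs[0]
    try:
        with socket.socket(family, socktype, proto) as sock:
            sock.settimeout(timeout)
            sock.connect(sockaddr)
        return "tcp_ok", ""
    except socket.timeout as exc:
        return "tcp_timeout", str(exc)
    except ConnectionRefusedError as exc:
        return "tcp_refused", str(exc)
    except OSError as exc:
        return "tcp_error", str(exc)
```

For example, `probe("mycluster-shard-00-01.zywgx.mesh.mongodb.net", 30460)` would check the first host from the error above. A `dns_failure` result points toward the "cluster no longer exists" hypothesis, while `tcp_timeout` points toward a network path or firewall problem.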

      Stream Processor dashboards:

      65fc87d2c359d54e1ce8246b

      65d8f987dd88c43c22ee7f55

       

      Overall Ops Dashboard (where I initially noticed these errors):

      https://splunk.corp.mongodb.com/en-US/app/streams/overall_ops_mongostream?form.envIndex=mhouse&form.global_time.earliest=2024-04-05T10%3A14%3A00.000Z&form.global_time.latest=now

            Assignee:
            jagadish.nallapaneni@mongodb.com Jagadish Nallapaneni
            Reporter:
            sandeep.dhoot@mongodb.com Sandeep Dhoot
            Votes:
            0
            Watchers:
            2
