[GODRIVER-2401] report error when connect to mongodb inside k8s Created: 30/Apr/22  Updated: 27/Oct/23  Resolved: 25/Oct/22

Status: Closed
Project: Go Driver
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: pickup li Assignee: Matt Dale
Resolution: Gone away Votes: 0
Labels: kubernetes
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   

Summary

An application written with the Go driver reports an error when connecting to MongoDB running inside Kubernetes. If directConnection=true is set, no error is reported, but there is a performance issue.

Please provide the version of the driver. If applicable, please provide the MongoDB server version and topology (standalone, replica set, or sharded cluster).

go 1.18

require (
    github.com/urfave/cli/v2 v2.5.0
    go.mongodb.org/mongo-driver v1.9.0
)

MongoDB server version: 5.0.6

 

How to Reproduce

Steps to reproduce. If possible, please include a Short, Self Contained, Correct (Compilable), Example.

We set up a standalone MongoDB deployment in Kubernetes and then used a Go application to connect to it, but found a problem.
Connecting with the default settings (without directConnection=true) reports the following error:
```
2022/04/14 09:30:00 server selection error: context deadline exceeded,
current topology: { Type: ReplicaSetNoPrimary, Servers: [
{ Addr: mongo-a9f01-replica0-0-0.mongo-a9f01-replica0-headless.qfusion-admin:27017, Type: Unknown, Last error: connection() error occurred during connection handshake: dial tcp: lookup mongo-a9f01-replica0-0-0.mongo-a9f01-replica0-headless.qfusion-admin on 192.168.65.5:53: no such host },
] }
```

The 192.168.65.5:53 here is the DNS server address inside Kubernetes. The application running outside the cluster cannot reach it, so asking 192.168.65.5:53 to resolve the hostname
mongo-a9f01-replica0-0-0.mongo-a9f01-replica0-headless.qfusion-admin
fails with an error.

I searched for a solution and verified that setting directConnection=true avoids the error.
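For reference, a minimal sketch of that workaround with the Go driver (the URI, user, and password below are placeholders, not the real deployment): directConnection can be set either in the connection string or via SetDirect.

```
package main

import (
    "context"
    "log"

    "go.mongodb.org/mongo-driver/mongo"
    "go.mongodb.org/mongo-driver/mongo/options"
)

func main() {
    // 1) In the connection string.
    opt := options.Client().
        ApplyURI("mongodb://user:pass@10.10.150.207:27717/?directConnection=true")

    // 2) Equivalently, via the client options API:
    // opt := options.Client().
    //     ApplyURI("mongodb://user:pass@10.10.150.207:27717").
    //     SetDirect(true)

    client, err := mongo.Connect(context.Background(), opt)
    if err != nil {
        log.Fatal(err)
    }
    defer client.Disconnect(context.Background())

    if err := client.Ping(context.Background(), nil); err != nil {
        log.Fatal(err)
    }
}
```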

However, when I run multiple concurrent data reads or inserts, increasing the number of concurrent threads has no effect and throughput does not improve. Using the same application against a non-Kubernetes deployment, throughput reaches about 600 MB/s without directConnection, but drops to only about 100 MB/s with directConnection=true.

I also tested the C driver; it reports no error even without directConnection=true.

Additional Background

Please provide any additional background information that may be helpful in diagnosing the bug.

Program used to test the connection and performance issue:

```
package main

import (
    "flag"
    "fmt"
    "net/http"
    _ "net/http/pprof" // registers the pprof handlers on http.DefaultServeMux
    "sync"
    "time"

    "go.mongodb.org/mongo-driver/bson"
    "go.mongodb.org/mongo-driver/mongo"
    "go.mongodb.org/mongo-driver/mongo/options"
)

var (
    srcUri      string
    srcDb       string
    srcColl     string
    threadCount int
    docCount    int
    batchSize   int
    pprof       bool
)

func init() {
    flag.StringVar(&srcUri, "srcuri", "mongodb://root:dbmotion#123@10.10.150.207:27717", "source mongodb uri")
    flag.StringVar(&srcDb, "srcdb", "db1", "source db")
    flag.StringVar(&srcColl, "srccoll", "t1", "source collection")
    flag.IntVar(&threadCount, "nt", 1, "goroutine count")
    flag.IntVar(&docCount, "ndoc", 1000000, "total docs to be inserted")
    flag.IntVar(&batchSize, "batch", 512, "insert batch size")
    flag.BoolVar(&pprof, "pprof", false, "start net/http/pprof")
}

func main() {
    flag.Parse()

    fmt.Printf("%s %d threads insert into %s/%s.%s %d docs, batch size: %d\n",
        nowStr(), threadCount, srcUri, srcDb, srcColl, docCount, batchSize)

    connOpt := options.Client().ApplyURI(srcUri)
    conn, err := mongo.Connect(nil, connOpt)
    if err != nil {
        fmt.Println(err)
        return
    }
    defer conn.Disconnect(nil)

    if err := conn.Ping(nil, nil); err != nil {
        fmt.Println(err)
        return
    }

    if err := conn.Database(srcDb).Collection(srcColl).Drop(nil); err != nil {
        fmt.Println(err)
        return
    }

    begin := time.Now()

    nDoc := docCount / threadCount
    // nIns[i] counts the documents inserted by goroutine i; printStat reads it
    // concurrently without synchronization, which is good enough for rough stats.
    nIns := make([]int, threadCount)

    var wg sync.WaitGroup
    wg.Add(threadCount)

    for i := 0; i < threadCount; i++ {
        pRs := &nIns[i]
        *pRs = 0
        go func() {
            insert(conn, srcDb, srcColl, nDoc, batchSize, pRs)
            wg.Done()
        }()
    }

    go printStat(nIns, 10)

    if pprof {
        go func() {
            http.ListenAndServe(":6060", nil)
        }()
        fmt.Printf("pprof on :6060\n")
    }
    wg.Wait()

    elapse := time.Since(begin)
    totalDoc := 0
    for _, val := range nIns {
        totalDoc += val
    }

    // Each document has 10 fields of 1 KiB, i.e. roughly 10 KiB per document.
    totalMB := 1.0 * float64(totalDoc) * 10 / 1024
    fmt.Printf("%s total insert %d docs, %3.f doc/s, %.3f MB/s\n", nowStr(),
        totalDoc, float64(totalDoc)/elapse.Seconds(), totalMB/elapse.Seconds())
}

func insert(conn *mongo.Client, dbName string, collName string, nDoc int, batchSize int, nInserted *int) {
    var binData []byte
    for i := 0; i < 1024; i++ {
        binData = append(binData, byte('A'+i%26))
    }

    var doc bson.D
    for i := 0; i < 10; i++ {
        colName := fmt.Sprintf("c%d", i)
        doc = append(doc, bson.E{Key: colName, Value: binData})
    }

    coll := conn.Database(dbName).Collection(collName)
    optIns := options.InsertMany() //.SetBypassDocumentValidation(true)

    var docs []interface{}

    for i := 0; i < nDoc; i++ {
        docs = append(docs, doc)
        if len(docs) >= batchSize {
            if _, err := coll.InsertMany(nil, docs, optIns); err != nil {
                fmt.Println(err)
                return
            }
            *nInserted += len(docs)
            docs = nil
        }
    }

    // Flush the final partial batch, if any.
    if len(docs) > 0 {
        if _, err := coll.InsertMany(nil, docs, optIns); err != nil {
            fmt.Println(err)
            return
        }
        *nInserted += len(docs)
    }
}

func nowStr() string {
    return time.Now().Format("2006-01-02 15:04:05")
}

func printStat(nIns []int, itvS int) {
    old := make([]int, len(nIns))

    tiker := time.NewTicker(time.Duration(itvS) * time.Second)
    for {
        copy(old, nIns)
        <-tiker.C

        totalDoc := 0
        totalMB := 0.0
        for i := 0; i < len(nIns); i++ {
            deltaIns := nIns[i] - old[i]
            deltaMB := float64(deltaIns) * 10 / 1024
            totalDoc += deltaIns
            totalMB += deltaMB
            fmt.Printf("%s t-%d insert %d docs, %3.f doc/s, %.3f MB/s\n", nowStr(), i,
                deltaIns, float64(deltaIns)/float64(itvS), deltaMB/float64(itvS))
        }

        fmt.Printf("%s all insert %d docs, %3.f doc/s, %.3f MB/s\n", nowStr(),
            totalDoc, float64(totalDoc)/float64(itvS), totalMB/float64(itvS))
    }
}
```

 

Run the program with './insert -srcuri="mongodb://root:dbmotion#123@10.10.150.208:27717/?directConnection=false" -nt 1 -trace'

 



 Comments   
Comment by PM Bot [ 25/Oct/22 ]

There hasn't been any recent activity on this ticket, so we're resolving it. Thanks for reaching out! Please feel free to comment on this if you're able to provide more information.

Comment by pickup li [ 03/Sep/22 ]

Thanks very much.

I will test and check if it works

Comment by Matt Dale [ 26/Aug/22 ]

Hey pickup112@gmail.com, thanks for the extra info! Based on that information, my theory about the server identifying itself as www-a1615-replica0-0-0.www-a1615-replica0-headless.qfusion-wcy-bm2:27017 is correct. Additionally, the topology appears to be a 3-node replica set and you are connecting directly to the "primary" node.

MongoDB replica sets offer a configuration called "replica set horizons" that allows connecting to a MongoDB replica set using different DNS names. However, that feature is only supported when using one of the MongoDB Kubernetes Operators. See documentation for using replica set horizons with the Enterprise operator here and with the Community operator here.

Are you using one of the MongoDB Kubernetes Operators to run MongoDB on Kubernetes? If so, are you able to update the MongoDB Kubernetes service definition to add a replica set horizon?

Comment by pickup li [ 09/Jul/22 ]

Yes, I can use mongo to connect to the instance like this:

```

root@www-a1615-replica0-0-0:/# mongo --host 10.10.88.123 --port 2635 -u user1 -p 'User1-123' --authenticationDatabase admin
MongoDB shell version v4.4.13
connecting to: mongodb://10.10.88.123:2635/?authSource=admin&compressors=disabled&gssapiServiceName=mongodb
Implicit session: session { "id" : UUID("12692f3c-b030-44bf-8404-99c8c9055160") }
MongoDB server version: 4.4.13
www-a1615-replica0:PRIMARY> show databases;
admin   0.000GB
config  0.000GB
local   0.011GB

```

 

and db.runCommand({hello:1}) returns:

```

www-a1615-replica0:PRIMARY> db.runCommand({hello:1})
{
        "topologyVersion" : {
                "processId" : ObjectId("628b49eb018f658f59f96596"),
                "counter" : NumberLong(6)
        },
        "hosts" : [
                "www-a1615-replica0-0-0.www-a1615-replica0-headless.qfusion-wcy-bm2:27017",
                "www-a1615-replica0-1-0.www-a1615-replica0-headless.qfusion-wcy-bm2:27017",
                "www-a1615-replica0-2-0.www-a1615-replica0-headless.qfusion-wcy-bm2:27017"
        ],
        "setName" : "www-a1615-replica0",
        "setVersion" : 1,
        "isWritablePrimary" : true,
        "secondary" : false,
        "primary" : "www-a1615-replica0-0-0.www-a1615-replica0-headless.qfusion-wcy-bm2:27017",
        "me" : "www-a1615-replica0-0-0.www-a1615-replica0-headless.qfusion-wcy-bm2:27017",
        "electionId" : ObjectId("7fffffff0000000000000001"),
        "lastWrite" : {
                "opTime" : {
                        "ts" : Timestamp(1657351209, 1),
                        "t" : NumberLong(1)
                },
                "lastWriteDate" : ISODate("2022-07-09T07:20:09Z"),
                "majorityOpTime" : {
                        "ts" : Timestamp(1657351209, 1),
                        "t" : NumberLong(1)
                },
                "majorityWriteDate" : ISODate("2022-07-09T07:20:09Z")
        },
        "maxBsonObjectSize" : 16777216,
        "maxMessageSizeBytes" : 48000000,
        "maxWriteBatchSize" : 100000,
        "localTime" : ISODate("2022-07-09T07:20:10.069Z"),
        "logicalSessionTimeoutMinutes" : 30,
        "connectionId" : 423114,
        "minWireVersion" : 0,
        "maxWireVersion" : 9,
        "readOnly" : false,
        "ok" : 1,
        "$clusterTime" : {
                "clusterTime" : Timestamp(1657351209, 1),
                "signature" : {
                        "hash" : BinData(0,"e2hF8xF4rQpHPl1no5226X2odBE="),
                        "keyId" : NumberLong("7100850609930108932")
                }
        },
        "operationTime" : Timestamp(1657351209, 1)
}

```

 

Comment by Matt Dale [ 09/Jul/22 ]

Hey pickup112@gmail.com, are you able to connect to your MongoDB instance running in Kubernetes using either the mongosh command line tool or the legacy mongo command line tool? If so, please connect and run the following command and paste the output here:

db.runCommand({hello:1})

As you noted earlier, the connection issue doesn't seem to be isolated to just the Go Driver, so I believe this is actually an issue with server configuration, not the Go Driver. Configuring MongoDB servers to run in Kubernetes is outside of my area of expertise, so if I'm not able to help you, I'll reassign this ticket. Thanks for your patience so far!

Comment by pickup li [ 16/Jun/22 ]

Thanks for the response.

  • My application runs on a host that is not in Kubernetes
  • I used a custom pod definition
Comment by Matt Dale [ 16/Jun/22 ]

Hey pickup112@gmail.com, thanks for the question and sorry about the slow response! I believe the problem is related to how MongoDB drivers update the server names based on the information in the initial server handshake. In this case, it seems like the server is identifying itself as mongo-a9f01-replica0-0-0.mongo-a9f01-replica0-headless.qfusion-admin:27017, so that's what the driver attempts to connect to. However, it also seems like that DNS name is not resolvable from the host where the client application is running. In directConnection=true mode, I believe that server name rewrite is not performed. I'm not sure why the performance is different when directConnection=true.
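A minimal sketch of one way to observe those rewritten addresses on the client side, using the driver's ServerMonitor (the URI, user, and password here are placeholders; this is just one debugging approach, not the only one):

```
package main

import (
    "context"
    "log"

    "go.mongodb.org/mongo-driver/event"
    "go.mongodb.org/mongo-driver/mongo"
    "go.mongodb.org/mongo-driver/mongo/options"
)

func main() {
    monitor := &event.ServerMonitor{
        // Fires whenever the driver updates its view of a server, e.g. after the
        // handshake rewrites the address to the name the server reports for itself.
        ServerDescriptionChanged: func(e *event.ServerDescriptionChangedEvent) {
            log.Printf("server %s: kind=%s, lastError=%v",
                e.Address, e.NewDescription.Kind, e.NewDescription.LastError)
        },
        // Logs every member address the driver currently knows about.
        TopologyDescriptionChanged: func(e *event.TopologyDescriptionChangedEvent) {
            for _, s := range e.NewDescription.Servers {
                log.Printf("topology member: %s (%s)", s.Addr, s.Kind)
            }
        },
    }

    opt := options.Client().
        ApplyURI("mongodb://user:pass@10.10.150.207:27717").
        SetServerMonitor(monitor)

    client, err := mongo.Connect(context.Background(), opt)
    if err != nil {
        log.Fatal(err)
    }
    defer client.Disconnect(context.Background())

    // Ping triggers server selection, so the monitor callbacks fire and show
    // whether the discovered hostnames are resolvable from this host.
    _ = client.Ping(context.Background(), nil)
}
```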

I have some questions to help me understand more about what's going on:

  • Where is your client application running? For example, are you trying to connect to the MongoDB server from a host in the same Kubernetes cluster as the MongoDB server, or a different Kubernetes cluster, or a host not running in Kubernetes?
  • How are you running the MongoDB standalone instance in Kubernetes? Are you using the MongoDB Community Kubernetes Operator, the MongoDB Enterprise Kubernetes Operator, or a custom pod definition?
Comment by pickup li [ 05/May/22 ]

I also tested performance with the C and Java drivers. With directConnection=true, single-thread performance drops from 120 MB/s to 60 MB/s, and 8-thread performance drops from 600 MB/s to 120 MB/s.
