[SERVER-75539] Server doesn't boot up in Kubernetes environment Created: 31/Mar/23  Updated: 03/Apr/23  Resolved: 03/Apr/23

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Sebastian Laskawiec Assignee: Unassigned
Resolution: Won't Do Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-75583 Provide the official images for other... Closed
Operating System: ALL
Participants:

 Description   

Problem Statement

When adopting the Community Server image on Kubernetes, I noticed that the database doesn't boot up with the standard configuration file and the server remains silent.

Steps to reproduce

As a prerequisite for the steps below, install Kind (any version) and start a cluster. Once Kind is running, proceed with the steps.

1. Get any Kubernetes environment up and running. I'm using Kind:

kind v0.14.0 go1.18.2 darwin/arm64

2. Create the following reproducer in your cluster using kubectl apply -f reproducer.yaml

apiVersion: apps/v1
kind: StatefulSet
metadata:
  labels:
    app: example-mongodb-svc
  name: example-mongodb
spec:
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: example-mongodb-svc
  serviceName: example-mongodb-svc
  template:
    metadata:
      labels:
        app: example-mongodb-svc
    spec:
      containers:
      - command:
        - /bin/sh
        - -c
        - mongod --config /etc/mongod.conf --syslog
        args:
          - ""
        env:
        - name: AGENT_STATUS_FILEPATH
          value: /healthstatus/agent-health-status.json
        # If this image is specified, nothing will happen. The whole process just hangs.
        image: docker.io/mongodb/mongodb-community-server:6.0.5-ubi8
        # This image, however, boots up correctly. It then fails with
        # │ {"t":{"$date":"2023-04-03T09:56:13.265Z"},"s":"F",  "c":"CONTROL",  "id":20574,   "ctx":"-","msg":"Error during global initialization","attr":{"error":{"code":2,"codeName":"BadValue","errmsg":"logpath cannot be empty if supplied"}}}
        # which is absolutely correct in this case.
        # image: quay.io/sebastian_laskawiec_mongodb/mongodb-community-server:6.0.5-ubi8
        imagePullPolicy: Always
        name: mongod
        resources:
          limits:
            cpu: "1"
            memory: 500M
          requests:
            cpu: 500m
            memory: 400M
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /data
          name: data-volume
        - mountPath: /var/lib/mongodb-mms-automation/authentication
          name: example-mongodb-keyfile
        - mountPath: /healthstatus
          name: healthstatus
        - mountPath: /hooks
          name: hooks
        - mountPath: /var/log/mongodb-mms-automation
          name: logs-volume
        - mountPath: /tmp
          name: tmp
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
      volumes:
      - emptyDir: {}
        name: agent-scripts
      - name: automation-config
        secret:
          defaultMode: 416
          secretName: example-mongodb-config
      - emptyDir: {}
        name: example-mongodb-keyfile
      - emptyDir: {}
        name: healthstatus
      - emptyDir: {}
        name: hooks
      - emptyDir: {}
        name: tmp
  updateStrategy:
    type: RollingUpdate
  volumeClaimTemplates:
  - apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      creationTimestamp: null
      name: data-volume
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 10G
      volumeMode: Filesystem
    status:
      phase: Pending
  - apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      creationTimestamp: null
      name: logs-volume
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 2G
      volumeMode: Filesystem

The reproducer has been created based on what the MongoDB Community Operator creates and trimmed down manually.

3. Notice that the server does not respond. You can investigate further with the following:

# Check if the server is really running
$ kubectl debug -it example-mongodb-0 --image=busybox:1.28 --target=mongod --share-processes
$$ ps
PID   USER     TIME  COMMAND
    1 998       5:15 {mongod} /usr/bin/qemu-x86_64 /usr/bin/mongod mongod --config /etc/mongod.conf --logpath
 
# Check if there's anything in logs:
$ kubectl logs example-mongodb-0
## Empty
$ kubectl exec -it example-mongodb-0 -- sh
$$ cat /var/log/mongodb/mongod.log
## Empty
# Check the connection using localhost exception:
$$ mongosh
Current Mongosh Log ID:	6426cb70cccab1556ade749f
Connecting to:		mongodb://127.0.0.1:27017/?directConnection=true&serverSelectionTimeoutMS=2000&appName=mongosh+1.8.0
MongoNetworkError: connect ECONNREFUSED 127.0.0.1:27017

The server doesn't seem to be running.
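
One detail in the ps output above may be worth a closer look: the command line starts with /usr/bin/qemu-x86_64, which suggests that mongod is being run through QEMU user-mode emulation rather than natively. The sketch below shows one way to flag that from the COMMAND column of ps. The helper name emulator_of is hypothetical, and the check assumes a simple command line without glob characters.

```shell
# Hypothetical helper: given the COMMAND column of a ps line, print the
# emulated architecture if the first real word is a qemu-<arch> binary.
# Returns non-zero when the process does not appear to be emulated.
emulator_of() {
  # Intentionally unquoted: split the command line into words.
  for word in $1; do
    case "$word" in
      \{*\})
        # busybox ps prefixes some entries with "{name}"; skip it.
        continue ;;
      *qemu-*)
        # Strip everything up to and including "qemu-".
        echo "${word##*qemu-}"
        return 0 ;;
      *)
        # First real word is not a QEMU binary.
        return 1 ;;
    esac
  done
  return 1
}

# Example with the COMMAND column copied from the output above.
emulator_of '{mongod} /usr/bin/qemu-x86_64 /usr/bin/mongod mongod --config /etc/mongod.conf --logpath'
```

A non-empty result only indicates that the binary was built for a different architecture than the host; it does not by itself explain why the process stays silent.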



 Comments   
Comment by Sebastian Laskawiec [ 03/Apr/23 ]

After diagnosing this issue further, it turned out that the official Server images are built only for the x86 architecture. Since I was running Kind on an M1 Mac (arm64), the process was hanging. For now, we can apply an easy workaround and switch our development environment to x86 (Kops on AWS).
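
A mismatch like this can be checked up front by comparing the architecture recorded in the image with the one reported by the node. The following is a minimal sketch, not an official tool: normalize_arch, arch_matches and check_arch are hypothetical helpers, and the commented kubectl and docker commands assume that the image has already been pulled locally and that a node name is known.

```shell
# Map machine names as printed by `uname -m` to the names used in image
# metadata and Kubernetes node info (amd64, arm64). Unknown names pass through.
normalize_arch() {
  case "$1" in
    x86_64|amd64)  echo amd64 ;;
    aarch64|arm64) echo arm64 ;;
    *)             echo "$1" ;;
  esac
}

# Succeeds (exit 0) when both architectures normalize to the same name.
arch_matches() {
  [ "$(normalize_arch "$1")" = "$(normalize_arch "$2")" ]
}

# Print a one-line verdict for an image/node architecture pair.
check_arch() {
  if arch_matches "$1" "$2"; then
    echo "ok: image and node are both $(normalize_arch "$1")"
  else
    echo "mismatch: image is $(normalize_arch "$1"), node is $(normalize_arch "$2")"
  fi
}

# Possible usage (placeholders in angle brackets must be filled in):
#   node_arch=$(kubectl get node <node-name> -o jsonpath='{.status.nodeInfo.architecture}')
#   image_arch=$(docker image inspect --format '{{.Architecture}}' <image>)
#   check_arch "$image_arch" "$node_arch"
```

On the setup described in this ticket, such a check would be expected to report a mismatch (amd64 image on an arm64 node), since an x86-only image can only run under emulation there.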

I created SERVER-75583 to request architecture parity with the Docker community images.

Comment by Sebastian Laskawiec [ 31/Mar/23 ]

One thing that might be worth mentioning: Kube uses random UID/GIDs by default. Later on, we'll be looking into fixing them to 2000, but for this reproducer the uid/gid is:

uid=998(mongod) gid=996(mongod) groups=996(mongod)

Generated at Thu Feb 08 06:30:26 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.