MongoDB Database Tools / TOOLS-2813

Ops Manager unable to get backups working

    • Type: Question
    • Resolution: Declined
    • Priority: Minor - P4
    • None
    • Affects Version/s: None
    • Component/s: None
    • Labels: None

      Hi, we have been unable to get backups working. Can you help identify what the issue might be? Our steps and questions are below.

      Steps taken to enable backup:

      • Two clusters, "EMEA" (production data) and "Backup" (to store backups), both running 4.2.10 Enterprise with FCV 4.2
      • There are backup agents running on all Automation agents in both clusters
      • A "backup" user was created with root privileges (we were not sure what privileges were appropriate)
      • The backup daemons look like this on the Admin page:
      • "Continuous Backup" was enabled for the EMEA cluster. The only option offered was which storage engine to use, but that input was set to WiredTiger and disabled, so we could not change it anyway. The dialog is shown below:

       

      Issues observed

      There are no snapshots for EMEA.

      There is a warning for the EMEA backup.

      The backup job on the Admin page is stuck at "WT checkpoint" and has not progressed in many days.

      This is the journal head for the job:
      Journal Head

      {
        "broken": false,
        "rollback": false,
        "nextSnapshot": {"$timestamp": {
          "t": 1613040951,
          "i": 1
        }},
        "groupId": {"$oid": "5d4d30fe0fe66405326fc065"},
        "oplogStoreType": "oplogStore",
        "rsId": "EMEA",
        "theft": {"eligible": false},
        "mongodOptions": {},
        "filterlist": {
          "type": "blacklist",
          "list": []
        },
        "oplogMinTTLSeconds": 176400,
        "schedule": {
          "reference": 1613040951,
          "rules": [
            {
              "duration": 172800,
              "interval": 86400
            },
            {
              "duration": 1209600,
              "interval": 604800
            },
            {
              "duration": 2419200,
              "interval": 2419200
            }
          ]
        },
        "pitWindowSeconds": 86400,
        "machine": {
          "boundBy": "bgrid",
          "owned": {
            "head": "/cs/giraffe/mongoservice/",
            "machine": "sgld9007341"
          },
          "bound": {
            "head": "/cs/giraffe/mongoservice/",
            "machine": "sgld9007341"
          }
        },
        "syncStore": {"id": "oplog1"},
        "oplogStore": {
          "namespace": "5d4d30fe0fe66405326fc065.oplog_EMEA",
          "id": "WalrusOplog"
        },
        "mongodVersion": "4.2.10",
        "workingOn": false,
        "_id": {"$oid": "60250cfd948439e73807806a"},
        "tag": {
          "takesEffectMs": 0,
          "name": "unassigned"
        },
        "state": {"action": "WT checkpoint"},
        "wtBackup": {"checkpointingTarget": {
          "hostname": "gbld9014893.eu.hedani.net:27017",
          "sessionKey": "6022c3f8d90ed79a42a7da35"
        }},
        "blockstore": {
          "phase": "A",
          "hashFunction": "SHA256",
          "lastGroomedMS": 1613040893792,
          "lastTrackedMS": 1613300164680,
          "id": "SphereBlockstore"
        },
        "snapshot": {"snapshotStoreType": "blockstore"},
        "mongod": {
          "encryption": {"enabled": false},
          "storageEngine": "wiredTiger"
        }
      }
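      To make the journal head easier to read, its schedule rules and timestamps can be converted to days and UTC dates. The Python below is a minimal sketch: every value is copied from the journal head, but the field meanings are assumptions inferred from the field names, not taken from Ops Manager documentation (it assumes `interval` is the spacing between snapshots, `duration` is how long each is retained, and `nextSnapshot` is when the next snapshot fell due). The only documented fact it relies on is that the first four bytes of a BSON ObjectId hold its creation time in seconds since the Unix epoch. The helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Values copied verbatim from the journal head. Their meanings are ASSUMED
# from the field names, not confirmed by Ops Manager documentation.
SCHEDULE_RULES = [
    {"duration": 172800, "interval": 86400},
    {"duration": 1209600, "interval": 604800},
    {"duration": 2419200, "interval": 2419200},
]
JOB_OID = "60250cfd948439e73807806a"  # "_id" $oid
NEXT_SNAPSHOT_S = 1613040951          # "nextSnapshot" $timestamp "t" (seconds)
LAST_TRACKED_MS = 1613300164680       # "blockstore" "lastTrackedMS" (milliseconds)
OPLOG_MIN_TTL_S = 176400              # "oplogMinTTLSeconds"
PIT_WINDOW_S = 86400                  # "pitWindowSeconds"

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
DAY = 86400


def from_seconds(s: int) -> datetime:
    """Unix seconds -> timezone-aware UTC datetime."""
    return EPOCH + timedelta(seconds=s)


def from_millis(ms: int) -> datetime:
    """Unix milliseconds -> timezone-aware UTC datetime (exact, no float rounding)."""
    return EPOCH + timedelta(milliseconds=ms)


def objectid_created(oid_hex: str) -> datetime:
    """The first 4 bytes (8 hex chars) of an ObjectId are a big-endian Unix timestamp."""
    return from_seconds(int(oid_hex[:8], 16))


def schedule_in_days(rules):
    """ASSUMPTION: interval = time between snapshots, duration = retention."""
    return [(r["interval"] / DAY, r["duration"] / DAY) for r in rules]


if __name__ == "__main__":
    for every, keep in schedule_in_days(SCHEDULE_RULES):
        print(f"snapshot every {every:g} day(s), retained {keep:g} day(s)")
    print("job _id created:   ", objectid_created(JOB_OID).isoformat())
    print("next snapshot due: ", from_seconds(NEXT_SNAPSHOT_S).isoformat())
    print("blockstore tracked:", from_millis(LAST_TRACKED_MS).isoformat())
    gap = from_millis(LAST_TRACKED_MS) - from_seconds(NEXT_SNAPSHOT_S)
    print("gap between the two:", gap)
    print(f"oplog min TTL: {OPLOG_MIN_TTL_S / 3600:g} h, "
          f"PIT window: {PIT_WINDOW_S / 3600:g} h")
```

      Under those assumptions, the schedule amounts to daily snapshots kept for 2 days, weekly snapshots kept for 14 days, and 28-day snapshots kept for 28 days, and `blockstore.lastTrackedMS` falls roughly three days after `nextSnapshot`, which would be consistent with a job that has not advanced past "WT checkpoint".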

      Questions we have

      • Why isn't the backup working?
      • Are we using the 4.2 backup process, or is it falling back to the older 4.0 process? It appears to be using the Backup Daemon and an oplog store.
      • What minimal permissions does the backup user we created for the blockstore need?
      • Is it correct that Backup agents should run on every Automation agent in both the source and target clusters?
      • If we are meant to be using the Backup Daemon, why does it have high disk usage, and where should it be located (near the source or target clusters)?

       

            Assignee:
            Unassigned
            Reporter:
            jonathan.rogers@credit-suisse.com Jonathan Rogers
            Votes:
            0
            Watchers:
            2

              Created:
              Updated:
              Resolved: