Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-57431

MongoDB primary instance down

    • Type: Icon: Bug Bug
    • Resolution: Done
    • Priority: Icon: Major - P3 Major - P3
    • None
    • Affects Version/s: None
    • Component/s: None
    • Labels:
      None
    • ALL

      primary instance of replset down and unable to restart.

       log for restart:

      2021-06-04T11:27:54.148+0800 I CONTROL [main] ***** SERVER RESTARTED *****
      2021-06-04T11:27:54.157+0800 I CONTROL [main] Automatically disabling TLS 1.0, to force-enable TLS 1.0 specify --sslDisabledProtocols 'none'
      2021-06-04T11:27:54.179+0800 I CONTROL [initandlisten] MongoDB starting : pid=46354 port=27077 dbpath=/data3/pretreatment/dbdata 64-bit host=wqdcsrv414.cn.prod
      2021-06-04T11:27:54.179+0800 I CONTROL [initandlisten] db version v4.0.13
      2021-06-04T11:27:54.179+0800 I CONTROL [initandlisten] git version: bda366f0b0e432ca143bc41da54d8732bd8d03c0
      2021-06-04T11:27:54.179+0800 I CONTROL [initandlisten] OpenSSL version: OpenSSL 1.0.1e-fips 11 Feb 2013
      2021-06-04T11:27:54.179+0800 I CONTROL [initandlisten] allocator: tcmalloc
      2021-06-04T11:27:54.179+0800 I CONTROL [initandlisten] modules: none
      2021-06-04T11:27:54.179+0800 I CONTROL [initandlisten] build environment:
      2021-06-04T11:27:54.179+0800 I CONTROL [initandlisten] distmod: rhel70
      2021-06-04T11:27:54.179+0800 I CONTROL [initandlisten] distarch: x86_64
      2021-06-04T11:27:54.179+0800 I CONTROL [initandlisten] target_arch: x86_64
      2021-06-04T11:27:54.179+0800 I CONTROL [initandlisten] options: { config: "/etc/mongodpretreatment.cnf", net:

      { bindIp: "0.0.0.0", maxIncomingConnections: 10000, port: 27077, serviceExecutor: "adaptive", wireObjectCheck: true }

      , processManagement: { fork: true, pidFilePath: "/data3/pretreatment/pid/pretreatment.pid" }, replication: { enableMajorityReadConcern: false, oplogSizeMB: 819200, replSetName: "pretreatment" }, security: { authorization: "enabled", keyFile: "/data3/pretreatment/keyfile/securitypretreatment" }, storage: { dbPath: "/data3/pretreatment/dbdata", directoryPerDB: true, engine: "wiredTiger", journal:

      { commitIntervalMs: 50, enabled: true }

      , wiredTiger: { engineConfig:

      { cacheSizeGB: 60.0, directoryForIndexes: true }

      } }, systemLog: { destination: "file", logAppend: true, logRotate: "rename", path: "/logs/pretreatment/pretreatment.log", quiet: false } }
      2021-06-04T11:27:54.180+0800 W STORAGE [initandlisten] Detected unclean shutdown - /data3/pretreatment/dbdata/mongod.lock is not empty.
      2021-06-04T11:27:54.181+0800 W STORAGE [initandlisten] Recovering data from the last clean checkpoint.
      2021-06-04T11:27:54.181+0800 I STORAGE [initandlisten] wiredtiger_open config: create,cache_size=61440M,cache_overflow=(file_max=0M),session_max=20000,eviction=(threads_min=4,threads_max=4),config_base=false,statistics=(fast),log=(enabled=true,archive=true,path=journal,compressor=snappy),file_manager=(close_idle_time=100000),statistics_log=(wait=0),verbose=(recovery_progress),
      2021-06-04T11:27:54.839+0800 I STORAGE [initandlisten] WiredTiger message [1622777274:839167][46354:0x7f2051264b80], txn-recover: Main recovery loop: starting at 547399/94917248 to 547400/256
      2021-06-04T11:27:54.839+0800 I STORAGE [initandlisten] WiredTiger message [1622777274:839519][46354:0x7f2051264b80], txn-recover: Recovering log 547399 through 547400
      2021-06-04T11:27:54.860+0800 I STORAGE [initandlisten] WiredTiger message [1622777274:860444][46354:0x7f2051264b80], file:local/collection/8-4992545715697342319.wt, txn-recover: Recovering log 547400 through 547400
      2021-06-04T11:27:54.933+0800 I STORAGE [initandlisten] WiredTiger message [1622777274:933146][46354:0x7f2051264b80], file:local/collection/8-4992545715697342319.wt, txn-recover: Set global recovery timestamp: 0

      2021-06-04T11:31:48.976+0800 I STORAGE [initandlisten] Placing a marker at optime Jun 4 10:51:59:4315
      2021-06-04T11:31:49.069+0800 I FTDC [initandlisten] Initializing full-time diagnostic data capture with directory '/data3/pretreatment/dbdata/diagnostic.data'
      2021-06-04T11:31:49.074+0800 I REPL [initandlisten] Rollback ID is 3
      2021-06-04T11:31:49.078+0800 I REPL [initandlisten] Recovering from an unstable checkpoint (top of oplog: { ts: Timestamp(1622775163, 2189), t: 35 }, appliedThrough: { ts: Timestamp(1622775163, 2189), t: 35 })
      2021-06-04T11:31:49.078+0800 I REPL [initandlisten] Starting recovery oplog application at the appliedThrough: { ts: Timestamp(1622775163, 2189), t: 35 }, through the top of the oplog: { ts: Timestamp(1622775163, 2189), t: 35 }
      2021-06-04T11:31:49.078+0800 I REPL [initandlisten] No oplog entries to apply for recovery. Start point is at the top of the oplog.
      2021-06-04T11:31:49.115+0800 I CONTROL [LogicalSessionCacheRefresh] Sessions collection is not set up; waiting until next sessions refresh interval: Replication has not yet been configured
      2021-06-04T11:31:49.115+0800 I EXECUTOR [initandlisten] No thread count configured for executor. Using number of cores / 2: 28
      2021-06-04T11:31:49.115+0800 I EXECUTOR [worker-1] Started new database worker thread 1
      2021-06-04T11:31:49.115+0800 I EXECUTOR [worker-2] Started new database worker thread 2
      2021-06-04T11:31:49.115+0800 I CONTROL [LogicalSessionCacheReap] Failed to reap transaction table: NotYetInitialized: Replication has not yet been configured

      ...

      2021-06-04T11:31:50.584+0800 I REPL [rsBackgroundSync] sync source candidate: 10.67.12.3:27077
      2021-06-04T11:31:50.584+0800 I ASIO [RS] Connecting to 10.67.12.3:27077
      2021-06-04T11:31:50.588+0800 I REPL [rsBackgroundSync] Changed sync source from empty to 10.67.12.3:27077
      2021-06-04T11:31:50.589+0800 I ASIO [RS] Connecting to 10.67.12.3:27077
      2021-06-04T11:31:50.693+0800 I NETWORK [listener] connection accepted from 10.67.12.2:34152 #26 (13 connections now open)
      2021-06-04T11:31:50.694+0800 I NETWORK [conn26] received client metadata from 10.67.12.2:34152 conn26: { driver:

      { name: "NetworkInterfaceTL", version: "4.0.13" }

      , os: { type: "Linux", name: "Oracle Linux Server release 7.8", architecture: "x86_64", version: "Kernel 4.14.35-1902.301.1.el7uek.x86_64" } }
      2021-06-04T11:31:50.701+0800 I REPL [rsBackgroundSync] Starting rollback due to OplogStartMissing: Our last op time fetched: { ts: Timestamp(1622775163, 2189), t: 35 }. source's GTE: { ts: Timestamp(1622775638, 1), t: 36 } hashes: (3943340773125100714/-6300946020427835533)
      2021-06-04T11:31:50.702+0800 I REPL [rsBackgroundSync] Replication commit point: { ts: Timestamp(0, 0), t: -1 }
      2021-06-04T11:31:50.702+0800 I REPL [rsBackgroundSync] Rollback using the 'rollbackViaRefetch' method.
      2021-06-04T11:31:50.702+0800 I REPL [rsBackgroundSync] Canceling priority takeover callback
      2021-06-04T11:31:50.702+0800 I REPL [rsBackgroundSync] transition to ROLLBACK from SECONDARY
      2021-06-04T11:31:50.702+0800 I NETWORK [rsBackgroundSync] Skip closing connection for connection # 26
      2021-06-04T11:31:50.702+0800 I NETWORK [rsBackgroundSync] Skip closing connection for connection # 18
      2021-06-04T11:31:50.702+0800 I NETWORK [conn17] Error receiving request from client: CallbackCanceled: Callback was canceled. Ending connection from 127.0.0.1:38140 (connection id: 17)
      2021-06-04T11:31:50.702+0800 I NETWORK [conn16] Error receiving request from client: CallbackCanceled: Callback was canceled. Ending connection from 127.0.0.1:38138 (connection id: 16)
      2021-06-04T11:31:50.702+0800 I NETWORK [conn14] Error receiving request from client: CallbackCanceled: Callback was canceled. Ending connection from 10.67.12.103:43794 (connection id: 14)
      2021-06-04T11:31:50.702+0800 I NETWORK [rsBackgroundSync] Skip closing connection for connection # 12
      2021-06-04T11:31:50.702+0800 I NETWORK [conn13] Error receiving request from client: CallbackCanceled: Callback was canceled. Ending connection from 10.67.12.103:43792 (connection id: 13)
      2021-06-04T11:31:50.702+0800 I NETWORK [conn11] Error receiving request from client: CallbackCanceled: Callback was canceled. Ending connection from 10.67.12.103:43790 (connection id: 11)
      2021-06-04T11:31:50.702+0800 I NETWORK [conn9] Error receiving request from client: CallbackCanceled: Callback was canceled. Ending connection from 10.67.12.103:43788 (connection id: 9)
      2021-06-04T11:31:50.702+0800 I NETWORK [conn7] Error receiving request from client: CallbackCanceled: Callback was canceled. Ending connection from 10.67.12.103:43780 (connection id: 7)

      ...

      **

      2021-06-04T11:39:43.965+0800 I ROLLBACK [rsBackgroundSync] Rollback finished. The final minValid is: { ts: Timestamp(1622532457, 3), t: 22 }
      2021-06-04T11:39:43.965+0800 F ROLLBACK [rsBackgroundSync] Unable to complete rollback. A full resync may be needed: UnrecoverableRollbackError: replSet too much data to roll back.
      2021-06-04T11:39:43.965+0800 F - [rsBackgroundSync] Fatal Assertion 40507 at src/mongo/db/repl/rs_rollback.cpp 1542
      2021-06-04T11:39:43.965+0800 F - [rsBackgroundSync]

      ***aborting after fassert() failure

       

        1. metrics.2021-06-04T01-47-58Z-00000
          2.26 MB
        2. metrics.2021-06-04T03-31-50Z-00000
          78 kB
        3. pretreatment.log.gz
          10.95 MB
        4. Screen Shot 2021-06-07 at 2.25.15 PM.png
          Screen Shot 2021-06-07 at 2.25.15 PM.png
          242 kB
        5. Screen Shot 2021-06-07 at 2.26.00 PM.png
          Screen Shot 2021-06-07 at 2.26.00 PM.png
          231 kB

            Assignee:
            edwin.zhou@mongodb.com Edwin Zhou
            Reporter:
            ys.mao@homecreditcfc.cn YS Mao
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

              Created:
              Updated:
              Resolved: