Core Server / SERVER-44055

All secondaries crashed in SessionUpdateTracker and cannot recover

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Major - P3
    • Fix Version/s: None
    • Affects Version/s: 4.0.3
    • Component/s: Replication
    • Labels: None
    • Backwards Compatibility: Fully Compatible
    • Operating System: ALL
      Still don't know how to reproduce.

      One secondary crashed

      2019-10-16T18:51:49.855+0800 F -        [rsSync-0] Fatal Assertion 50843 at src/mongo/db/repl/session_update_tracker.cpp 69
      2019-10-16T18:51:49.855+0800 F -        [rsSync-0]

      ***aborting after fassert() failure
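
      If fassert 50843 here is the SessionUpdateTracker check that a session's txnNumber must never go backwards (as in the 4.0 series), one way to see what the tracker replayed is to list that session's oplog entries in natural (timestamp) order. A hypothetical mongo-shell diagnostic, using the lsid from the oplog entry quoted at the end of this report:

      // List oplog entries for the suspect logical session, oldest first,
      // to inspect the txnNumber sequence SessionUpdateTracker processed.
      var suspectLsid = BinData(4, "b9QQ1SZVT2mh4vBoD4u/LA==");
      db.getSiblingDB("local").oplog.rs.find(
          { "lsid.id": suspectLsid },
          { ts: 1, op: 1, ns: 1, txnNumber: 1, fromMigrate: 1 }
      ).sort({ $natural: 1 });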
      

      Then another secondary crashed after changing its sync source to the primary:

      2019-10-16T18:52:21.381+0800 I REPL     [rsBackgroundSync] Changed sync source from empty to XX.XX.XX.XX:3097(PrimaryNode IP)
      2019-10-16T18:52:21.381+0800 I ASIO     [RS] Connecting to 11.203.27.243:3097
      2019-10-16T18:52:21.780+0800 F -        [rsSync-0] Fatal Assertion 50843 at src/mongo/db/repl/session_update_tracker.cpp 69
      2019-10-16T18:52:21.780+0800 F -        [rsSync-0]
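
      With a second member now failing the same assertion, a quick check from the remaining member is to confirm each member's state and sync source (hypothetical mongo-shell check; "syncingTo" is the field rs.status() reports in the 4.0 series):

      // Print name, state, and sync source for every replica-set member.
      rs.status().members.forEach(function (m) {
          print(m.name, m.stateStr, m.syncingTo || "");
      });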

      The crashed secondary cannot be restarted and fails with very weird errors, even though the _tmp database was created long ago and no one deleted the directory. Then no primary could be elected in the three-member replica set, leaving the service unavailable for a long time.

      2019-10-16T18:54:51.635+0800 I STORAGE  [initandlisten] WiredTiger message [1571223291:635501][41642:0x2b90d4829d00], txn-recover: Recovering log 752 through 753
      2019-10-16T18:54:51.730+0800 I STORAGE  [initandlisten] WiredTiger message [1571223291:730545][41642:0x2b90d4829d00], file:local/collection-2--8933994092597418142.wt, txn-recover: Recovering log 753 through 753
      2019-10-16T18:54:51.796+0800 I STORAGE  [initandlisten] WiredTiger message [1571223291:796255][41642:0x2b90d4829d00], file:local/collection-2--8933994092597418142.wt, txn-recover: Set global recovery timestamp: 5da6f62b00000008
      2019-10-16T18:54:51.814+0800 I RECOVERY [initandlisten] WiredTiger recoveryTimestamp. Ts: Timestamp(1571223083, 8)
      2019-10-16T18:54:51.814+0800 I STORAGE  [initandlisten] Triggering the first stable checkpoint. Initial Data: Timestamp(1571223083, 8) PrevStable: Timestamp(0, 0) CurrStable: Timestamp(1571223083, 8)
      2019-10-16T18:54:52.044+0800 E STORAGE  [initandlisten] WiredTiger error (2) [1571223292:44873][41642:0x2b90d4829d00], file:_tmp/collection-6--6434994499321235876.wt, WT_SESSION.open_cursor: __posix_open_file, 715: /home/mongo/mongo3103/data/_tmp/collection-6--6434994499321235876.wt: handle-open: open: No such file or directory Raw: [1571223292:44873][41642:0x2b90d4829d00], file:_tmp/collection-6--6434994499321235876.wt, WT_SESSION.open_cursor: __posix_open_file, 715: /home/mongo/mongo3103/data/_tmp/collection-6--6434994499321235876.wt: handle-open: open: No such file or directory
      2019-10-16T18:54:52.044+0800 E STORAGE  [initandlisten] Failed to get the cursor for uri: table:_tmp/collection-6--6434994499321235876
      2019-10-16T18:54:52.044+0800 E STORAGE  [initandlisten] This may be due to missing data files. Please read the documentation for starting MongoDB with --repair here: http://dochub.mongodb.org/core/repair
      2019-10-16T18:54:52.044+0800 F -        [initandlisten] Fatal Assertion 50883 at src/mongo/db/storage/wiredtiger/wiredtiger_recovery_unit.cpp 538
      2019-10-16T18:54:52.044+0800 F -        [initandlisten]
      
      

      The oplog entry on the primary that may have caused the error is as follows:

      {
          "ts" : Timestamp(1571223107, 5603),
          "t" : NumberLong(2),
          "h" : NumberLong("4179556560629599034"),
          "v" : 2,
          "op" : "n",
          "ns" : "db.coll",
          "ui" : BinData(4,"uKDv6Dw1RBKxq+ewcCqntg=="),
          "fromMigrate" : true,
          "o2" : {
              "lsid" : {
                  "id" : BinData(4,"b9QQ1SZVT2mh4vBoD4u/LA=="),
                  "uid" : BinData(0,"Y5mrDaxi8gv8RmdTsQ+1j7fmkr7JUsabhNmXAheU0fg=")
              },
              "txnNumber" : NumberLong(2),
              "op" : "i",
              "ns" : "db.coll",
              "ui" : BinData(4,"uKDv6Dw1RBKxq+ewcCqntg=="),
              "o" : {
                  "id" : "xx",
                  "platformUId" : "yy",
                  "players" : {
                      "xx" : {
                          "id" : NumberLong(123456),
                          "sId" : NumberLong(1),
                          "loginAt" : ISODate("2019-xx-xxT10:51:47.522Z")
                      }
                  },
                  "regTime" : ISODate("2019-10-16T10:51:47.522Z"),
                  "id" : NumberLong(3096698),
                  "platform" : "hortor",
                  "platformId" : NumberLong(778902834),
                  "openId" : "xx",
                  "shareCode" : "yy",
                  "channel" : "zz"
              },
              "ts" : Timestamp(1571223107, 4431),
              "t" : NumberLong(2),
              "h" : NumberLong("2385553688009329962"),
              "v" : NumberLong(2),
              "wall" : ISODate("2019-10-16T10:51:47.520Z"),
              "stmtId" : 0,
              "prevOpTime" : { "ts" : Timestamp(0, 0), "t" : NumberLong(-1) }
          },
          "wall" : ISODate("2019-10-16T10:51:47.520Z"),
          "lsid" : {
              "id" : BinData(4,"b9QQ1SZVT2mh4vBoD4u/LA=="),
              "uid" : BinData(0,"Y5mrDaxi8gv8RmdTsQ+1j7fmkr7JUsabhNmXAheU0fg=")
          },
          "txnNumber" : NumberLong(2),
          "stmtId" : 0,
          "prevOpTime" : { "ts" : Timestamp(0, 0), "t" : NumberLong(-1) },
          "o" : { "$sessionMigrateInfo" : 1 }
      }
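
      If migration-generated session no-ops of this shape are the trigger, other occurrences can be located with a query like the following (hypothetical mongo-shell diagnostic):

      // Find no-op oplog entries written under a logical session by chunk
      // migration; the $sessionMigrateInfo entry above matches this shape.
      db.getSiblingDB("local").oplog.rs.find({
          op: "n",
          fromMigrate: true,
          lsid: { $exists: true }
      }).sort({ $natural: 1 });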
      

       

            Assignee: Dmitry Agranat (dmitry.agranat@mongodb.com)
            Reporter: Zhang Youdong (zyd_com@126.com)
            Votes: 2
            Watchers: 12

              Created:
              Updated:
              Resolved: