Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-31478

Certain unrecoverable errors don't throw UnrecoverableRollbackError, causing rollback to retry indefinitely

    • Type: Icon: Bug Bug
    • Resolution: Gone away
    • Priority: Icon: Major - P3 Major - P3
    • None
    • Affects Version/s: 3.5.13
    • Component/s: Replication
    • Labels:
    • Replication
    • ALL
    • v3.6
    • Repl 2017-12-04
    • 15

      Certain errors are not transient but don't bubble up as UnrecoverableRollbackErrors, so rollback's continuously retry logic will hit the same error every time. The error here seems to be related to new testing of applyOps, since all other commands have valid DBNames.

      Please see the attached file for a repro.

      d20003| 2017-10-09T18:08:53.972-0400 I REPL     [rsBackgroundSync] scheduling fetcher to read remote oplog on robert-macbook-pro.local:20001 starting at filter: { ts: { $gte: Timestamp 1507586707000|12 } }
      d20003| 2017-10-09T18:08:53.973-0400 I REPL     [rsBackgroundSync] Starting rollback due to OplogStartMissing: Our last op time fetched: { ts: Timestamp 1507586707000|12, t: 1 }. source's GTE: { ts: Timestamp 1507586718000|1, t: 2 } hashes: (-3080319662268009517/5545839650998093505)
      d20003| 2017-10-09T18:08:53.973-0400 I REPL     [rsBackgroundSync] Replication commit point: { ts: Timestamp 1507586706000|4, t: 1 }
      d20003| 2017-10-09T18:08:53.973-0400 I REPL     [rsBackgroundSync] Rollback using the 'rollbackViaRefetch' method due to startup server parameter.
      d20003| 2017-10-09T18:08:53.973-0400 I REPL     [rsBackgroundSync] transition to ROLLBACK
      d20003| 2017-10-09T18:08:53.973-0400 I NETWORK  [rsBackgroundSync] Skip closing connection for connection # 35
      d20003| 2017-10-09T18:08:53.973-0400 I NETWORK  [rsBackgroundSync] Skip closing connection for connection # 34
      d20003| 2017-10-09T18:08:53.974-0400 I NETWORK  [conn77] end connection 10.4.127.25:59759 (2 connections now open)
      d20003| 2017-10-09T18:08:53.974-0400 I ROLLBACK [rsBackgroundSync] Starting rollback. Sync source: robert-macbook-pro.local:20001
      b20001| 2017-10-09T18:08:53.975-0400 I NETWORK  [main] connection accepted from 10.4.127.25:59760 #70 (1 connection now open)
      d20004| 2017-10-09T18:08:53.975-0400 I NETWORK  [listener] connection accepted from 10.4.127.25:59761 #71 (8 connections now open)
      d20004| 2017-10-09T18:08:53.975-0400 I NETWORK  [conn71] received client metadata from 10.4.127.25:59761 conn: { driver: { name: "MongoDB Internal Client", version: "3.5.13-325-g155db50-patch-59d98345e3c331107a001198" }, os: { type: "Darwin", name: "Mac OS X", architecture: "x86_64", version: "17.0.0" } }
      d20003| 2017-10-09T18:08:53.976-0400 I ROLLBACK [rsBackgroundSync] Finding the Common Point
      d20003| 2017-10-09T18:08:53.976-0400 I ROLLBACK [rsBackgroundSync] our last optime:   Oct  9 18:05:07:c
      d20003| 2017-10-09T18:08:53.976-0400 I ROLLBACK [rsBackgroundSync] their last optime: Oct  9 18:05:19:34
      d20003| 2017-10-09T18:08:53.976-0400 I ROLLBACK [rsBackgroundSync] diff in end of log times: -12 seconds
      d20003| 2017-10-09T18:08:53.977-0400 I ROLLBACK [rsBackgroundSync] Rollback common point is { ts: Timestamp 1507586706000|4, t: 1 }
      d20003| 2017-10-09T18:08:53.977-0400 I ROLLBACK [rsBackgroundSync] Starting refetching documents
      d20003| 2017-10-09T18:08:53.977-0400 D ROLLBACK [rsBackgroundSync] Refetching document, namespace: test.one, _id: _id: "ISKRjIN6"
      d20003| 2017-10-09T18:08:53.977-0400 I ROLLBACK [rsBackgroundSync] Finished refetching documents. Total size of documents refetched: 1
      d20003| 2017-10-09T18:08:53.977-0400 I ROLLBACK [rsBackgroundSync] Checking the RollbackID and updating the MinValid if necessary
      d20003| 2017-10-09T18:08:53.977-0400 I ROLLBACK [rsBackgroundSync] Setting minvalid to { ts: Timestamp 1507586719000|52, t: 2 }
      d20003| 2017-10-09T18:08:53.978-0400 I ROLLBACK [rsBackgroundSync] Dropping collections to roll back create operations
      d20003| 2017-10-09T18:08:53.978-0400 E -        [rsBackgroundSync] Assertion: 28539:need a valid database name src/mongo/db/concurrency/d_concurrency.cpp 198
      d20003| 2017-10-09T18:08:53.982-0400 W ROLLBACK [rsBackgroundSync] Rollback cannot complete at this time (retrying later): Location28539: need a valid database name appliedThrough= { ts: Timestamp 1507586707000|12, t: 1 } minvalid= { ts: Timestamp 1507586719000|52, t: 2 }
      

            Assignee:
            backlog-server-repl [DO NOT USE] Backlog - Replication Team
            Reporter:
            robert.guo@mongodb.com Robert Guo (Inactive)
            Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

              Created:
              Updated:
              Resolved: