Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-17081

Resync command under write load can fassert

    • Type: Icon: Bug Bug
    • Resolution: Won't Fix
    • Priority: Icon: Major - P3 Major - P3
    • None
    • Affects Version/s: None
    • Component/s: Replication
    • None
    • Replication
    • ALL

      m31001| 2015-01-27T14:38:12.778-0500 I REPL     [rsSync] initial sync drop all databases
       m31001| 2015-01-27T14:38:12.778-0500 I STORAGE  [rsSync] dropAllDatabasesExceptLocal 2
       m31001| 2015-01-27T14:38:12.786-0500 I JOURNAL  [rsSync] journalCleanup...
       m31001| 2015-01-27T14:38:12.786-0500 I JOURNAL  [rsSync] removeJournalFiles
       m31001| 2015-01-27T14:38:12.788-0500 I JOURNAL  [rsSync] journalCleanup...
       m31001| 2015-01-27T14:38:12.788-0500 I JOURNAL  [rsSync] removeJournalFiles
       m31001| 2015-01-27T14:38:12.790-0500 I REPL     [rsSync] initial sync clone all databases
       m31001| 2015-01-27T14:38:12.790-0500 I REPL     [rsSync] initial sync cloning db: test
       m31000| 2015-01-27T14:38:12.792-0500 I NETWORK  [initandlisten] connection accepted from 192.168.1.104:60845 #21 (6 connections now open)
       m31000| 2015-01-27T14:38:12.792-0500 I NETWORK  [conn21] end connection 192.168.1.104:60845 (5 connections now open)
       m31000| 2015-01-27T14:38:12.793-0500 I NETWORK  [initandlisten] connection accepted from 192.168.1.104:60846 #22 (6 connections now open)
       m31001| 2015-01-27T14:38:12.799-0500 I INDEX    [rsSync] allocating new ns file /data/db/testReplSet-1/test.ns, filling with zeroes...
       m31001| 2015-01-27T14:38:12.846-0500 I STORAGE  [FileAllocator] allocating new datafile /data/db/testReplSet-1/test.0, filling with zeroes...
       m31001| 2015-01-27T14:38:12.916-0500 I STORAGE  [FileAllocator] done allocating datafile /data/db/testReplSet-1/test.0, size: 16MB,  took 0.069 secs
       m31001| 2015-01-27T14:38:12.971-0500 I INDEX    [rsSync] build index on: test.a properties: { v: 1, key: { _id: 1 }, name: "_id_", ns: "test.a" }
       m31001| 2015-01-27T14:38:12.971-0500 I INDEX    [rsSync] 	 building index using bulk method
       m31001| 2015-01-27T14:38:12.982-0500 I INDEX    [rsSync] build index done.  scanned 5479 total records. 0 secs
       m31001| 2015-01-27T14:38:12.982-0500 I REPL     [rsSync] initial sync data copy, starting syncup
       m31001| 2015-01-27T14:38:12.982-0500 I REPL     [rsSync] oplog sync 1 of 3
       m31001| 2015-01-27T14:38:12.983-0500 F REPL     [rsSync] replication oplog stream went back in time. previous timestamp: 54c7e924:1a6 newest timestamp: 54c7e924:1a4. Op being applied: { ts: Timestamp 1422387492000|420, h: 3326148221373857081, v: 2, op: "i", ns: "test.a", o: { _id: ObjectId('54c7e9249ee01fcbe529f117'), x: 5391.0 } }
       m31001| 2015-01-27T14:38:12.983-0500 I -        [rsSync] Fatal Assertion 18905
       m31001| 2015-01-27T14:38:12.983-0500 I -        [rsSync] 
       m31001| 
       m31001| ***aborting after fassert() failure
      

      The issue below is that lastapplied is greater than minvalid, so setting minvalid on initial sync failure addresses this:

      m31001| 2015-01-27T14:17:05.790-0500 I REPL     [rsSync] initial sync pending
       m31001| 2015-01-27T14:17:05.790-0500 I REPL     [ReplicationExecutor] syncing from: pans.local:31000
       m31000| 2015-01-27T14:17:05.790-0500 I NETWORK  [conn16] end connection 192.168.1.104:57695 (3 connections now open)
       m31000| 2015-01-27T14:17:05.793-0500 I NETWORK  [initandlisten] connection accepted from 192.168.1.104:57717 #17 (4 connections now open)
       m31000| 2015-01-27T14:17:05.796-0500 I NETWORK  [initandlisten] connection accepted from 192.168.1.104:57718 #18 (5 connections now open)
       m31001| 2015-01-27T14:17:05.797-0500 I REPL     [rsSync] initial sync drop all databases
       m31001| 2015-01-27T14:17:05.797-0500 I STORAGE  [rsSync] dropAllDatabasesExceptLocal 2
       m31001| 2015-01-27T14:17:05.803-0500 I REPL     [rsBackgroundSync] replSet our last op time fetched: Dec 31 19:00:00:0
       m31001| 2015-01-27T14:17:05.803-0500 I REPL     [rsBackgroundSync] replset source's GTE: Jan 27 14:16:11:1
       m31001| 2015-01-27T14:17:05.808-0500 I JOURNAL  [rsSync] journalCleanup...
       m31001| 2015-01-27T14:17:05.808-0500 I JOURNAL  [rsSync] removeJournalFiles
       m31001| 2015-01-27T14:17:05.811-0500 I JOURNAL  [rsSync] journalCleanup...
       m31001| 2015-01-27T14:17:05.811-0500 I JOURNAL  [rsSync] removeJournalFiles
       m31001| 2015-01-27T14:17:05.813-0500 I REPL     [rsSync] initial sync clone all databases
       m31001| 2015-01-27T14:17:05.813-0500 F REPL     [rsBackgroundSync] replSet need to rollback, but in inconsistent state
       m31001| 2015-01-27T14:17:05.813-0500 I REPL     [rsBackgroundSync] minvalid: 54c7e430:1de our last optime: 0:0
       m31001| 2015-01-27T14:17:05.813-0500 I REPL     [rsSync] initial sync cloning db: test
       m31001| 2015-01-27T14:17:05.813-0500 I -        [rsBackgroundSync] Fatal Assertion 18750
       m31001| 2015-01-27T14:17:05.813-0500 I -        [rsBackgroundSync] 
       m31001| 
       m31001| ***aborting after fassert() failure
      

            Assignee:
            backlog-server-repl [DO NOT USE] Backlog - Replication Team
            Reporter:
            scotthernandez Scott Hernandez (Inactive)
            Votes:
            3 Vote for this issue
            Watchers:
            14 Start watching this issue

              Created:
              Updated:
              Resolved: