[SERVER-17114] WiredTiger capped collection cursor needs to go DEAD when restoreState to a deleted doc Created: 29/Jan/15  Updated: 18/Sep/15  Resolved: 03/Feb/15

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: 3.0.0-rc7
Fix Version/s: 3.0.0-rc8

Type: Bug Priority: Critical - P2
Reporter: Michael Grundy Assignee: Eric Milkie
Resolution: Done Votes: 0
Labels: 28qa
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: shard2-02.log.gz, shard2-03.log.gz
Issue Links:
Related
related to SERVER-17181 Correctly invalidate RocksDB cursor w... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Steps To Reproduce:

3-member replica set, c3.2xlarge instances (8 CPU, 15 GB RAM)

Socialite load workload: java -jar target/socialite-0.0.1-SNAPSHOT.jar load --users 10000000 --maxfollows 1000 --messages 2000 --threads 32 sample-config.yml


 Description   

While running a heavy insert workload on a three-node replica set, both secondaries fell behind. One aborted with Fatal Assertion 18750; the other stayed in RECOVERING.

It seems that after going into RECOVERING, this secondary tried to sync from the other secondary, which had also gone into RECOVERING (EDIT: per the logs, the secondary that asserted went into recovery before the other one), and then hit Fatal Assertion 18750 (see the sketch after the log excerpt below):

2015-01-29T11:30:54.612+0000 I REPL     [ReplicationExecutor] syncing from: shard2-01.testdomain.com:27017
2015-01-29T11:30:54.615+0000 W REPL     [rsBackgroundSync] we are too stale to use shard2-01.testdomain.com:27017 as a sync source
2015-01-29T11:30:54.615+0000 I REPL     [ReplicationExecutor] could not find member to sync from 
2015-01-29T11:30:54.615+0000 I REPL     [rsBackgroundSync] replSet error RS102 too stale to catch up
2015-01-29T11:30:54.617+0000 I REPL     [rsBackgroundSync] replSet our last optime : Jan 29 09:56:58 54ca03ea:1c1f
2015-01-29T11:30:54.617+0000 I REPL     [rsBackgroundSync] replSet oldest available is Jan 29 10:03:57 54ca058d:1803
2015-01-29T11:30:54.617+0000 I REPL     [rsBackgroundSync] replSet See http://dochub.mongodb.org/core/resyncingaverystalereplicasetmember
2015-01-29T11:30:54.737+0000 I REPL     [ReplicationExecutor] transition to RECOVERING
2015-01-29T11:30:54.925+0000 I NETWORK  [conn3286] end connection 10.93.30.140:59321 (5 connections now open)
2015-01-29T11:30:54.932+0000 I NETWORK  [initandlisten] connection accepted from 10.93.30.140:59323 #3288 (6 connections now open)
2015-01-29T11:31:17.471+0000 I QUERY    [conn1] assertion 13436 not master or secondary; cannot currently read from this replSet member ns:config.settings query:{}
2015-01-29T11:31:24.407+0000 I NETWORK  [conn3287] end connection 10.93.30.139:60915 (5 connections now open)
2015-01-29T11:31:24.408+0000 I NETWORK  [initandlisten] connection accepted from 10.93.30.139:60917 #3289 (6 connections now open)
2015-01-29T11:31:24.942+0000 I NETWORK  [conn3288] end connection 10.93.30.140:59323 (5 connections now open)
2015-01-29T11:31:24.943+0000 I NETWORK  [initandlisten] connection accepted from 10.93.30.140:59325 #3290 (6 connections now open)
2015-01-29T11:31:35.744+0000 I REPL     [ReplicationExecutor] syncing from: shard2-03.testdomain.com:27017
2015-01-29T11:31:36.077+0000 I REPL     [SyncSourceFeedback] replset setting syncSourceFeedback to shard2-03.testdomain.com:27017
2015-01-29T11:31:36.360+0000 I REPL     [rsBackgroundSync] replSet our last op time fetched: Jan 29 09:56:58:1c1f
2015-01-29T11:31:36.360+0000 I REPL     [rsBackgroundSync] replset source's GTE: Jan 29 09:57:02:5a0
2015-01-29T11:31:36.361+0000 F REPL     [rsBackgroundSync] replSet need to rollback, but in inconsistent state
2015-01-29T11:31:36.361+0000 I REPL     [rsBackgroundSync] minvalid: 54ca058d:1803 our last optime: 54ca03ea:1c1f
2015-01-29T11:31:36.362+0000 I -        [rsBackgroundSync] Fatal Assertion 18750
2015-01-29T11:31:36.362+0000 I -        [rsBackgroundSync] 
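The assertion at the end of this excerpt is the key failure. A minimal sketch of the check implied by the two log lines above (hypothetical names, not the server's actual code): rollback is only attempted when the node is in a consistent state, i.e. everything up to minValid has already been applied. Here minValid (54ca058d:1803) is still ahead of the last applied optime (54ca03ea:1c1f), so the node aborts instead of rolling back.

// Hypothetical sketch reconstructed from the log lines
// "need to rollback, but in inconsistent state" and
// "minvalid: 54ca058d:1803 our last optime: 54ca03ea:1c1f".
#include <cstdint>
#include <cstdio>

// Simplified optime: seconds since epoch + increment, compared lexicographically.
struct OpTime {
    uint32_t secs;
    uint32_t inc;
};

bool operator>(const OpTime& a, const OpTime& b) {
    return a.secs != b.secs ? a.secs > b.secs : a.inc > b.inc;
}

// Rollback is only safe if the node is not mid-batch; if minValid is still
// ahead of the last applied op, the data is inconsistent and the node must
// abort rather than attempt rollback.
bool rollbackWouldFassert(const OpTime& minValid, const OpTime& lastApplied) {
    return minValid > lastApplied;
}

int main() {
    OpTime minValid    = {0x54ca058d, 0x1803};  // from the log above
    OpTime lastApplied = {0x54ca03ea, 0x1c1f};  // from the log above
    std::printf("fassert 18750 path taken: %s\n",
                rollbackWouldFassert(minValid, lastApplied) ? "yes" : "no");
    return 0;
}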



 Comments   
Comment by Githook User [ 04/Feb/15 ]

Author: Eric Milkie (milkie) <milkie@10gen.com>

Message: SERVER-17114 capped cursor should die when falling off the back of collection

(cherry picked from commit 0e523eec39056da5b0dfb3826bbe27b4f376deb6)
Branch: v3.0
https://github.com/mongodb/mongo/commit/97591316b10a0c41c8449d15d23dc562401998b2

Comment by Githook User [ 03/Feb/15 ]

Author: Eric Milkie (milkie) <milkie@10gen.com>

Message: SERVER-17114 capped cursor should die when falling off the back of collection
Branch: master
https://github.com/mongodb/mongo/commit/0e523eec39056da5b0dfb3826bbe27b4f376deb6
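For illustration, a self-contained toy model of the behavior the commit above describes (hypothetical types, not WiredTiger or MongoDB source): when a cursor over a capped collection restores its state and finds that its saved position has been truncated off the back of the collection, it must go dead rather than silently resume at a later record and skip documents.

#include <cstdint>
#include <iostream>
#include <map>
#include <string>

using RecordId = int64_t;

struct CappedCollection {
    std::map<RecordId, std::string> docs;   // ordered by RecordId

    bool empty() const { return docs.empty(); }
    RecordId oldest() const { return docs.begin()->first; }  // caller checks empty()
    void insert(RecordId id, std::string doc) { docs[id] = std::move(doc); }
    // Capped-collection truncation: oldest records fall off the back first.
    void truncateOldest() { if (!docs.empty()) docs.erase(docs.begin()); }
};

struct CappedCursor {
    const CappedCollection& coll;
    RecordId savedPos;
    bool dead;

    void saveState(RecordId current) { savedPos = current; }

    // If the saved record has fallen off the back of the capped collection,
    // resuming would silently skip documents, so the cursor goes dead and the
    // caller must restart the scan (for the oplog tail: retry/resync).
    bool restoreState() {
        if (savedPos != -1 && (coll.empty() || savedPos < coll.oldest())) {
            dead = true;
            return false;
        }
        return true;
    }
};

int main() {
    CappedCollection oplog;
    for (RecordId id = 1; id <= 5; ++id) oplog.insert(id, "op " + std::to_string(id));

    CappedCursor cursor{oplog, /*savedPos=*/-1, /*dead=*/false};
    cursor.saveState(2);        // cursor yields while positioned on record 2

    oplog.truncateOldest();     // records 1 and 2 fall off the back
    oplog.truncateOldest();

    if (!cursor.restoreState())
        std::cout << "cursor is dead; caller restarts instead of skipping documents\n";
    return 0;
}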

Comment by Eric Milkie [ 02/Feb/15 ]

The fassert appears to be due to holes in the oplog. That is, a range of documents is missing from both secondaries. The ranges of missing documents are different between the two secondaries, but the ranges do somewhat overlap, and correspond to a time when the primary was slow (as determined by slow-query logging on the primary). The running theory is that the slowness triggered something that caused the secondaries' getMore operations on their tailable cursors to miss oplog documents.
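One hypothetical way to confirm holes of this kind (not something from the ticket) is to dump the ts field of each node's oplog and diff the sorted lists; ops present on the primary but absent from a secondary form the missing ranges.

#include <algorithm>
#include <cstdint>
#include <iostream>
#include <iterator>
#include <vector>

// Timestamps encoded as (secs << 32) | increment, matching the optime format
// the logs print (e.g. 54ca03ea:1c1f).
using Ts = uint64_t;

std::vector<Ts> missingOnSecondary(std::vector<Ts> primary, std::vector<Ts> secondary) {
    std::sort(primary.begin(), primary.end());
    std::sort(secondary.begin(), secondary.end());
    std::vector<Ts> missing;
    std::set_difference(primary.begin(), primary.end(),
                        secondary.begin(), secondary.end(),
                        std::back_inserter(missing));
    return missing;
}

int main() {
    // Toy data: the secondary's tailable cursor skipped ops 3 and 4.
    std::vector<Ts> primary   = {1, 2, 3, 4, 5, 6};
    std::vector<Ts> secondary = {1, 2, 5, 6};
    for (Ts t : missingOnSecondary(primary, secondary))
        std::cout << "missing op ts: " << t << "\n";
    return 0;
}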

Comment by Michael Grundy [ 29/Jan/15 ]

According to the logs, shard2-02 went into recovery first. Updating the description.

Comment by Michael Grundy [ 29/Jan/15 ]

This is the log from shard2-03 when the shard2-02 (10.93.30.138) secondary went into recovery:

2015-01-29T11:30:56.398+0000 I REPL     [ReplicationExecutor] Member shard2-02.testdomain.com:27017 is now in state RECOVERING
2015-01-29T11:30:57.313+0000 I NETWORK  [conn3286] end connection 10.93.30.140:52720 (5 connections now open)
2015-01-29T11:30:57.321+0000 I NETWORK  [initandlisten] connection accepted from 10.93.30.140:52722 #3288 (6 connections now open)
2015-01-29T11:31:17.518+0000 I COMMAND  [conn3] command admin.$cmd command: serverStatus { serverStatus: 1, oplog: 1 } keyUpdates:0 writeConflicts:0 numYields:0 reslen:16435 112ms
2015-01-29T11:31:18.008+0000 I QUERY    [conn3] query local.oplog.rs query: { $query: {}, $orderby: { $natural: 1 } } planSummary: COLLSCAN ntoskip:0 nscanned:0 nscannedObjects:1 keyUpdates:0 writeConflicts:0 numYields:1 nreturned:1 reslen:37 164ms
2015-01-29T11:31:19.366+0000 I NETWORK  [conn3287] end connection 10.93.30.138:52649 (5 connections now open)
2015-01-29T11:31:19.366+0000 I NETWORK  [initandlisten] connection accepted from 10.93.30.138:52652 #3289 (6 connections now open)
2015-01-29T11:31:27.337+0000 I NETWORK  [conn3288] end connection 10.93.30.140:52722 (5 connections now open)
2015-01-29T11:31:27.338+0000 I NETWORK  [initandlisten] connection accepted from 10.93.30.140:52724 #3290 (6 connections now open)
2015-01-29T11:31:35.746+0000 I NETWORK  [initandlisten] connection accepted from 10.93.30.138:52654 #3291 (7 connections now open)
2015-01-29T11:31:36.077+0000 I QUERY    [conn3291] query local.oplog.rs planSummary: COLLSCAN ntoreturn:1 ntoskip:0 nscanned:0 nscannedObjects:1 keyUpdates:0 writeConflicts:0 numYields:1 nreturned:1 reslen:147 218ms
2015-01-29T11:31:36.079+0000 I NETWORK  [initandlisten] connection accepted from 10.93.30.138:52655 #3292 (8 connections now open)
2015-01-29T11:31:37.034+0000 I NETWORK  [ReplExecNetThread-24] Socket recv() errno:104 Connection reset by peer 10.93.30.138:27017
2015-01-29T11:31:37.034+0000 I NETWORK  [conn3289] end connection 10.93.30.138:52652 (7 connections now open)
2015-01-29T11:31:37.034+0000 I NETWORK  [ReplExecNetThread-24] SocketException: remote: 10.93.30.138:27017 error: 9001 socket exception [RECV_ERROR] server [10.93.30.138:27017] 
2015-01-29T11:31:37.034+0000 I NETWORK  [conn3291] end connection 10.93.30.138:52654 (7 connections now open)
2015-01-29T11:31:37.035+0000 I NETWORK  [conn3292] end connection 10.93.30.138:52655 (5 connections now open)
2015-01-29T11:31:37.035+0000 I NETWORK  [ReplExecNetThread-24] DBClientCursor::init call() failed
2015-01-29T11:31:37.055+0000 I REPL     [ReplicationExecutor] Error in heartbeat request to shard2-02.testdomain.com:27017; Location10276 DBClientBase::findN: transport error: shard2-02.testdomain.com:27017 ns: admin.$cmd query: { replSetHeartbeat: "shard2", pv: 1, v: 1, from: "shard2-03.testdomain.com:27017", fromId: 3, checkEmpty: false }
2015-01-29T11:31:37.057+0000 W NETWORK  [ReplExecNetThread-24] Failed to connect to 10.93.30.138:27017, reason: errno:111 Connection refused
2015-01-29T11:31:37.058+0000 I REPL     [ReplicationExecutor] Error in heartbeat request to shard2-02.testdomain.com:27017; Location18915 Failed attempt to connect to shard2-02.testdomain.com:27017; couldn't connect to server shard2-02.testdomain.com:27017 (10.93.30.138), connection attempt failed

And here's where shard2-03 went into recovery:

2015-01-29T11:33:49.414+0000 W NETWORK  [ReplExecNetThread-26] Failed to connect to 10.93.30.138:27017, reason: errno:111 Connection refused
2015-01-29T11:33:49.414+0000 I REPL     [ReplicationExecutor] Error in heartbeat request to shard2-02.testdomain.com:27017; Location18915 Failed attempt to connect to shard2-02.testdomain.com:27017; couldn't connect to server shard2-02.testdomain.com:27017 (10.93.30.138), connection attempt failed
2015-01-29T11:33:50.780+0000 I REPL     [ReplicationExecutor] syncing from: shard2-01.testdomain.com:27017
2015-01-29T11:33:50.781+0000 W REPL     [rsBackgroundSync] we are too stale to use shard2-01.testdomain.com:27017 as a sync source
2015-01-29T11:33:50.782+0000 I REPL     [ReplicationExecutor] could not find member to sync from
2015-01-29T11:33:50.782+0000 I REPL     [rsBackgroundSync] replSet error RS102 too stale to catch up
2015-01-29T11:33:50.782+0000 I REPL     [rsBackgroundSync] replSet our last optime : Jan 29 10:00:00 54ca04a0:19
2015-01-29T11:33:50.782+0000 I REPL     [rsBackgroundSync] replSet oldest available is Jan 29 10:07:23 54ca065b:3564
2015-01-29T11:33:50.783+0000 I REPL     [rsBackgroundSync] replSet See http://dochub.mongodb.org/core/resyncingaverystalereplicasetmember
2015-01-29T11:33:50.912+0000 I REPL     [ReplicationExecutor] transition to RECOVERING

Will attach complete logs

Comment by Eric Milkie [ 29/Jan/15 ]

michael.grundy@10gen.com, can you attach the log from the other secondary in this set?

Comment by Eric Milkie [ 29/Jan/15 ]

What I think is happening here is that the secondary we chose as a sync source happens to delete the op we are looking for from its capped oplog collection right before we query for it. The code currently isn't set up to handle this situation. Instead of rolling back, we could handle it by running another query for the first op in the sync source's oplog; if that op is ahead of our last op, transition to (or remain in) RS102 RECOVERING and do not attempt rollback.
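A minimal sketch of that proposed check, with hypothetical names (this is not server code): fetch the oldest op remaining in the sync source's oplog and compare it to our last applied op before deciding between rollback and RECOVERING.

#include <cstdint>
#include <cstdio>

struct OpTime {
    uint32_t secs;
    uint32_t inc;
};

bool operator>(const OpTime& a, const OpTime& b) {
    return a.secs != b.secs ? a.secs > b.secs : a.inc > b.inc;
}

enum class SyncAction { Rollback, StayInRecovering };

SyncAction decide(const OpTime& sourceOldestOp, const OpTime& ourLastOp) {
    // If the sync source no longer has any op at or before ours, rollback is
    // impossible; we are simply too stale and must remain in RECOVERING.
    if (sourceOldestOp > ourLastOp)
        return SyncAction::StayInRecovering;
    return SyncAction::Rollback;
}

int main() {
    OpTime sourceOldest = {0x54ca058d, 0x1803};  // "oldest available" from the description's log
    OpTime ourLast      = {0x54ca03ea, 0x1c1f};  // "our last optime" from the description's log
    std::printf("%s\n", decide(sourceOldest, ourLast) == SyncAction::StayInRecovering
                            ? "stay in RECOVERING (too stale)"
                            : "attempt rollback");
    return 0;
}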
