[SERVER-36897] OplogReader.hasNext can return false -> true, confusing `checkOplogs` Created: 27/Aug/18  Updated: 29/Oct/23  Resolved: 19/Sep/18

Status: Closed
Project: Core Server
Component/s: Testing Infrastructure
Affects Version/s: 4.0.1, 4.1.3
Fix Version/s: 4.0.3, 4.1.4

Type: Bug Priority: Major - P3
Reporter: Daniel Gottlieb (Inactive) Assignee: David Bradford (Inactive)
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v4.0
Steps To Reproduce:

replsets/too_stale_secondary.js fails with the following patch:

diff --git a/src/mongo/db/exec/collection_scan.cpp b/src/mongo/db/exec/collection_scan.cpp
index 162caf4cf3..7dab48aaa6 100644
--- a/src/mongo/db/exec/collection_scan.cpp
+++ b/src/mongo/db/exec/collection_scan.cpp
@@ -48,6 +48,8 @@
 
 namespace mongo {
 
+MONGO_FAIL_POINT_DEFINE(cappedPositionLost);
+
 using std::unique_ptr;
 using std::vector;
 using stdx::make_unique;
@@ -78,6 +80,17 @@ CollectionScan::CollectionScan(OperationContext* opCtx,
 }
 
 PlanStage::StageState CollectionScan::doWork(WorkingSetID* out) {
+    if (MONGO_FAIL_POINT(cappedPositionLost)) {
+        _isDead = true;
+        Status status(ErrorCodes::CappedPositionLost,
+                      str::stream() << "CollectionScan died due to failure to restore "
+                                    << "tailable cursor position. "
+                                    << "Last seen record id: "
+                                    << _lastSeenId);
+        *out = WorkingSetCommon::allocateStatusMember(_workingSet, status);
+        return PlanStage::DEAD;
+    }
+
     if (_isDead) {
         Status status(
             ErrorCodes::CappedPositionLost,
diff --git a/src/mongo/shell/replsettest.js b/src/mongo/shell/replsettest.js
index 97bacd8cdd..0cafca2ad0 100644
--- a/src/mongo/shell/replsettest.js
+++ b/src/mongo/shell/replsettest.js
@@ -2009,7 +2009,7 @@ var ReplSetTest = function(opts) {
      */
     function checkOplogs(rst, slaves, msgPrefix = 'checkOplogs') {
         slaves = slaves || rst._slaves;
-        var OplogReader = function(mongo) {
+        var OplogReader = function(mongo, failOnFirst) {
             this.next = function() {
                 if (!this.cursor)
                     throw new Error("OplogReader is not open!");
@@ -2024,6 +2024,13 @@ var ReplSetTest = function(opts) {
                 if (!this.cursor)
                     throw new Error("OplogReader is not open!");
                 try {
+                    if (this.failOnFirst && this.failed === false) {
+                        this.failed = true;
+                        assert.commandWorked(this.mongo.adminCommand(
+                            {configureFailPoint: "cappedPositionLost", mode: {"times": 1}}));
+                        printjson("Called cappedPositionLost.");
+                    }
+
                     return this.cursor.hasNext();
                 } catch (err) {
                     print("Error: hasNext threw '" + err.message + "' on " + this.mongo.host);
@@ -2059,6 +2066,8 @@ var ReplSetTest = function(opts) {
             this.lastDoc = null;
             this.cursor = null;
             this.mongo = mongo;
+            this.failOnFirst = failOnFirst;
+            this.failed = false;
         };
 
         if (slaves.length >= 1) {
@@ -2080,7 +2089,8 @@ var ReplSetTest = function(opts) {
                     continue;
                 }
 
-                readers[i] = new OplogReader(node);
+                print("Creating OplogReader. I==2? " + (i == 2));
+                readers[i] = new OplogReader(node, i == 2);
                 var currTS = readers[i].getFirstDoc().ts;
                 // Find the reader which has the smallestTS. This reader should have the most
                 // number of documents in the oplog.
@@ -2108,7 +2118,7 @@ var ReplSetTest = function(opts) {
                     var otherOplogEntry = readers[i].next();
                     if (!bsonBinaryEqual(oplogEntry, otherOplogEntry)) {
                         var query = prevOplogEntry ? {ts: {$lte: prevOplogEntry.ts}} : {};
-                        rst.nodes.forEach(node => this.dumpOplog(node, query, 100));
+                        // rst.nodes.forEach(node => this.dumpOplog(node, query, 100));
                         var log = msgPrefix +
                             ", non-matching oplog entries for the following nodes: \n" +
                             firstReader.mongo.host + ": " + tojsononeline(oplogEntry) + "\n" +

Sprint: TIG 2018-09-24
Participants:
Linked BF Score: 18
Story Points: 2

 Description   

`ReplSetTest.checkOplog` will establish a reverse table scanning cursor on the oplog from each node, then "BFS" to compare the oplog for consistency. It will first get the latest entry in each oplog and perform a comparison, then advance all of the cursors and compare again. It allows the oplogs to have a different number of entries, so long as they match on the latest entries.

To do so, checkOplog relies on OplogReader.hasNext to always return false after it has done so for the first time. However, if the call that executes the query receives a CappedPositionLost (thus not instantiating the shell's internal _cursor, a follow-up OplogReader.hasNext can return true, having re-issued the find and receiving a batch from the oplog starting at the latest entry.



 Comments   
Comment by Githook User [ 19/Sep/18 ]

Author:

{'name': 'David Bradford', 'email': 'david.bradford@mongodb.com', 'username': 'dbradf'}

Message: SERVER-37042 SERVER-36897: Handle errors from cursor.next in checkOplogs

(cherry picked from commit 86d7a754bd72f055445968f4de92c60ae7b8ae70)
Branch: v4.0
https://github.com/mongodb/mongo/commit/e248c85a590b7e8eaf7c6150ddbe240cd595e6f2

Comment by Githook User [ 19/Sep/18 ]

Author:

{'name': 'David Bradford', 'email': 'david.bradford@mongodb.com', 'username': 'dbradf'}

Message: SERVER-37042 SERVER-36897: Handle errors from cursor.next in checkOplogs
Branch: master
https://github.com/mongodb/mongo/commit/86d7a754bd72f055445968f4de92c60ae7b8ae70

Comment by David Bradford (Inactive) [ 14/Sep/18 ]

I was already working in area in SERVER-37042, so I am going to roll this fix in with it.

Generated at Thu Feb 08 04:44:24 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.