[SERVER-31687] Missing minvalid in sync source cause oplog full table scan all the time Created: 24/Oct/17  Updated: 15/Nov/21  Resolved: 01/Oct/18

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: 3.2.17, 3.4.9
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Zhang Youdong Assignee: Tess Avitabile (Inactive)
Resolution: Duplicate Votes: 1
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File Missing minvalid in sync source cause oplog full table scan all the time.md    
Issue Links:
Backports
Duplicate
duplicates SERVER-29843 Make oplog queries with point equalit... Closed
Operating System: ALL
Backport Requested:
v4.0, v3.6, v3.4
Steps To Reproduce:

    OplogReader syncSourceReader;
    OpTime minValid;
    if (_replCoord->getMemberState().recovering()) {
        BatchBoundaries boundaries = getMinValid(txn);
        if (boundaries.end > lastOpTimeFetched) {
            minValid = boundaries.end;
        }
    }
    syncSourceReader.connectToSyncSource(txn, lastOpTimeFetched, minValid, _replCoord);

connectToSyncSource will issue `db.oplog.rs.find({ ts: { $gte: minValid, $lte: minValid } })` to check whether this oplog entry exists on the sync source, which triggers oplogStartHack.

When the minvalid optime exists in the sync source, oplogStartHack works very well; but if it doesn't exist, the query causes a full oplog table scan.

I think that when the minvalid optime doesn't exist in the sync source, RecordStore::oplogStartHack should return RecordId::max() to avoid the full oplog table scan, which is a very heavy operation.
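To make the proposed change concrete, here is a minimal, self-contained sketch of the seek logic that oplogStartHack performs (the real implementation uses a WiredTiger cursor; the names and types here are illustrative, not the actual WiredTigerRecordStore API). The function looks for the newest oplog record whose timestamp is <= startingPosition; when nothing qualifies, the current behavior returns a null RecordId (which makes the planner fall back to scanning the whole oplog), while the proposed fix returns a max sentinel so the scan starts past the end and examines nothing:

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Sentinels standing in for RecordId() and RecordId::max().
constexpr int64_t kNullRecordId = 0;
constexpr int64_t kMaxRecordId = INT64_MAX;

// Find the newest oplog record with timestamp <= startingPosition.
// oplogTimestamps is assumed sorted ascending, like the oplog itself.
int64_t oplogStartHack(const std::vector<int64_t>& oplogTimestamps,
                       int64_t startingPosition,
                       bool returnMaxOnMiss) {
    // upper_bound lands one past the last entry <= startingPosition.
    auto it = std::upper_bound(oplogTimestamps.begin(), oplogTimestamps.end(),
                               startingPosition);
    if (it == oplogTimestamps.begin()) {
        // Nothing <= startingPosition: the sought timestamp predates the
        // oldest entry on this node. Current code returns "null", which
        // forces a scan from the start of the oplog; the proposed patch
        // returns a past-the-end sentinel so nothing is scanned.
        return returnMaxOnMiss ? kMaxRecordId : kNullRecordId;
    }
    return *(it - 1);  // start the forward scan from this record
}
```

Since the query also carries `$lte: minValid`, a missing timestamp can never match anyway, so starting past the end returns the same (empty) result set without examining millions of documents.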

diff --git a/src/mongo/db/storage/wiredtiger/wiredtiger_record_store.cpp b/src/mongo/db/storage/wiredtiger/wiredtiger_record_store.cpp
index f6fa6d6..94a1d48 100644
--- a/src/mongo/db/storage/wiredtiger/wiredtiger_record_store.cpp
+++ b/src/mongo/db/storage/wiredtiger/wiredtiger_record_store.cpp
@@ -1792,7 +1792,7 @@ boost::optional<RecordId> WiredTigerRecordStore::oplogStartHack(
     if (ret == 0 && cmp > 0)
         ret = c->prev(c);  // landed one higher than startingPosition
     if (ret == WT_NOTFOUND)
-        return RecordId();  // nothing <= startingPosition
+        return RecordId::max();  // will not cause oplog full scan
     invariantWTOK(ret);
 
     int64_t key;

Sprint: Repl 2018-10-08
Participants:

 Description   

The mongod instance in my production environment suffers very heavy IO pressure, and I see a lot of oplog scans in the log like the following.

2017-10-23T17:48:01.845+0800 I COMMAND  [conn8766752] query local.oplog.rs query: { ts: { $gte: Timestamp 1505624058000|95, $lte: Timestamp 1505624058000|95 } } planSummary: COLLSCAN cursorid:20808023597 ntoreturn:0 ntoskip:0 keysExamined:0 docsExamined:44669401 keyUpdates:0 writeConflicts:0 numYields:353599 nreturned:0 reslen:20 locks:{ Global: { acquireCount: { r: 707200 } }, Database: { acquireCount: { r: 353600 }, acquireWaitCount: { r: 15 }, timeAcquiringMicros: { r: 3667 } }, oplog: { acquireCount: { r: 353600 } } } 935646ms

I found that one of the secondaries (a hidden node) is in RECOVERING state. I dove into the code and found the root cause.

The hidden node has following warning log

We cannot use xxx as a sync source because it does not contain the necessary operations for us to reach a consistent state ...

When a secondary chooses a sync source, it makes sure the oplog entry at minvalid exists on that source; to do so, connectToSyncSource sends the find query described above.

 Comments   
Comment by Tess Avitabile (Inactive) [ 01/Oct/18 ]

This issue was fixed by SERVER-29843, so it is fixed in 3.6 and 4.0.

Comment by Zhang Youdong [ 12/May/18 ]

I have created a pull request on GitHub to fix this issue; see https://github.com/mongodb/mongo/pull/1240

The $lte condition may cause the full oplog scan, so we should use only $gte and do the comparison on the client side.
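The alternative the pull request proposes can be sketched as follows: send only `{ ts: { $gte: minValid } }` (which always seeks efficiently to the first record at or after minValid), take the first result, and verify equality on the client. This is an illustrative simulation over a sorted timestamp list, not the real driver or server API:

```cpp
#include <algorithm>
#include <cstdint>
#include <optional>
#include <vector>

// Simulates the server answering { ts: { $gte: minValid } } with its first
// matching oplog entry. oplogTimestamps is assumed sorted ascending.
std::optional<int64_t> findFirstGte(const std::vector<int64_t>& oplogTimestamps,
                                    int64_t minValid) {
    auto it = std::lower_bound(oplogTimestamps.begin(), oplogTimestamps.end(),
                               minValid);
    if (it == oplogTimestamps.end())
        return std::nullopt;  // no entry at or after minValid
    return *it;
}

// Client-side check replacing the server-side $lte bound: the sync source
// contains minvalid exactly when the first entry >= minValid equals it.
bool syncSourceHasMinValid(const std::vector<int64_t>& oplogTimestamps,
                           int64_t minValid) {
    auto first = findFirstGte(oplogTimestamps, minValid);
    return first && *first == minValid;
}
```

Because the $gte-only query never needs to prove that nothing smaller matches, it cannot degrade into a full collection scan regardless of whether minvalid is present on the sync source.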

Comment by Zhang Youdong [ 10/Nov/17 ]

Is anyone tracking this issue? It hasn't been updated for a long time.

Comment by Zhang Youdong [ 24/Oct/17 ]

Description + Steps To Reproduce = the full Description. I don't know why it was split; I just put all the text in the Description part.

I attached a markdown file to make it readable.

Generated at Thu Feb 08 04:27:52 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.