[SERVER-31687] Missing minvalid in sync source cause oplog full table scan all the time Created: 24/Oct/17 Updated: 15/Nov/21 Resolved: 01/Oct/18 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | 3.2.17, 3.4.9 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Zhang Youdong | Assignee: | Tess Avitabile (Inactive) |
| Resolution: | Duplicate | Votes: | 1 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Attachments: | |
| Issue Links: | |
| Operating System: | ALL |
| Backport Requested: | v4.0, v3.6, v3.4 |
| Steps To Reproduce: |
connectToSyncSource will use db.oplog.rs.find({ts: {$gte: minValid, $lte: minValid}}) to check that this oplog entry exists in the sync source, which triggers oplogStartHack. When the minValid optime exists in the sync source, oplogStartHack works very well; but if it doesn't exist in the sync source, the query causes a full oplog table scan. I think that when minValid doesn't exist in the sync source, RecordStore::oplogStartHack should return RecordId::max() to avoid the full oplog table scan, which is a very heavy operation.
|
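The behavior described in the steps above can be illustrated with a toy model in Python. This is only a sketch of the reporter's description, not MongoDB's code: the oplog is a sorted list of integers standing in for timestamps, and start_position, find_min_valid, and RECORD_ID_MAX are hypothetical names. It assumes, as reported, that a missing minValid makes the query fall back to examining every oplog entry, and it shows how returning a "max" sentinel would skip that scan.

```python
from bisect import bisect_left

# Hypothetical stand-in for RecordId::max(): "start past the end, scan nothing".
RECORD_ID_MAX = float("inf")


def start_position(oplog, min_valid, return_max_when_missing):
    """Return the index of min_valid, None for 'no start found', or the max sentinel."""
    i = bisect_left(oplog, min_valid)
    if i < len(oplog) and oplog[i] == min_valid:
        return i
    # min_valid is absent: either give up (None) or apply the proposed behavior.
    return RECORD_ID_MAX if return_max_when_missing else None


def find_min_valid(oplog, min_valid, return_max_when_missing=False):
    """Emulate ts >= min_valid and ts <= min_valid; return (matches, entries_examined)."""
    start = start_position(oplog, min_valid, return_max_when_missing)
    if start == RECORD_ID_MAX:
        # Proposed behavior: nothing can match, so nothing is examined.
        return [], 0
    if start is None:
        # Reported behavior (assumed): every entry is examined against the filter.
        matches = [ts for ts in oplog if min_valid <= ts <= min_valid]
        return matches, len(oplog)
    # The start position points directly at the only possible match.
    return [oplog[start]], 1


oplog = list(range(0, 1000, 2))  # 500 entries with even "timestamps"
present = find_min_valid(oplog, 400)
missing = find_min_valid(oplog, 401)
missing_with_max = find_min_valid(oplog, 401, return_max_when_missing=True)
```

In this model the lookup examines one entry when minValid is present, all 500 entries when it is missing, and none when the max sentinel is returned.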
| Sprint: | Repl 2018-10-08 |
| Participants: | |
| Description |
|
The mongod instance in my production environment is under very heavy IO pressure, and I see a lot of oplog scans in the log, like below.
I found that one of the secondaries (a hidden node) is in RECOVERING state. I dug into the code and found the root cause. The hidden node has the following warning log
When a secondary chooses a sync source, it makes sure the oplog entry at minValid exists in the sync source; it will send a |
| Comments |
| Comment by Tess Avitabile (Inactive) [ 01/Oct/18 ] |
|
This issue was fixed by |
| Comment by Zhang Youdong [ 12/May/18 ] |
|
I have created a pull request on GitHub to fix this issue; see https://github.com/mongodb/mongo/pull/1240. The $lte condition may cause the full oplog scan, so we should use only $gte and compare on the client side. |
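The approach proposed in this comment can be sketched as follows. This is a minimal Python illustration, not the code from the pull request: the oplog is modeled as a sorted list of integers, first_entry_at_or_after and sync_source_has_min_valid are hypothetical names, and fetching only the first entry at or after minValid is an assumption about how a $gte-only query would be used.

```python
from bisect import bisect_left


def first_entry_at_or_after(oplog, min_valid):
    """Emulate a $gte-only lookup on a ts-ordered oplog, returning the first hit or None."""
    i = bisect_left(oplog, min_valid)
    return oplog[i] if i < len(oplog) else None


def sync_source_has_min_valid(oplog, min_valid):
    """Client-side check that replaces the $lte bound in the query."""
    first = first_entry_at_or_after(oplog, min_valid)
    return first is not None and first == min_valid


source_oplog = [10, 20, 30]
has_exact = sync_source_has_min_valid(source_oplog, 20)    # entry present
has_gap = sync_source_has_min_valid(source_oplog, 25)      # falls between entries
has_future = sync_source_has_min_valid(source_oplog, 40)   # beyond the newest entry
```

The equality test moves from the query to the caller, so the query keeps a single lower bound on ts.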
| Comment by Zhang Youdong [ 10/Nov/17 ] |
|
Is anyone tracking this issue? It hasn't been updated for a long time. |
| Comment by Zhang Youdong [ 24/Oct/17 ] |
|
Description + Steps To Reproduce = the full description. I don't know why it was split; I put all the text in the Description part. I attached a markdown file to make it readable. |