[SERVER-55821] remove next_random_sample_size=1000 configuration in the oplog sampling code Created: 06/Apr/21  Updated: 29/Oct/23  Resolved: 06/Jul/21

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 5.1.0-rc0

Type: Bug
Priority: Major - P3
Reporter: Keith Bostic (Inactive)
Assignee: Benety Goh
Resolution: Fixed
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
is related to WT-7373 Improve slow random cursor operations... Closed
is related to SERVER-43322 Add tracking tools for measuring Oplo... Closed
is related to SERVER-19551 Keep "milestones" against WT oplog to... Closed
is related to SERVER-21920 Use enhanced WiredTiger next_random c... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: Execution Team 2021-06-14, Execution Team 2021-07-12
Participants:

 Description   

The next_random_sample_size option, used when creating random cursors on a WiredTiger record store, is no longer needed because unbalanced trees are a non-issue in recent MongoDB releases and customer deployments. The work for this ticket is to remove this option from the oplog sampling code in wiredtiger_record_store.cpp for 5.1, and to evaluate removing support for the option from the WiredTiger storage engine at a later time.
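A minimal sketch of the configuration change, assuming the random cursor is opened with a WiredTiger cursor configuration string. next_random and next_random_sample_size are WiredTiger cursor options; random_cursor_config is a hypothetical helper written for illustration and does not appear in the server code, and the exact strings used in wiredtiger_record_store.cpp may differ.

```python
from typing import Optional


def random_cursor_config(sample_size: Optional[int] = None) -> str:
    """Build a config string for a random cursor (hypothetical helper)."""
    options = ["next_random=true"]
    # The sample-size option is only appended when a value is supplied.
    if sample_size is not None:
        options.append(f"next_random_sample_size={sample_size}")
    return ",".join(options)


# Before this ticket: the oplog sampling code asked for a sample size of 1000.
before = random_cursor_config(1000)
# After this ticket: the sample size option is dropped from the configuration.
after = random_cursor_config()

print(before)
print(after)
```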

PREVIOUS SUMMARY: Investigate slow random cursor operations on oplog

Random cursors can be quite slow on a multi-GB oplog table.

A customer has experienced slow startup times in 4.2 that they didn't see in 4.0. Based on their logs, mongod is spending a lot of time iterating a random cursor through the oplog; in one case it took 15 minutes to perform 993 cursor->next() calls on a ~26GB oplog. The oplog had only 62590 records, so the average record size is in the hundreds of KB.
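The figures in this report can be checked with a few lines of arithmetic. The sketch below assumes "~26GB" means decimal gigabytes, which the report does not specify; under that assumption the average record is roughly 415 KB and each cursor->next() call took just under one second.

```python
# Figures quoted in the report above.
oplog_bytes = 26 * 10**9   # "~26GB", taken as decimal gigabytes (assumption)
record_count = 62_590      # records in the oplog
next_calls = 993           # cursor->next() calls observed
elapsed_seconds = 15 * 60  # 15 minutes

# Average size of one oplog record, in bytes.
avg_record_bytes = oplog_bytes / record_count
# Average wall-clock time spent per cursor->next() call.
seconds_per_next = elapsed_seconds / next_calls

print(f"average record size: {avg_record_bytes / 1000:.0f} KB")
print(f"time per next() call: {seconds_per_next:.2f} s")
```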

See WT-7373 for more discussion.



 Comments   
Comment by Vivian Ge (Inactive) [ 06/Oct/21 ]

Updating the fixversion since branching activities occurred yesterday. This ticket will be in rc0 when it’s been triggered. For more active release information, please keep an eye on #server-release. Thank you!

Comment by Githook User [ 05/Jul/21 ]

Author:

{'name': 'Benety Goh', 'email': 'benety@mongodb.com', 'username': 'benety'}

Message: SERVER-55821 remove WiredTigerRecordStore::getRandomCursorWithOptions()
Branch: master
https://github.com/mongodb/mongo/commit/2fa358f0a35618daeded2686ef25e032f94c75cc

Comment by Githook User [ 04/Jul/21 ]

Author:

{'name': 'Benety Goh', 'email': 'benety@mongodb.com', 'username': 'benety'}

Message: SERVER-55821 remove next_random_sample_size=1000 configuration in the oplog sampling code
Branch: master
https://github.com/mongodb/mongo/commit/671f32cfb27e08301e6564224338f6d035337e74

Comment by Githook User [ 03/Jul/21 ]

Author:

{'name': 'Benety Goh', 'email': 'benety@mongodb.com', 'username': 'benety'}

Message: SERVER-55821 log minBytesPerStone when sampling oplog for placing stones
Branch: master
https://github.com/mongodb/mongo/commit/6ff9a52b5ab0bcc44eecd1dacf19d0b6bc30361e

Comment by Benety Goh [ 29/Jun/21 ]

We started logging the WiredTiger oplog processing time in SERVER-43322. These stats are also available in db.serverStatus() under the oplogTruncation section.

Comment by Benety Goh [ 25/Jun/21 ]

(Reproducing some of the context from WT-7373) The next_random_sample_size option was added in SERVER-21920.

Comment by Keith Bostic (Inactive) [ 06/Apr/21 ]

This ticket is a place to investigate and potentially make changes to remove the next_random_sample_size=1000 configuration in the MongoDB server oplog sampling code; see WT-7373 for the discussion.

geert.bosch, I apologize for assigning this to you, but I wasn't sure of the right path and you've been flagged on some of the discussions. Please don't hesitate to move this as you see fit.

 
