[SERVER-43069] Ops that are slow to apply don't condition on slowOpSampleRate
Created: 28/Aug/19  Updated: 29/Oct/23  Resolved: 07/Feb/20

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 3.6.14, 4.0.12, 4.2.0
Fix Version/s: 4.3.4

Type: Bug
Priority: Major - P3
Reporter: David Bartley
Assignee: Xuerui Fa
Resolution: Fixed
Votes: 0
Labels: former-quick-wins, neweng
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
is depended on by SERVER-46030 Logging should accommodate a 0 ms slo... Closed
Documented
is documented by DOCS-13395 Investigate changes in SERVER-43069: ... Closed
Related
is related to SERVER-44881 Giant slow oplog entries are being lo... Backlog
Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: Repl 2020-02-10
Participants:
Case:
Linked BF Score: 39

 Description   

Here's the code in question: https://github.com/mongodb/mongo/blob/r3.6.14/src/mongo/db/repl/sync_tail.cpp#L335-L351

That should be doing something similar to https://github.com/mongodb/mongo/blob/r3.6.14/src/mongo/db/ops/write_ops_exec.cpp#L122-L124

This behaviour is called out in https://docs.mongodb.com/manual/release-notes/4.2/#slow-oplog, though I don't see any discussion on SERVER-32146 or DOCS-12178 of whether sampling was considered.
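
To make the expected behaviour concrete, here is a minimal sketch of a logging decision that conditions on both the slow threshold and the sample rate. It is not the code from either linked file: `SlowOpSettings`, `passesSampling` and `shouldLogSlowApply` are hypothetical names, the strict `>` comparison against the threshold is an assumption, and the random draw is passed in as a parameter so the result is deterministic.

```cpp
#include <chrono>

// Hypothetical stand-in for the relevant server settings (illustrative only).
struct SlowOpSettings {
    std::chrono::milliseconds slowMs;  // threshold above which an op counts as slow
    double sampleRate;                 // fraction of slow ops to log, in [0.0, 1.0]
};

// Returns true when an op passes the sampling check.
// `draw` is assumed to be a uniform random value in [0.0, 1.0).
inline bool passesSampling(double sampleRate, double draw) {
    return sampleRate >= 1.0 || draw < sampleRate;
}

// Log a slow-to-apply op only if it is slow AND it was sampled.
inline bool shouldLogSlowApply(const SlowOpSettings& settings,
                               std::chrono::milliseconds opDuration,
                               double draw) {
    const bool isSlow = opDuration > settings.slowMs;  // assumption: strict comparison
    return isSlow && passesSampling(settings.sampleRate, draw);
}
```

With a sample rate of 1.0 every slow op is logged, which matches the unsampled behaviour described above; with a lower rate only the ops whose draw falls below the rate are logged.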

There are also a couple of other places where this might need to be checked:

It's possible a better fix would be to decouple slowOpSampleRate and slowms, such that you log all slow operations but then subject non-slow ops to sampling.
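
A rough sketch of that decoupled alternative, under stated assumptions: `shouldLogDecoupled` is a hypothetical name, the threshold comparison is assumed to be strict, non-slow ops are assumed to be eligible for logging at all, and the random draw is supplied by the caller.

```cpp
#include <chrono>

// Decoupled policy: slow ops are always logged; only non-slow ops are sampled.
// `draw` is assumed to be a uniform random value in [0.0, 1.0).
inline bool shouldLogDecoupled(std::chrono::milliseconds opDuration,
                               std::chrono::milliseconds slowMs,
                               double sampleRate,
                               double draw) {
    if (opDuration > slowMs) {
        return true;  // slow ops bypass sampling entirely
    }
    return draw < sampleRate;  // non-slow ops are subject to the sample rate
}
```

Under this policy the sample rate no longer reduces the volume of slow-op log lines, so it trades lower log volume for complete coverage of slow operations.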



 Comments   
Comment by Githook User [ 07/Feb/20 ]

Author: Xuerui Fa (XueruiFa) <xuerui.fa@mongodb.com>

Message: SERVER-43069: Condition logging for slow ops on sample rate

create mode 100644 src/mongo/util/log_with_sampling.h
create mode 100644 src/mongo/util/log_with_sampling_test.cpp
Branch: master
https://github.com/mongodb/mongo/commit/fd1e712a1407181e1105b4767205a42bbf463bbd

Comment by Githook User [ 07/Feb/20 ]

Author: Xuerui Fa (XueruiFa) <xuerui.fa@mongodb.com>

Message: Revert "SERVER-43069: Condition logging for slow ops on sample rate"

This reverts commit b0c5c0baa85fba563c80ee416cecc22e9ffbf53a.

delete mode 100644 src/mongo/util/log_with_sampling.h
delete mode 100644 src/mongo/util/log_with_sampling_test.cpp
Branch: master
https://github.com/mongodb/mongo/commit/6e049cf9804e1d7147f5dcd6ac11b68f480d088a

Comment by Githook User [ 05/Feb/20 ]

Author: Xuerui Fa (XueruiFa) <xuerui.fa@mongodb.com>

Message: SERVER-43069: Condition logging for slow ops on sample rate

create mode 100644 src/mongo/util/log_with_sampling.h
create mode 100644 src/mongo/util/log_with_sampling_test.cpp
Branch: master
https://github.com/mongodb/mongo/commit/b0c5c0baa85fba563c80ee416cecc22e9ffbf53a

Comment by Judah Schvimer [ 13/Sep/19 ]

Thank you and glad this is no longer causing you pain!

Comment by David Bartley [ 12/Sep/19 ]

It's not causing us any pain currently because we ended up patching this to honour the sample rate. Without the patch, the main issue is that we would have greatly increased our log volume and associated logging system costs.

Comment by Judah Schvimer [ 12/Sep/19 ]

Hi bartle,

Thank you for bringing this issue to our attention. This is definitely a bug and we would like to fix it. To aid in prioritization, can you please help us understand how much pain this is causing you?

Thank you,
Judah

Comment by Danny Hatcher (Inactive) [ 29/Aug/19 ]

Thanks for the report. We'll take a look to determine the best path forward.
