[SERVER-44881] Giant slow oplog entries are being logged in full Created: 29/Nov/19 Updated: 06/Dec/22 |
|
| Status: | Backlog |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | 4.2.1 |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Minor - P4 |
| Reporter: | Michael Smith | Assignee: | Backlog - Replication Team |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | former-quick-wins |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: | |
| Assigned Teams: | Replication |
| Participants: | |
| Description |
|
From the release notes:
This seems reasonable, but it appears to log the entire contents of a batch insert, which has adverse performance and privacy impacts. I get tons of log lines like this:

    2019-11-28T22:51:51.423+0000 I REPL [repl-writer-worker-2] warning: log line attempted (234kB) over max size (10kB), printing beginning and end ... applied op: CRUD { ts: [ Timestamp(1574981509, 1165), Timestamp(1574981509, 1184), Timestamp(1574981509, 1228), Timestamp(1574981509, 1235), Timestamp(1574981509, 1242), Timestamp(1574981509, 1285), Timestamp(1574981509, 1302), Timestamp(1574981509, 1318), Timestamp(1574981509, 1324), Timestamp(1574981509, 1341), Timestamp(1574981509, 1375), Timestamp(1574981509, 1376), Timestamp(1574981509, 1380), Timestamp(1574981510, 8), Timestamp(1574981510, 45), Timestamp(1574981510, 71), Timestamp(1574981510, 97), Timestamp(1574981510, 101), Timestamp(1574981510, 156), Timestamp(1574981510, 161), Timestamp(1574981510, 188), Timestamp(1574981510, 211), Timestamp(1574981510, 221), Timestamp(1574981510, 256), Timestamp(1574981510, 263), Timestamp(1574981510, 272), Timestamp(1574981510, 282), Timestamp(1574981510, 285), Timestamp(1574981510, 304), Timestamp(1574981510, 314), Timestamp(1574981510, 324), Timestamp(1574981510, 328), Timestamp(1574981510, 336), Timestamp(1574981510, 345), Timestamp(1574981510, 370), Timestamp(1574981510, 399), Timestamp(1574981510, 414), Timestamp(1574981510, 416), Timestamp(1574981510, 419), Timestamp(1574981510, 444), Timestamp(1574981510, 473), Timestamp(1574981510, 504), Timestamp(1574981510, 571), Timestamp(1574981510, 582), Timestamp(1574981510, 604), Timestamp(1574981510, 621), Timestamp(1574981510, 641), Timestamp(1574981510, 644), Timestamp(1574981510, 646), Timestamp(1574981510, 682), Timestamp(1574981510, 683), Timestamp(1574981510, 687), Timestamp(1574981510, 688), Timestamp(1574981510, 689), Timestamp(1574981510, 693), Timestamp(1574981510, 698), Timestamp(1574981510, 702), Timestamp(1574981510, 704), Timestamp(1574981510, 709), Timestamp(1574981510, 711), Timestamp(1574981510, 731), Timestamp(1574981510, 738), Timestamp(1574981510, 757), Timestamp(1574981510, 771) ], t: [ 62, 62, 62, 62, 62, 62, 62, 62, 62, 62, 62, 62, 62, 62, 62, 62, 62, 62, 62, 62, 62, 62, 62, 62, 62, 62, 62, 62, 62, 62, 62, 62, 62, 62, 62, 62, 62, 62, 62, 62, 62, 62, 62, 62, 62, 62, 62, 62, 62, 62, 62, 62, 62, 62, 62, 62, 62, 62, 62, 62, 62, 62, 62, 62 ], o: [ { _id: ObjectId('5de04f852f19ef00016eff32'), <redacted>ment_id: ObjectId('5dd5bc5d0305cb0001ca5de2'), acc .......... ", so<redacted>} ], h: 0, v: 2, op: "i", ns: "<redacted>", ui: UUID("e11ee941-efd8-4880-a6d5-f92e472ecb86"), wall: new Date(1574981509985) }, took 542ms

Sure, I can and will bump up the slow operation threshold on the secondaries... but I only want to know about slow operations at the namespace/operation level; I'd never want the full inserted data to be logged. I mean, at most log the `_id`; that's enough to find the data if you really need it.

Note: the replica sets where I'm seeing this are still using featureCompatibilityVersion 4.0, if that makes a difference.
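For reference, a minimal sketch of the interim workaround mentioned above (raising the slow operation threshold on a secondary), assuming PyMongo with `directConnection` support; the hostname and the 2000 ms value are placeholders, and `slowms` is a per-node setting, so the command has to be run against each secondary individually:

```python
from pymongo import MongoClient

# Connect directly to one secondary; the hostname is a placeholder.
client = MongoClient("mongodb://secondary1.example.net:27017/?directConnection=true")

# Read back the node's current profiler settings; "slowms" is the slow-op threshold.
current = client.admin.command("profile", -1)
print("current slowms:", current["slowms"])

# Keep profiling off (level 0) but raise the threshold so that bulk oplog
# application no longer trips the slow-operation log lines on this node.
client.admin.command("profile", 0, slowms=2000)
```

Level 0 leaves the database profiler disabled; only the threshold that drives the "slow operation" log lines changes.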
| Comments |
| Comment by Asya Kamsky [ 19/Jan/20 ] |
|
Note that |
| Comment by Michael Smith [ 04/Dec/19 ] |
|
I had initially opened this because I saw slow replication performance on secondaries running 4.2.1 with a 4.0.4 master, and I thought the extra logging was the cause. I've worked around that by increasing the slow operation threshold, and I don't think the logging has really affected performance either way.
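Since the threshold is per node, one way to confirm a workaround like the one described above actually landed on every secondary is to read the setting back from each member. A rough sketch, again assuming PyMongo and placeholder hostnames and replica set name:

```python
from pymongo import MongoClient

# Replica-set connection; hosts and set name are placeholders.
rs = MongoClient("mongodb://host1.example.net,host2.example.net/?replicaSet=rs0")
rs.admin.command("ping")  # force topology discovery so .secondaries is populated

for host, port in rs.secondaries:
    # Direct connection to each secondary to read its node-local threshold.
    node = MongoClient(host, port, directConnection=True)
    settings = node.admin.command("profile", -1)
    print(f"{host}:{port} slowms={settings['slowms']}")
    node.close()
```

To make the change persist across restarts, the same threshold can also be set at startup via the `--slowms` option or the `operationProfiling.slowOpThresholdMs` configuration file setting.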
| Comment by Carl Champain (Inactive) [ 03/Dec/19 ] |
|
Hi mzs,

Thanks for the report. We understand that this behavior may be causing trouble for you; I will pass this ticket along to the Replication team so they are aware of it.