[SERVER-78835] MongoDB truncates currentOP when it shouldn't Created: 10/Jul/23 Updated: 27/Oct/23 Resolved: 02/Oct/23 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | 6.0.6, 5.0.18 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Jean da Silva | Assignee: | Bernard Gorman |
| Resolution: | Works as Designed | Votes: | 3 |
| Labels: | query-director-triage |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Assigned Teams: | Query Execution |
| Operating System: | ALL |
| Steps To Reproduce: | I followed the same steps for 4.4.22, 5.0.18, and 6.0.6.
1. Deploy a standard sharded cluster:
2. Enable sharding on the following database and collection:
3. Insert dummy data with mgeneratejs as follows:
Observation:
Again, this only happens on 5.0 and 6.0. Following the same steps on 4.4 did not produce such messages.
|
| Participants: |
| Description |
|
While running a batch insert on a Sharded Cluster environment, we noticed a shard getting flooded with the following messages:
However, while analyzing the problem, we saw that MongoDB changed how currentOp works as of 5.0, as documented here and here.
The message itself is also unclear about the limits: are they in bytes or kilobytes? I'm assuming it reports bytes, based on the reference here in the source code.
If that's correct.
I tested this on 4.4.22, 5.0.18, and 6.0.6, but got those messages only on 5.0.18 and 6.0.6. |
| Comments |
| Comment by Bernard Gorman [ 02/Oct/23 ] |
|
Hi jean_nsilva@hotmail.com - I hope I've addressed all your questions above, in particular how using the $currentOp aggregation stage directly can avoid these truncation issues. Since the currentOp command is intended to truncate operations - even when internally implemented as an aggregation - and the log messages are only expected on 5.0 and later, I'll close this ticket as Works As Designed. |
| Comment by Bernard Gorman [ 08/Sep/23 ] |
|
Thank you for raising this issue. I'll try to address your questions below.
The currentOp command reports all events on the cluster in a single BSON document; the sum total of all operations in the system must therefore be less than 16MB, since that is the maximum permitted BSON document size. To reduce the chances of hitting this limit, each individual operation is limited to 1000 bytes.
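To put the two limits above in proportion, here is a rough back-of-the-envelope sketch. It assumes "16MB" means 16 * 1024 * 1024 bytes, that every operation is truncated to exactly 1000 bytes, and it ignores the overhead of the enclosing document, so the result is only an approximation. The names are illustrative and do not come from the server source.

```python
# Assumption: the BSON document limit is 16 * 1024 * 1024 bytes.
MAX_BSON_DOCUMENT_BYTES = 16 * 1024 * 1024
# Per-operation truncation limit quoted in the comment above.
TRUNCATED_OP_BYTES = 1000


def max_truncated_ops(doc_limit: int = MAX_BSON_DOCUMENT_BYTES,
                      per_op: int = TRUNCATED_OP_BYTES) -> int:
    """Approximate how many fully truncated operations fit in one reply.

    Ignores the overhead of the enclosing document and array, so the
    real figure would be somewhat lower.
    """
    return doc_limit // per_op


if __name__ == "__main__":
    print(max_truncated_ops())  # 16777
```

Under these assumptions, a single currentOp reply has room for roughly 16,777 operations that each hit the truncation limit.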
When we added the $currentOp aggregation stage, we retained the currentOp command for backward compatibility. This means that it must still report all events on the system in a single BSON document. If a client issues a currentOp command, we will convert it into a $currentOp aggregation internally, but we configure the aggregation to perform the same truncation as the original currentOp command, and at the end of the aggregation we combine all results into a single document. If you want to avoid these truncation issues, the best way is to run a $currentOp aggregation directly rather than using the currentOp command.
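As an illustration of that recommendation, the following is a minimal sketch of running the $currentOp stage directly from a driver. It assumes PyMongo and a client connected with sufficient privileges. The helper names `build_current_op_pipeline` and `iter_current_ops` are hypothetical, and the stage options shown (`allUsers`, `idleConnections`) are just one plausible choice.

```python
from typing import Any, Dict, Iterator, List


def build_current_op_pipeline(all_users: bool = True) -> List[Dict[str, Any]]:
    """Build an aggregation pipeline whose first stage is $currentOp.

    Hypothetical helper: the options chosen here are illustrative only.
    """
    return [{"$currentOp": {"allUsers": all_users, "idleConnections": False}}]


def iter_current_ops(client: Any, all_users: bool = True) -> Iterator[Dict[str, Any]]:
    """Yield one document per in-progress operation.

    `client` is assumed to be a pymongo.MongoClient. $currentOp has to be
    run against the admin database, and each operation comes back as its
    own document from the cursor rather than inside a single reply document.
    """
    yield from client.admin.aggregate(build_current_op_pipeline(all_users))


if __name__ == "__main__":
    # Assumption: a deployment is reachable at this placeholder URI.
    from pymongo import MongoClient

    for op in iter_current_ops(MongoClient("mongodb://localhost:27017")):
        print(op.get("opid"), op.get("op"), op.get("ns"))
```

Because each operation arrives as a separate document, none of them has to be cut down to fit inside one combined reply.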
The log message in question did not exist in 4.4 (compare to 5.0, 6.0, master); it was added in |
| Comment by Chris Kelly [ 18/Jul/23 ] |
|
Thank you for your report, and your attention to detail in presenting your questions! I am going to pass this ticket to the relevant team to look into these questions. |