[SERVER-44867] Aggregate on Secondary fails with CursorKilled Created: 27/Nov/19 Updated: 19/Feb/20 Resolved: 14/Jan/20 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Aggregation Framework |
| Affects Version/s: | 4.0.2 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Frank Shimizu | Assignee: | Carl Champain (Inactive) |
| Resolution: | Incomplete | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: |
|
| Operating System: | ALL |
| Steps To Reproduce: |
|
| Participants: |
| Description |
|
I tried searching for issues and entries in the release notes but couldn't find anything that seems to address our issue, apologies if this is a duplicate.
MongoDB version: 4.0.2
Setup: Replica Set: 1 Primary, 1 Secondary, 1 Arbiter; no Sharding
OS: CentOS 7.7.1908
CPUs: 6 (Primary & Secondary)
RAM: 31GB (Primary & Secondary)
Data size: ~100GB
Aggregations on a collection with ~52 million documents sometimes fail with errors like this (DB, collection names and pipeline redacted):
In this example log it happened on a COLLSCAN, but it also happens with IXSCAN just the same. It seems to happen much more often when the affected data is not well cached in memory, such as after a server reboot. We tried to reproduce this on the Primary, but so far we can only reproduce it on the Secondary.
One generic aggregation which produces this for us is this (had to redact all the field names):
The only other reference to this that we could find is this post:
Please let me know if I can provide any other info. |
| Comments |
| Comment by Carl Champain (Inactive) [ 19/Feb/20 ] |
|
Thanks for the follow-up, and we're glad to hear that the upgrade fixed your issue! |
| Comment by Frank Shimizu [ 19/Feb/20 ] |
|
Just a quick follow-up. We upgraded our cluster from 4.0.2 to 4.0.16. So far I'm unable to reproduce the issue, so it looks like it's fixed with the upgrade.
Again many thanks for all the help. |
| Comment by Carl Champain (Inactive) [ 14/Jan/20 ] |
|
frank.shimizu@est.fujitsu.com, Let's close this ticket as incomplete, and let us know with a comment here whether or not the upgrade worked. If the upgrade doesn't fix your issue, we will reopen this ticket. Thank you, |
| Comment by Frank Shimizu [ 14/Jan/20 ] |
|
Hi Carl, thanks a lot for your analysis. I will bring the upgrade up with the team and I'm fairly confident that we can do the upgrade, but it may take some time to find a maintenance window. Until the upgrade is completed and we can re-test, should we close this issue and - if necessary - create a new one later, or will we keep this one open? Regards Frank |
| Comment by Carl Champain (Inactive) [ 13/Jan/20 ] |
|
Hi frank.shimizu@est.fujitsu.com, Thank you for the additional details. Let us know how it goes, |
| Comment by Frank Shimizu [ 08/Jan/20 ] |
|
Thanks for your patience. I set both verbosity levels to 3 and reproduced the effect. The relevant portion of the log file is uploaded to the secure portal - SHA256SUM is: 759ee3329dba602d87820fc70b4c8fa0a0d55b8e8c60899ba596a04d7e5e150c mongod-verbose.log.gz
The aggregation starts at 2020-01-08T11:44:23.124+0000 and the error can be seen at 2020-01-08T11:47:47.509+0000.
Thanks for all your efforts! |
| Comment by Frank Shimizu [ 07/Jan/20 ] |
|
Hi Carl, again sorry for the long delay, I was on vacation over the holidays. I can provide the requested log, please allow me some time because of other tasks. |
| Comment by Carl Champain (Inactive) [ 20/Dec/19 ] |
|
Hi frank.shimizu@est.fujitsu.com, We'd like to collect more information; could you please increase the log verbosity for query and command by running the following mongo shell commands:
Then reproduce the issue, share the mongod.log file (covering the verbose-logged reproduction only) with us in the secure upload portal, and go back to the logging verbosity you were using. Thanks! |
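The shell commands from the 20/Dec/19 comment were not preserved in this export. The following is a minimal sketch of what raising the verbosity could look like, assuming the standard `db.setLogLevel(level, component)` shell helper and level 3 (the level Frank later reports using). `setComponentVerbosity` is a hypothetical wrapper introduced here for illustration; it is not part of MongoDB and not necessarily what was originally posted.

```javascript
// Hypothetical helper (not from the ticket): applies one verbosity level to
// several log components via the shell's db.setLogLevel(level, component).
function setComponentVerbosity(db, level, components) {
  return components.map(function (component) {
    return db.setLogLevel(level, component);
  });
}

// In the mongo shell, where `db` is predefined:
//   setComponentVerbosity(db, 3, ["query", "command"]);  // before reproducing
//   setComponentVerbosity(db, 0, ["query", "command"]);  // afterwards; 0 is an assumed previous level
```

The restore level of 0 is an assumption; use whatever verbosity the deployment had before the change.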
| Comment by Frank Shimizu [ 17/Dec/19 ] |
|
Sorry for the long delay. The compressed log file is now uploaded. Please be aware that it uncompresses to ~4GB. The SHA256 is: 517E36CDE3FADD8889183E2310BE9B2B9425C647FFA8115D3D9512F707D8D69E mongod.log.redacted.gz |
| Comment by Carl Champain (Inactive) [ 12/Dec/19 ] |
|
frank.shimizu@est.fujitsu.com, That's totally acceptable. Thank you! |
| Comment by Frank Shimizu [ 12/Dec/19 ] |
|
I got approval to upload the complete log file. But it would be necessary to replace all contained personal data, such as email addresses and login names, with anonymized versions before uploading. Would that be acceptable? |
| Comment by Frank Shimizu [ 10/Dec/19 ] |
|
I understand. Please allow me some time to get this checked and approved internally. |
| Comment by Carl Champain (Inactive) [ 10/Dec/19 ] |
|
frank.shimizu@est.fujitsu.com, Thanks for providing more details! |
| Comment by Frank Shimizu [ 09/Dec/19 ] |
|
Amazing, thanks! I've uploaded the diagnostic data. SHA256 checksum is: I've also uploaded a short excerpt of the server log of the time when the error was reproduced (mongo-error-log.txt). Since the full log file contains queries and reveals all kinds of data, I would have to get uploading this checked and approved first. Please let me know if this excerpt is enough or if the full log file is needed. |
| Comment by Carl Champain (Inactive) [ 09/Dec/19 ] |
|
frank.shimizu@est.fujitsu.com, Sure! I've created a secure upload portal for you. Files uploaded to this portal are visible only to MongoDB employees and are routinely deleted after some time. Thank you, |
| Comment by Frank Shimizu [ 09/Dec/19 ] |
|
Hi @carl.champain Thanks for your reply and sorry for my late response, I was unavailable last week. I have reproduced the error and collected the log file (265MB compressed) as well as diagnostic data (191MB compressed) shortly after. We consider both to be sensitive/protected data since this is a production system with customer data. Is it possible to upload these in a non-public way? Regards |
| Comment by Carl Champain (Inactive) [ 02/Dec/19 ] |
|
Hi frank.shimizu@est.fujitsu.com, Thanks for the report.
Kind regards, |
| Comment by Frank Shimizu [ 27/Nov/19 ] |
|
I'm sorry, it seems I messed up the formatting of the log message and aggregation command, but can't seem to edit the description. |