[SERVER-50769] server restarted after expr:{"expr":"_currentApplyOps.getArrayLength() > 0","file":"src/mongo/db/pipeline/document_source_change_stream_transform.cpp","line":535}}
Created: 04/Sep/20 Updated: 29/Oct/23 Resolved: 12/Jan/21
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Querying |
| Affects Version/s: | 4.4.0 |
| Fix Version/s: | 4.9.0, 4.2.12, 4.4.4 |
| Type: | Bug | Priority: | Blocker - P1 |
| Reporter: | jeason chan | Assignee: | Justin Seyster |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | qexec-team |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Backport Requested: | v4.4, v4.2 |
| Sprint: | Query 2020-11-30, Query 2020-12-14, Query 2020-12-28, Query 2021-01-11, Query 2021-01-25 |
| Case: | (copied to CRM) |
| Description |
We use an application named monstache, built on mongo-go-driver v1.4.1, to watch change streams from a replica set whose mongod instances run v4.4.0. The whole system worked well for a few weeks, but the primary mongod began restarting frequently after we started using transactions with the Java driver. We tried the following conditions to check the cause:
However, the mongod still restarted suddenly from time to time.
Below are the logs. The key entries are "SERVER RESTARTED", the "ctx":"connXXX" connections opened by the mongo-go-driver that monstache uses, and the printed BACKTRACE and invariant failure.
| Comments |
| Comment by Githook User [ 12/Jan/21 ] |
Author: Justin Seyster (justin.seyster@mongodb.com, username: jseyster)
Message: (cherry picked from commit e9122ba5078eca4fbc7ea858221dba6af00e90a9)
| Comment by Githook User [ 12/Jan/21 ] |
Author: Justin Seyster (justin.seyster@mongodb.com, username: jseyster)
Message: (cherry picked from commit e9122ba5078eca4fbc7ea858221dba6af00e90a9)
| Comment by Githook User [ 12/Jan/21 ] |
Author: Justin Seyster (justin.seyster@mongodb.com, username: jseyster)
Message:
| Comment by Bernard Gorman [ 08/Sep/20 ] |
Hey jeasonchanupup@gmail.com, thank you for bringing this issue to our attention. If you could answer the following questions, it would really help our investigation:

1. In your original description, you say that you're using monstache to "watch change streams from a replica set", but you are running in a sharded cluster. Is monstache opening a change stream through the mongoS, or is it opening change streams directly on each shard?
2. What kind of change stream (single-collection, whole-database, or whole-cluster) are you running? From the documentation here, it seems that monstache "defaults to opening the change stream against the entire deployment." Did you configure it to run against a single database or single collection instead? If you have any logs that show the actual $changeStream command, please post them here.
3. Could you provide some examples of the kind of transactions your application is running on this cluster?
4. Has your oplog rolled over yet? If not, the oplog entries that caused this issue should still be present, and it would be very helpful if we could examine them. From the logs, it appears that the earliest incident occurred at 2020-09-03T10:00:46.229+08:00:

Assuming that the change stream was keeping pace with the rate of oplog generation, it is very likely that the oplog entry which caused this issue was written shortly before this time. The following command will use the mongo shell to connect to the host and will dump 50 seconds of its oplog around the time of the incident to a logfile (please note that you may need to add appropriate authentication credentials to this command):
Please do not post this log on the ticket, since it may contain some sensitive information. We have created a secure upload portal for you at this link; please upload the oplog file there instead. Files uploaded to this portal are visible only to MongoDB employees, and are routinely deleted after some time. Thanks!
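The oplog-dump command referenced in the comment above is not preserved in this export. As a hedged illustration of the arithmetic such a command depends on, the Go sketch below converts the quoted incident time into Unix-second bounds. It relies on one documented fact: the seconds component of an oplog entry's `ts` Timestamp counts seconds since the Unix epoch. The helper name `oplogWindow` is hypothetical, and centring the 50-second window on the incident (25 seconds on each side) is an assumption; the original command may have chosen the window differently.

```go
package main

import (
	"fmt"
	"time"
)

// oplogWindow (hypothetical helper) turns an incident time in RFC 3339
// format into a half-open range [lo, hi) of Unix seconds centred on the
// incident. The seconds component of an oplog "ts" Timestamp counts
// seconds since the Unix epoch, so the bounds map directly onto
// Timestamp(lo, 0) and Timestamp(hi, 0).
func oplogWindow(incident string, halfWidth int64) (lo, hi int64, err error) {
	// time.Parse accepts the fractional ".229" even though the RFC3339
	// layout does not mention it; the +08:00 offset is honoured.
	t, err := time.Parse(time.RFC3339, incident)
	if err != nil {
		return 0, 0, err
	}
	sec := t.Unix() // whole seconds; the fractional part is dropped
	return sec - halfWidth, sec + halfWidth, nil
}

func main() {
	// Earliest incident quoted in the comment. 25 s on each side gives a
	// 50-second window (the centring is an assumption).
	lo, hi, err := oplogWindow("2020-09-03T10:00:46.229+08:00", 25)
	if err != nil {
		panic(err)
	}
	// Print a range filter on the "ts" field of local.oplog.rs in mongo
	// shell syntax; $lt keeps the upper bound exclusive.
	fmt.Printf("{ts: {$gte: Timestamp(%d, 0), $lt: Timestamp(%d, 0)}}\n", lo, hi)
}
```

Using an exclusive upper bound of `Timestamp(hi, 0)` keeps the window at exactly 50 whole seconds, since every entry written during second `hi - 1` has a Timestamp below `Timestamp(hi, 0)` regardless of its increment.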
| Comment by jeason chan [ 08/Sep/20 ] |
Hi, if more details are required, leave a message and I will reply as soon as possible. Thanks.
| Comment by jeason chan [ 04/Sep/20 ] |
Here are the resource limit parameters in the systemd service:
#file size
Our hosts have 64 cores, 64 GB of RAM, and a 1 TB SSD.