[SERVER-54296] Invariant failure | aborting after invariant Created: 04/Feb/21  Updated: 29/Oct/23  Resolved: 15/Mar/21

Status: Closed
Project: Core Server
Component/s: Aggregation Framework, Stability
Affects Version/s: 4.4.3
Fix Version/s: 4.9.0, 4.4.5

Type: Task Priority: Critical - P2
Reporter: Abdul Moiz Baig Assignee: Arun Banala
Resolution: Fixed Votes: 3
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File 1111.PNG     PNG File 2222.PNG     PNG File image-2021-02-05-03-31-32-930.png     PNG File image-2021-02-08-16-38-53-529.png    
Issue Links:
Backports
Backwards Compatibility: Fully Compatible
Backport Requested:
v4.4
Sprint: Query Execution 2021-02-22, Query Execution 2021-03-08, Query Execution 2021-03-22
Participants:

 Description   

I recently upgraded my MongoDB server from 4.2.1 to 4.4.3 to use some new features. Since then, the mongod service has been failing consistently.

I have been searching for a solution for a couple of days but unfortunately couldn't find one. Below are the logs I captured right after the mongod service failed.

"ctx":"conn17583","msg":"Slow query","attr":{"type":"command","ns":"staging_reflecx_io.org","command":{"aggregate":"org","pipeline":[{"$match":{"level":1,"_id":{"$in":[{"$oid":"5fcf7998b5c4c43ed8bffb85"}]}}},{"$graphLookup":{"from":"org","startWith":"$_id","connectFromField":"_id","connectToField":"parent_id","as":"dealerhierarchy"}},{"$unwind":"$dealerhierarchy"},{"$match":{"dealerhierarchy.level":5}},{"$group":{"_id":null,"count":{"$sum":1},"dealers":{"$push":"$dealerhierarchy._id"}}}],"cursor":{},"$db":"staging_reflecx_io","lsid":{"id":{"$uuid":"1907851c-f708-4dbd-8209-af5cb84f3347"}}},"planSummary":"IXSCAN { _id: 1 }","keysExamined":1,"docsExamined":1,"cursorExhausted":true,"numYields":72,"nreturned":1,"queryHash":"F9571571","planCacheKey":"7EBE716A","reslen":13843,"locks":{"ReplicationStateTransition":{"acquireCount":{"w":95}},"Global":{"acquireCount":{"r":95}},"Database":{"acquireCount":{"r":95}},"Collection":{"acquireCount":{"r":96}},"Mutex":{"acquireCount":{"r":23}}},"storage":{},"protocol":"op_msg","durationMillis":3202}}
{"t":{"$date":"2021-02-04T02:30:12.410-05:00"},"s":"F", "c":"-", "id":23079, "ctx":"conn17454","msg":"Invariant failure","attr":{"expr":"valueSize <= _memoryUsage","file":"src/mongo/db/pipeline/lookup_set_cache.h","line":141}}
{"t":{"$date":"2021-02-04T02:30:12.410-05:00"},"s":"F", "c":"-", "id":23080, "ctx":"conn17454","msg":"\n\n***aborting after invariant() failure\n\n"}
{"t":{"$date":"2021-02-04T02:30:12.410-05:00"},"s":"F", "c":"CONTROL", "id":4757800, "ctx":"conn17454","msg":"Writing fatal message","attr":{"message":"Got signal: 6 (Aborted).\n"}}

Can you please explain why the service is failing or being killed? My server was working perfectly fine on 4.2.1.
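
The fatal entries quoted above are JSON-structured log lines, so the invariant details can be pulled out of a mongod.log programmatically. The following is a minimal sketch, not an official MongoDB tool: the helper names `fatal_entries` and `summarize` are hypothetical, and it assumes each complete log entry is a single JSON object on one line, as in the three fatal lines copied from this report. Truncated lines (such as the "Slow query" fragment above) are skipped.

```python
import json

# Fatal entries copied verbatim from the report (raw strings keep the \n escapes intact).
LOG_LINES = [
    r'{"t":{"$date":"2021-02-04T02:30:12.410-05:00"},"s":"F", "c":"-", "id":23079, "ctx":"conn17454","msg":"Invariant failure","attr":{"expr":"valueSize <= _memoryUsage","file":"src/mongo/db/pipeline/lookup_set_cache.h","line":141}}',
    r'{"t":{"$date":"2021-02-04T02:30:12.410-05:00"},"s":"F", "c":"-", "id":23080, "ctx":"conn17454","msg":"\n\n***aborting after invariant() failure\n\n"}',
    r'{"t":{"$date":"2021-02-04T02:30:12.410-05:00"},"s":"F", "c":"CONTROL", "id":4757800, "ctx":"conn17454","msg":"Writing fatal message","attr":{"message":"Got signal: 6 (Aborted).\n"}}',
]


def fatal_entries(lines):
    """Yield parsed log entries whose severity field "s" is "F" (fatal)."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # truncated or non-JSON line
        if isinstance(entry, dict) and entry.get("s") == "F":
            yield entry


def summarize(entry):
    """Collect the fields that identify where an invariant failed."""
    attr = entry.get("attr", {})
    return {
        "time": entry.get("t", {}).get("$date"),
        "ctx": entry.get("ctx"),
        "id": entry.get("id"),
        "msg": entry.get("msg", "").strip(),
        "expr": attr.get("expr"),
        "file": attr.get("file"),
        "line": attr.get("line"),
    }


if __name__ == "__main__":
    for entry in fatal_entries(LOG_LINES):
        print(summarize(entry))
```

To scan a real log, the same generator can be fed an open file handle, for example `fatal_entries(open("mongod.log"))`; the file name here is only a placeholder.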



 Comments   
Comment by Abdul Moiz Baig [ 17/Mar/21 ]

Arun,

Thanks for keeping me posted.

Comment by Arun Banala [ 15/Mar/21 ]

abdulmoiz.baig.work@gmail.com We have merged a commit which fixes the reported issue. This fix should be released as part of an upcoming release of 4.4. You can keep an eye on our release page.

Thank you for bringing this bug to our notice!

Comment by Githook User [ 12/Mar/21 ]

Author:

{'name': 'Arun Banala', 'email': 'arun.banala@mongodb.com', 'username': 'banarun'}

Message: SERVER-54296 Fix incorrect cache size estimate logic in lookup_set_cache.h

(cherry picked from commit abf0352882f93e760537d612401eb9fb6e5a030e)
Branch: v4.4
https://github.com/mongodb/mongo/commit/8c766391cc328923c097a7e4ef0692c65d9152ec

Comment by Githook User [ 11/Mar/21 ]

Author:

{'name': 'Arun Banala', 'email': 'arun.banala@mongodb.com', 'username': 'banarun'}

Message: SERVER-54296 Fix incorrect cache size estimate logic in lookup_set_cache.h
Branch: master
https://github.com/mongodb/mongo/commit/abf0352882f93e760537d612401eb9fb6e5a030e

Comment by Abdul Moiz Baig [ 09/Mar/21 ]

Arun,

Thanks. Please keep me posted so I can retry 4.4 again in my environment.

Comment by Arun Banala [ 08/Mar/21 ]

abdulmoiz.baig.work@gmail.com We have identified a bug that could potentially be causing this issue. I'm working on a fix for this. I will post an update when I merge the fix into our master branch.

Comment by Abdul Moiz Baig [ 02/Mar/21 ]

Arun,

Did you find anything? What's causing this issue?

Regards,

Moiz

Comment by Arun Banala [ 19/Feb/21 ]

Thank you for the report, I am looking into this issue. kghazanfar4@gmail.com Are you also using the $graphLookup stage? Also, is the crash happening consistently?

I've been looking at the $graphLookup code, and it looks like we haven't changed the logic since 4.2. I will continue to investigate this issue, and provide an update soon.

Comment by Ghazanfar Khan [ 19/Feb/21 ]

I am also facing the same issue after upgrading to 4.4.

Comment by Abdul Moiz Baig [ 10/Feb/21 ]

Hello Team,

Can you please tell me how we are moving forward with this and what issue was diagnosed?

Thanks,

Moiz

Comment by Edwin Zhou [ 08/Feb/21 ]

Hi abdulmoiz.baig.work@gmail.com,

Thank you for providing diagnostic data and keeping us updated with new information! I will be assigning this to the appropriate team for further evaluation.

Best,
Edwin

Comment by Abdul Moiz Baig [ 08/Feb/21 ]

Hi Edwin,

This morning we faced the same issue again, so we decided to downgrade from 4.4.3 to 4.4.1, but it went down again. So we are facing this issue on 4.4.1 as well, which is probably the major release of 4.4. I have uploaded today's logs to the support uploader (zip file name: mongod_08_feb_2SERVER-54296.zip). A screenshot is attached for reference.

We are now downgrading back to the 4.2 release, which was working perfectly fine for us.

Let us know if you need anything else.

Kind Regards,

A.Moiz

Comment by Abdul Moiz Baig [ 04/Feb/21 ]

Hi Edwin,

I have uploaded the log files for the days on which I faced this issue, along with the complete diagnostic.data directory, to the support uploader as you requested. A screenshot is attached for reference.

Thanks for the quick response. I am really looking forward to hearing back from you about the findings. Don't hesitate to ask if you need anything else to investigate the issue.

Kind regards,
A. Moiz

Comment by Edwin Zhou [ 04/Feb/21 ]

Hi abdulmoiz.baig.work@gmail.com,

Would you please archive (tar or zip) the mongod.log files and the $dbpath/diagnostic.data directory (the contents are described here) and upload them to this support uploader location?

Files uploaded to this portal are visible only to MongoDB employees and are routinely deleted after some time.

Kind regards,
Edwin

Comment by Abdul Moiz Baig [ 04/Feb/21 ]

Added two screenshots to show my OS details on the server and the specs as well.
