[SERVER-73597] Sporadic seg faults days/weeks after mongod server running Created: 03/Feb/23  Updated: 27/Feb/23  Resolved: 27/Feb/23

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 5.0.14
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Amit Gupta Assignee: Yuan Fang
Resolution: Done Votes: 0
Labels: bug, crash
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
is related to SERVER-73805 mongos in 6.0.4 hangs under load Closed
Assigned Teams:
Server Triage
Operating System: ALL
Steps To Reproduce:

Unable to reproduce. It happens somewhat randomly, although the servers seem to be pretty busy when it does.

Participants:

 Description   

Environment:

  • Ubuntu 20.04 latest patch
  • MongoDB 5.0.14
  • 128 GB ram server

 

Hi, we have experienced random crashes on 2 different servers. Below are the stack traces from mongod.log. I have the crash files from /var/crash (they are about 16GB each); I can put them on a private SFTP server for analysis.

 

Each server is a member of a 3-node replica set (and part of a sharded cluster). I was wondering if anyone has any clues about what is causing the seg fault. Each server has 128GB RAM with 16 threads (Intel Xeon E2288G CPU @ 3.70GHz).

At the time of each crash, the servers seemed to be pretty busy.

 

 

crash log #1

{"t":{"$date":"2023-01-30T12:47:32.140-05:00"},"s":"F", "c":"CONTROL", "id":6384300, "ctx":"initandlisten","msg":"Writing fatal message","attr":{"message":"Invalid access at address: 0x9ae8c\n"}}
{"t":{"$date":"2023-01-30T12:47:32.140-05:00"},"s":"F", "c":"CONTROL", "id":6384300, "ctx":"initandlisten","msg":"Writing fatal message","attr":{"message":"Got signal: 11 (Segmentation fault).\n"}}
{"t":{"$date":"2023-01-30T12:47:32.226-05:00"},"s":"I", "c":"CONTROL", "id":31380, "ctx":"initandlisten","msg":"BACKTRACE","attr":{"bt":{"backtrace":[{"a":"55D2950F40A5","b":"55D2911E4000","o":"3F100A5","s":"_ZN5mongo18stack_trace_detail12_GLOBAL__N_119printStackTraceImplERKNS1_7OptionsEPNS_14StackTraceSinkE.constprop.361","s+":"215"},{"a":"55D2950F6B29","b":"55D2911E4000","o":"3F12B29","s":"_ZN5mongo15printStackTraceEv","s+":"29"},{"a":"55D2950EF09C","b":"55D2911E4000","o":"3F0B09C","s":"abruptQuitWithAddrSignal","s+":"EC"},{"a":"7F8A6A9F8420","b":"7F8A6A9E4000","o":"14420","s":"funlockfile","s+":"60"},{"a":"7F8A6A9F3376","b":"7F8A6A9E4000","o":"F376","s":"pthread_cond_wait","s+":"216"},{"a":"55D29529B76C","b":"55D2911E4000","o":"40B776C","s":"_ZNSt18condition_variable4waitERSt11unique_lockISt5mutexE","s+":"C"},{"a":"55D2950EA987","b":"55D2911E4000","o":"3F06987","s":"_ZN5mongo15waitForShutdownEv","s+":"107"},{"a":"55D29274CB91","b":"55D2911E4000","o":"1568B91","s":"_ZN5mongo12_GLOBAL__N_114_initAndListenEPNS_14ServiceContextEi.isra.1929","s+":"13E1"},{"a":"55D29274E5AF","b":"55D2911E4000","o":"156A5AF","s":"_ZN5mongo11mongod_mainEiPPc","s+":"CDF"},{"a":"55D2925E2F2E","b":"55D2911E4000","o":"13FEF2E","s":"main","s+":"E"},{"a":"7F8A6A816083","b":"7F8A6A7F2000","o":"24083","s":"__libc_start_main","s+":"F3"},{"a":"55D2927489DE","b":"55D2911E4000","o":"15649DE","s":"_start","s+":"2E"}],"processInfo":{"mongodbVersion":"5.0.14","gitVersion":"1b3b0073a0b436a8a502b612f24fb2bd572772e5","compiledModules":[],"uname":{"sysname":"Linux","release":"5.4.0-137-generic","version":"#154-Ubuntu SMP Thu Jan 5 17:03:22 UTC 2023","machine":"x86_64"},"somap":[{"b":"55D2911E4000","elfType":3,"buildId":"44AD2830EB7E90ABFF5F592CAAA6392F81AEC690"},{"b":"7F8A6A9E4000","path":"/lib/x86_64-linux-gnu/libpthread.so.0","elfType":3,"buildId":"7B4536F41CDAA5888408E82D0836E33DCF436466"},{"b":"7F8A6A7F2000","path":"/lib/x86_64-linux-gnu/libc.so.6","elfType":3,"buildId":"1878E6B475720C7C51969E69AB2D276FAE6D1DEE"}]}}}}
{"t":{"$date":"2023-01-30T12:47:32.226-05:00"},"s":"I", "c":"CONTROL", "id":31445, "ctx":"initandlisten","msg":"Frame","attr":{"frame":{"a":"55D2950F40A5","b":"55D2911E4000","o":"3F100A5","s":"_ZN5mongo18stack_trace_detail12_GLOBAL__N_119printStackTraceImplERKNS1_7OptionsEPNS_14StackTraceSinkE.constprop.361","s+":"215"}}}
{"t":{"$date":"2023-01-30T12:47:32.226-05:00"},"s":"I", "c":"CONTROL", "id":31445, "ctx":"initandlisten","msg":"Frame","attr":{"frame":{"a":"55D2950F6B29","b":"55D2911E4000","o":"3F12B29","s":"_ZN5mongo15printStackTraceEv","s+":"29"}}}
{"t":{"$date":"2023-01-30T12:47:32.226-05:00"},"s":"I", "c":"CONTROL", "id":31445, "ctx":"initandlisten","msg":"Frame","attr":{"frame":{"a":"55D2950EF09C","b":"55D2911E4000","o":"3F0B09C","s":"abruptQuitWithAddrSignal","s+":"EC"}}}
{"t":{"$date":"2023-01-30T12:47:32.226-05:00"},"s":"I", "c":"CONTROL", "id":31445, "ctx":"initandlisten","msg":"Frame","attr":{"frame":{"a":"7F8A6A9F8420","b":"7F8A6A9E4000","o":"14420","s":"funlockfile","s+":"60"}}}
{"t":{"$date":"2023-01-30T12:47:32.226-05:00"},"s":"I", "c":"CONTROL", "id":31445, "ctx":"initandlisten","msg":"Frame","attr":{"frame":{"a":"7F8A6A9F3376","b":"7F8A6A9E4000","o":"F376","s":"pthread_cond_wait","s+":"216"}}}
{"t":{"$date":"2023-01-30T12:47:32.226-05:00"},"s":"I", "c":"CONTROL", "id":31445, "ctx":"initandlisten","msg":"Frame","attr":{"frame":{"a":"55D29529B76C","b":"55D2911E4000","o":"40B776C","s":"_ZNSt18condition_variable4waitERSt11unique_lockISt5mutexE","s+":"C"}}}
{"t":{"$date":"2023-01-30T12:47:32.226-05:00"},"s":"I", "c":"CONTROL", "id":31445, "ctx":"initandlisten","msg":"Frame","attr":{"frame":{"a":"55D2950EA987","b":"55D2911E4000","o":"3F06987","s":"_ZN5mongo15waitForShutdownEv","s+":"107"}}}
{"t":{"$date":"2023-01-30T12:47:32.226-05:00"},"s":"I", "c":"CONTROL", "id":31445, "ctx":"initandlisten","msg":"Frame","attr":{"frame":{"a":"55D29274CB91","b":"55D2911E4000","o":"1568B91","s":"_ZN5mongo12_GLOBAL__N_114_initAndListenEPNS_14ServiceContextEi.isra.1929","s+":"13E1"}}}
{"t":{"$date":"2023-01-30T12:47:32.226-05:00"},"s":"I", "c":"CONTROL", "id":31445, "ctx":"initandlisten","msg":"Frame","attr":{"frame":{"a":"55D29274E5AF","b":"55D2911E4000","o":"156A5AF","s":"_ZN5mongo11mongod_mainEiPPc","s+":"CDF"}}}
{"t":{"$date":"2023-01-30T12:47:32.226-05:00"},"s":"I", "c":"CONTROL", "id":31445, "ctx":"initandlisten","msg":"Frame","attr":{"frame":{"a":"55D2925E2F2E","b":"55D2911E4000","o":"13FEF2E","s":"main","s+":"E"}}}
{"t":{"$date":"2023-01-30T12:47:32.226-05:00"},"s":"I", "c":"CONTROL", "id":31445, "ctx":"initandlisten","msg":"Frame","attr":{"frame":{"a":"7F8A6A816083","b":"7F8A6A7F2000","o":"24083","s":"__libc_start_main","s+":"F3"}}}
{"t":{"$date":"2023-01-30T12:47:32.226-05:00"},"s":"I", "c":"CONTROL", "id":31445, "ctx":"initandlisten","msg":"Frame","attr":{"frame":{"a":"55D2927489DE","b":"55D2911E4000","o":"15649DE","s":"_start","s+":"2E"}}}

crash log #2

{"t":{"$date":"2023-02-01T11:55:24.317-05:00"},"s":"F", "c":"CONTROL", "id":6384300, "ctx":"initandlisten","msg":"Writing fatal message","attr":{"message":"Invalid access at address: 0x3dfff\n"}}
{"t":{"$date":"2023-02-01T11:55:24.317-05:00"},"s":"F", "c":"CONTROL", "id":6384300, "ctx":"initandlisten","msg":"Writing fatal message","attr":{"message":"Got signal: 11 (Segmentation fault).\n"}}
{"t":{"$date":"2023-02-01T11:55:24.398-05:00"},"s":"I", "c":"CONTROL", "id":31380, "ctx":"initandlisten","msg":"BACKTRACE","attr":{"bt":{"backtrace":[{"a":"56098E17A0A5","b":"56098A26A000","o":"3F100A5","s":"_ZN5mongo18stack_trace_detail12_GLOBAL__N_119printStackTraceImplERKNS1_7OptionsEPNS_14StackTraceSinkE.constprop.361","s+":"215"},{"a":"56098E17CB29","b":"56098A26A000","o":"3F12B29","s":"_ZN5mongo15printStackTraceEv","s+":"29"},{"a":"56098E17509C","b":"56098A26A000","o":"3F0B09C","s":"abruptQuitWithAddrSignal","s+":"EC"},{"a":"7FD7CF902420","b":"7FD7CF8EE000","o":"14420","s":"funlockfile","s+":"60"},{"a":"7FD7CF8FD376","b":"7FD7CF8EE000","o":"F376","s":"pthread_cond_wait","s+":"216"},{"a":"56098E32176C","b":"56098A26A000","o":"40B776C","s":"_ZNSt18condition_variable4waitERSt11unique_lockISt5mutexE","s+":"C"},{"a":"56098E170987","b":"56098A26A000","o":"3F06987","s":"_ZN5mongo15waitForShutdownEv","s+":"107"},{"a":"56098B7D2B91","b":"56098A26A000","o":"1568B91","s":"_ZN5mongo12_GLOBAL__N_114_initAndListenEPNS_14ServiceContextEi.isra.1929","s+":"13E1"},{"a":"56098B7D45AF","b":"56098A26A000","o":"156A5AF","s":"_ZN5mongo11mongod_mainEiPPc","s+":"CDF"},{"a":"56098B668F2E","b":"56098A26A000","o":"13FEF2E","s":"main","s+":"E"},{"a":"7FD7CF720083","b":"7FD7CF6FC000","o":"24083","s":"__libc_start_main","s+":"F3"},{"a":"56098B7CE9DE","b":"56098A26A000","o":"15649DE","s":"_start","s+":"2E"}],"processInfo":{"mongodbVersion":"5.0.14","gitVersion":"1b3b0073a0b436a8a502b612f24fb2bd572772e5","compiledModules":[],"uname":{"sysname":"Linux","release":"5.4.0-137-generic","version":"#154-Ubuntu SMP Thu Jan 5 17:03:22 UTC 2023","machine":"x86_64"},"somap":[{"b":"56098A26A000","elfType":3,"buildId":"44AD2830EB7E90ABFF5F592CAAA6392F81AEC690"},{"b":"7FD7CF8EE000","path":"/lib/x86_64-linux-gnu/libpthread.so.0","elfType":3,"buildId":"7B4536F41CDAA5888408E82D0836E33DCF436466"},{"b":"7FD7CF6FC000","path":"/lib/x86_64-linux-gnu/libc.so.6","elfType":3,"buildId":"1878E6B475720C7C51969E69AB2D276FAE6D1DEE"}]}}}}
{"t":{"$date":"2023-02-01T11:55:24.398-05:00"},"s":"I", "c":"CONTROL", "id":31445, "ctx":"initandlisten","msg":"Frame","attr":{"frame":{"a":"56098E17A0A5","b":"56098A26A000","o":"3F100A5","s":"_ZN5mongo18stack_trace_detail12_GLOBAL__N_119printStackTraceImplERKNS1_7OptionsEPNS_14StackTraceSinkE.constprop.361","s+":"215"}}}
{"t":{"$date":"2023-02-01T11:55:24.398-05:00"},"s":"I", "c":"CONTROL", "id":31445, "ctx":"initandlisten","msg":"Frame","attr":{"frame":{"a":"56098E17CB29","b":"56098A26A000","o":"3F12B29","s":"_ZN5mongo15printStackTraceEv","s+":"29"}}}
{"t":{"$date":"2023-02-01T11:55:24.398-05:00"},"s":"I", "c":"CONTROL", "id":31445, "ctx":"initandlisten","msg":"Frame","attr":{"frame":{"a":"56098E17509C","b":"56098A26A000","o":"3F0B09C","s":"abruptQuitWithAddrSignal","s+":"EC"}}}
{"t":{"$date":"2023-02-01T11:55:24.398-05:00"},"s":"I", "c":"CONTROL", "id":31445, "ctx":"initandlisten","msg":"Frame","attr":{"frame":{"a":"7FD7CF902420","b":"7FD7CF8EE000","o":"14420","s":"funlockfile","s+":"60"}}}
{"t":{"$date":"2023-02-01T11:55:24.398-05:00"},"s":"I", "c":"CONTROL", "id":31445, "ctx":"initandlisten","msg":"Frame","attr":{"frame":{"a":"7FD7CF8FD376","b":"7FD7CF8EE000","o":"F376","s":"pthread_cond_wait","s+":"216"}}}
{"t":{"$date":"2023-02-01T11:55:24.398-05:00"},"s":"I", "c":"CONTROL", "id":31445, "ctx":"initandlisten","msg":"Frame","attr":{"frame":{"a":"56098E32176C","b":"56098A26A000","o":"40B776C","s":"_ZNSt18condition_variable4waitERSt11unique_lockISt5mutexE","s+":"C"}}}
{"t":{"$date":"2023-02-01T11:55:24.398-05:00"},"s":"I", "c":"CONTROL", "id":31445, "ctx":"initandlisten","msg":"Frame","attr":{"frame":{"a":"56098E170987","b":"56098A26A000","o":"3F06987","s":"_ZN5mongo15waitForShutdownEv","s+":"107"}}}
{"t":{"$date":"2023-02-01T11:55:24.398-05:00"},"s":"I", "c":"CONTROL", "id":31445, "ctx":"initandlisten","msg":"Frame","attr":{"frame":{"a":"56098B7D2B91","b":"56098A26A000","o":"1568B91","s":"_ZN5mongo12_GLOBAL__N_114_initAndListenEPNS_14ServiceContextEi.isra.1929","s+":"13E1"}}}
{"t":{"$date":"2023-02-01T11:55:24.398-05:00"},"s":"I", "c":"CONTROL", "id":31445, "ctx":"initandlisten","msg":"Frame","attr":{"frame":{"a":"56098B7D45AF","b":"56098A26A000","o":"156A5AF","s":"_ZN5mongo11mongod_mainEiPPc","s+":"CDF"}}}
{"t":{"$date":"2023-02-01T11:55:24.398-05:00"},"s":"I", "c":"CONTROL", "id":31445, "ctx":"initandlisten","msg":"Frame","attr":{"frame":{"a":"56098B668F2E","b":"56098A26A000","o":"13FEF2E","s":"main","s+":"E"}}}
{"t":{"$date":"2023-02-01T11:55:24.398-05:00"},"s":"I", "c":"CONTROL", "id":31445, "ctx":"initandlisten","msg":"Frame","attr":{"frame":{"a":"7FD7CF720083","b":"7FD7CF6FC000","o":"24083","s":"__libc_start_main","s+":"F3"}}}
{"t":{"$date":"2023-02-01T11:55:24.398-05:00"},"s":"I", "c":"CONTROL", "id":31445, "ctx":"initandlisten","msg":"Frame","attr":{"frame":{"a":"56098B7CE9DE","b":"56098A26A000","o":"15649DE","s":"_start","s+":"2E"}}}
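
The BACKTRACE entries in the two crash logs can be read mechanically rather than by eye. The following is a minimal Python sketch, not an official MongoDB tool. It assumes, based only on the values visible in these logs, that "a" is the frame address, "b" is the load base of the containing module, and "o" is the offset (a minus b). The function name summarize_backtrace and the trimmed SAMPLE line (three of the twelve frames from crash log #1) are illustrative choices.

```python
import json

# Trimmed copy of the BACKTRACE entry from crash log #1: only three of the
# twelve frames are kept, and processInfo is omitted, to keep the sample short.
SAMPLE = (
    '{"t":{"$date":"2023-01-30T12:47:32.226-05:00"},"s":"I", "c":"CONTROL", "id":31380, '
    '"ctx":"initandlisten","msg":"BACKTRACE","attr":{"bt":{"backtrace":['
    '{"a":"55D2950EF09C","b":"55D2911E4000","o":"3F0B09C","s":"abruptQuitWithAddrSignal","s+":"EC"},'
    '{"a":"7F8A6A9F3376","b":"7F8A6A9E4000","o":"F376","s":"pthread_cond_wait","s+":"216"},'
    '{"a":"55D2950EA987","b":"55D2911E4000","o":"3F06987","s":"_ZN5mongo15waitForShutdownEv","s+":"107"}'
    ']}}}'
)


def summarize_backtrace(log_line):
    """Return one dict per frame of a BACKTRACE log line, or [] for other lines."""
    entry = json.loads(log_line)
    if entry.get("msg") != "BACKTRACE":
        return []
    frames = []
    for frame in entry["attr"]["bt"]["backtrace"]:
        # Assumed field meanings: a = address, b = module base, o = offset.
        addr = int(frame["a"], 16)
        base = int(frame["b"], 16)
        offset = int(frame["o"], 16)
        frames.append({
            "symbol": frame.get("s", "??"),
            "offset": offset,
            # True when the logged offset agrees with address minus base.
            "consistent": addr - base == offset,
        })
    return frames


if __name__ == "__main__":
    for f in summarize_backtrace(SAMPLE):
        print(f"{f['symbol']} +0x{f['offset']:X} consistent={f['consistent']}")
```

Every entry in both logs carries the "initandlisten" context, so a summary like this covers only that one thread and says nothing about what the other threads were doing at the time of the crash.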



 Comments   
Comment by Yuan Fang [ 27/Feb/23 ]

I am closing this ticket as requested.

Comment by Amit Gupta [ 27/Feb/23 ]

Hi, sorry, we upgraded our whole infrastructure to 6.0.4, so we don't have any more info on that.  Feel free to close the ticket out.

Comment by Yuan Fang [ 27/Feb/23 ]

Hi amit.gupta@opensense.com,

We still need additional information to diagnose the problem. If this is still an issue for you, could you provide the thread apply all bt (thread apply all backtrace) output from GDB? This will give us a more complete picture of the state of the program at the time of the issue.

Thank you!
Yuan

Comment by Yuan Fang [ 09/Feb/23 ]

Hi amit.gupta@opensense.com,

I have reviewed the FTDC and log, but unfortunately, I have not been able to determine the root cause of the crash. No evidence of CPU, disk, or memory bottlenecks was observed during the crashes.

The information in the backtrace of the mongod.log is not sufficient to diagnose the problem. If feasible, could you provide the thread apply all bt (thread apply all backtrace) output from GDB? This will give us a more complete picture of the state of the program at the time of the issue.
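
One possible way to produce that output from an Ubuntu /var/crash file is sketched below in Python. It is a sketch under assumptions, not a verified procedure for this incident: it assumes apport-unpack and gdb are installed, that the unpacked report contains a core file named CoreDump, and that the binary lives at /usr/bin/mongod (inferred from the crash file name _usr_bin_mongod.121.crash). The helper names are hypothetical, and useful symbol names generally also require matching debug symbols for the exact mongod build.

```python
import subprocess
from pathlib import Path


def unpack_command(crash_file, out_dir):
    """Command that unpacks an apport .crash report into out_dir."""
    return ["apport-unpack", str(crash_file), str(out_dir)]


def gdb_backtrace_command(binary, core):
    """Command that runs GDB non-interactively and dumps every thread's stack."""
    return ["gdb", "-batch", "-ex", "thread apply all bt", str(binary), str(core)]


def collect_backtraces(crash_file, out_dir, binary="/usr/bin/mongod"):
    """Unpack the crash report, then return GDB's all-thread backtrace as text.

    Assumes out_dir does not exist yet (or is empty) and that the unpacked
    report stores the core dump in a file named CoreDump.
    """
    subprocess.run(unpack_command(crash_file, out_dir), check=True)
    core = Path(out_dir) / "CoreDump"
    result = subprocess.run(
        gdb_backtrace_command(binary, core),
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Printing the commands only; nothing is executed here.
    print(" ".join(unpack_command("/var/crash/_usr_bin_mongod.121.crash", "/tmp/mongod-crash")))
    print(" ".join(gdb_backtrace_command("/usr/bin/mongod", "/tmp/mongod-crash/CoreDump")))
```

Building the commands as lists avoids shell quoting issues, and redirecting the returned text to a file gives something far smaller to upload than the 16GB crash file.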

Thank you!
Yuan

Comment by Amit Gupta [ 03/Feb/23 ]

Just uploaded the files you mentioned. I'm also uploading the crash file just in case. It will take a little while to transfer because of the size.

Comment by Amit Gupta [ 03/Feb/23 ]

Yes, I can. Should I also include the /var/crash/_usr_bin_mongod.121.crash file? It's 16GB.

Comment by Yuan Fang [ 03/Feb/23 ]

Hi amit.gupta@opensense.com,

Thank you for reporting the issue. To diagnose the cause of the segfault, would you please archive (tar or zip) mongod.log files leading up to this crash and the $dbpath/diagnostic.data directory (the contents are described here) and upload them to this support uploader location?

Files uploaded to this portal are visible only to MongoDB employees and are routinely deleted after some time.
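
The archive step can be sketched with Python's standard tarfile module. This is only an illustration: the function name archive_diagnostics is hypothetical, and the example paths under the main guard (/var/log/mongodb/mongod.log and /var/lib/mongodb) are assumed package defaults, not values confirmed for the affected servers.

```python
import tarfile
from pathlib import Path


def archive_diagnostics(log_paths, dbpath, archive_path):
    """Bundle mongod log files and $dbpath/diagnostic.data into one .tar.gz."""
    diag_dir = Path(dbpath) / "diagnostic.data"
    with tarfile.open(archive_path, "w:gz") as tar:
        for log in log_paths:
            # Store each log under its bare file name.
            tar.add(log, arcname=Path(log).name)
        # Directories are added recursively by default.
        tar.add(diag_dir, arcname="diagnostic.data")
    return Path(archive_path)


if __name__ == "__main__":
    # Assumed default locations; adjust to the actual systemLog.path and dbPath.
    archive_diagnostics(
        ["/var/log/mongodb/mongod.log"],
        "/var/lib/mongodb",
        "mongod-diagnostics.tar.gz",
    )
```

Rotated log files covering the period before each crash can be passed in the same log_paths list.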

Regards,
Yuan
