[SERVER-62276] Crash on profiler settings changed Created: 27/Dec/21  Updated: 22/May/23  Resolved: 22/May/23

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 4.4.9
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Konstantin Krasnov Assignee: Backlog - Storage Execution Team
Resolution: Cannot Reproduce Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File Screen Shot 2022-01-07 at 5.30.13 PM.png    
Assigned Teams:
Storage Execution
Operating System: ALL
Sprint: Execution Team 2022-02-21, Execution Team 2022-03-07
Participants:

 Description   

Hello,

We tried to recreate the table on the PRIMARY node, which resulted in the node crashing:

 

 

{"t":{"$date":"2021-12-27T07:37:04.159+03:00"},"s":"W",  "c":"QUERY",    "id":20478,   "ctx":"conn363849","msg":"getMore command executor error","attr":{"state":"FAILURE","error":{"code":136,"codeName":"CappedPositionLost","errmsg":"CollectionScan died due to failure to restore tailable cursor position. Last seen record id: RecordId(4991)"},"stats":{"stage":"COLLSCAN","filter":{"$and":[{"ts":{"$gt":{"$date":"1970-01-01T00:00:00.000Z"}}},{"ns":{"$regex":"^db1."}},{"ns":{"$not":{"$eq":"db1.system.profile"}}}]},"nReturned":4991,"executionTimeMillisEstimate":9743,"works":2452404,"advanced":4991,"needTime":1223706,"needYield":0,"saveState":1223718,"restoreState":1223718,"failed":true,"isEOF":0,"direction":"forward","docsExamined":4991}}}
{"t":{"$date":"2021-12-27T07:37:04.202+03:00"},"s":"I",  "c":"COMMAND",  "id":48742,   "ctx":"conn363849","msg":"Profiler settings changed","attr":{"from":{"level":1,"slowms":1000,"sampleRate":1.0},"to":{"level":0,"slowms":1000,"sampleRate":1.0}}}
{"t":{"$date":"2021-12-27T07:37:04.363+03:00"},"s":"I",  "c":"COMMAND",  "id":518070,  "ctx":"conn363849","msg":"CMD: drop","attr":{"namespace":"db1.system.profile"}}
{"t":{"$date":"2021-12-27T07:37:04.363+03:00"},"s":"I",  "c":"STORAGE",  "id":23879,   "ctx":"conn363849","msg":"About to abort all index builders","attr":{"namespace":"db1.system.profile","uuid":{"uuid":{"$uuid":"557af432-4858-4eb1-9aac-f6c91e51ca25"}},"reason":"Collection db1.system.profile(557af432-4858-4eb1-9aac-f6c91e51ca25) is being dropped"}}
{"t":{"$date":"2021-12-27T07:37:04.364+03:00"},"s":"I",  "c":"STORAGE",  "id":20318,   "ctx":"conn363849","msg":"Finishing collection drop","attr":{"namespace":"db1.system.profile","uuid":{"uuid":{"$uuid":"557af432-4858-4eb1-9aac-f6c91e51ca25"}}}}
{"t":{"$date":"2021-12-27T07:37:04.367+03:00"},"s":"F",  "c":"CONTROL",  "id":4757800, "ctx":"thread535824","msg":"Writing fatal message","attr":{"message":"Invalid access at address: 0x98"}}
{"t":{"$date":"2021-12-27T07:37:04.368+03:00"},"s":"F",  "c":"CONTROL",  "id":4757800, "ctx":"thread535824","msg":"Writing fatal message","attr":{"message":"Got signal: 11 (Segmentation fault).\n"}}
{"t":{"$date":"2021-12-27T07:37:04.472+03:00"},"s":"I",  "c":"STORAGE",  "id":20320,   "ctx":"conn363849","msg":"createCollection","attr":{"namespace":"db1.system.profile","uuidDisposition":"generated","uuid":{"uuid":{"$uuid":"0f523efc-3db6-4ab8-8a38-d302d9131b65"}},"options":{"capped":true,"size":2097152}}}
{"t":{"$date":"2021-12-27T07:37:04.481+03:00"},"s":"I",  "c":"COMMAND",  "id":48742,   "ctx":"conn363849","msg":"Profiler settings changed","attr":{"from":{"level":0,"slowms":1000,"sampleRate":1.0},"to":{"level":1,"slowms":1000,"sampleRate":1.0}}}

backtrace:

{"t":{"$date":"2021-12-27T07:37:04.719+03:00"},"s":"I",  "c":"CONTROL",  "id":31431,   "ctx":"thread535824","msg":"BACKTRACE: {bt}","attr":{"bt":{"backtrace":[{"a":"55CA37FA83AA","b":"55CA351DF000","o":"2DC93AA","s":"_ZN5mongo18stack_trace_detail12_GLOBAL__N_119printStackTraceImplERKNS1_7OptionsEPNS_14StackTraceSinkE.constprop.606","s+":"1EA"},{"a":"55CA37FA9E39","b":"55CA351DF000","o":"2DCAE39","s":"_ZN5mongo15printStackTraceEv","s+":"29"},{"a":"55CA37FA730C","b":"55CA351DF000","o":"2DC830C","s":"_ZN5mongo12_GLOBAL__N_124abruptQuitWithAddrSignalEiP9siginfo_tPv","s+":"EC"},{"a":"55CA377E1384","b":"55CA351DF000","o":"2602384","s":"_ZL16WasmFaultHandleriP9siginfo_tPv","s+":"A4"},{"a":"7FA4033B1630","b":"7FA4033A2000","o":"F630","s":"_L_unlock_13","s+":"34"},{"a":"55CA36351E8A","b":"55CA351DF000","o":"1172E8A","s":"__wt_bt_read","s+":"8A"},{"a":"55CA36362313","b":"55CA351DF000","o":"1183313","s":"__page_read","s+":"593"},{"a":"55CA363634FE","b":"55CA351DF000","o":"11844FE","s":"__wt_page_in_func","s+":"3FE"},{"a":"55CA363941EA","b":"55CA351DF000","o":"11B51EA","s":"__tree_walk_internal","s+":"32A"},{"a":"55CA3629C08E","b":"55CA351DF000","o":"10BD08E","s":"__wt_evict_file","s+":"CE"},{"a":"55CA362763D8","b":"55CA351DF000","o":"10973D8","s":"__wt_conn_dhandle_close","s+":"148"},{"a":"55CA3627ECD6","b":"55CA351DF000","o":"109FCD6","s":"__sweep_server","s+":"246"},{"a":"7FA4033A9EA5","b":"7FA4033A2000","o":"7EA5","s":"start_thread","s+":"C5"},{"a":"7FA4030D28DD","b":"7FA402FD4000","o":"FE8DD","s":"clone","s+":"6D"}],"processInfo":{"mongodbVersion":"4.4.9","gitVersion":"b4048e19814bfebac717cf5a880076aa69aba481","compiledModules":[],"uname":{"sysname":"Linux","release":"3.10.0-1127.19.1.el7.x86_64","version":"#1 SMP Tue Aug 25 17:23:54 UTC 2020","machine":"x86_64"},"somap":[{"b":"55CA351DF000","elfType":3,"buildId":"375E0455B64A8CCBA2B20814F34100164730166F"},{"b":"7FA4033A2000","path":"/lib64/libpthread.so.0","elfType":3,"buildId":"2B482B3BAE79DEF4E5BC9791BC6BBDAE0E93E359"},{"b":"7FA402FD4000","path":"/lib64/libc.so.6","elfType":3,"buildId":"D78066A9C36F5FD63E2F6AC851AE3515C4C9792A"}]}}}}
{"t":{"$date":"2021-12-27T07:37:04.719+03:00"},"s":"I",  "c":"CONTROL",  "id":31427,   "ctx":"thread535824","msg":"  Frame: {frame}","attr":{"frame":{"a":"55CA37FA83AA","b":"55CA351DF000","o":"2DC93AA","s":"_ZN5mongo18stack_trace_detail12_GLOBAL__N_119printStackTraceImplERKNS1_7OptionsEPNS_14StackTraceSinkE.constprop.606","s+":"1EA"}}}
{"t":{"$date":"2021-12-27T07:37:04.719+03:00"},"s":"I",  "c":"CONTROL",  "id":31427,   "ctx":"thread535824","msg":"  Frame: {frame}","attr":{"frame":{"a":"55CA37FA9E39","b":"55CA351DF000","o":"2DCAE39","s":"_ZN5mongo15printStackTraceEv","s+":"29"}}}
{"t":{"$date":"2021-12-27T07:37:04.719+03:00"},"s":"I",  "c":"CONTROL",  "id":31427,   "ctx":"thread535824","msg":"  Frame: {frame}","attr":{"frame":{"a":"55CA37FA730C","b":"55CA351DF000","o":"2DC830C","s":"_ZN5mongo12_GLOBAL__N_124abruptQuitWithAddrSignalEiP9siginfo_tPv","s+":"EC"}}}
{"t":{"$date":"2021-12-27T07:37:04.719+03:00"},"s":"I",  "c":"CONTROL",  "id":31427,   "ctx":"thread535824","msg":"  Frame: {frame}","attr":{"frame":{"a":"55CA377E1384","b":"55CA351DF000","o":"2602384","s":"_ZL16WasmFaultHandleriP9siginfo_tPv","s+":"A4"}}}
{"t":{"$date":"2021-12-27T07:37:04.719+03:00"},"s":"I",  "c":"CONTROL",  "id":31427,   "ctx":"thread535824","msg":"  Frame: {frame}","attr":{"frame":{"a":"7FA4033B1630","b":"7FA4033A2000","o":"F630","s":"_L_unlock_13","s+":"34"}}}
{"t":{"$date":"2021-12-27T07:37:04.719+03:00"},"s":"I",  "c":"CONTROL",  "id":31427,   "ctx":"thread535824","msg":"  Frame: {frame}","attr":{"frame":{"a":"55CA36351E8A","b":"55CA351DF000","o":"1172E8A","s":"__wt_bt_read","s+":"8A"}}}
{"t":{"$date":"2021-12-27T07:37:04.719+03:00"},"s":"I",  "c":"CONTROL",  "id":31427,   "ctx":"thread535824","msg":"  Frame: {frame}","attr":{"frame":{"a":"55CA36362313","b":"55CA351DF000","o":"1183313","s":"__page_read","s+":"593"}}}
{"t":{"$date":"2021-12-27T07:37:04.719+03:00"},"s":"I",  "c":"CONTROL",  "id":31427,   "ctx":"thread535824","msg":"  Frame: {frame}","attr":{"frame":{"a":"55CA363634FE","b":"55CA351DF000","o":"11844FE","s":"__wt_page_in_func","s+":"3FE"}}}
{"t":{"$date":"2021-12-27T07:37:04.719+03:00"},"s":"I",  "c":"CONTROL",  "id":31427,   "ctx":"thread535824","msg":"  Frame: {frame}","attr":{"frame":{"a":"55CA363941EA","b":"55CA351DF000","o":"11B51EA","s":"__tree_walk_internal","s+":"32A"}}}
{"t":{"$date":"2021-12-27T07:37:04.719+03:00"},"s":"I",  "c":"CONTROL",  "id":31427,   "ctx":"thread535824","msg":"  Frame: {frame}","attr":{"frame":{"a":"55CA3629C08E","b":"55CA351DF000","o":"10BD08E","s":"__wt_evict_file","s+":"CE"}}}
{"t":{"$date":"2021-12-27T07:37:04.719+03:00"},"s":"I",  "c":"CONTROL",  "id":31427,   "ctx":"thread535824","msg":"  Frame: {frame}","attr":{"frame":{"a":"55CA362763D8","b":"55CA351DF000","o":"10973D8","s":"__wt_conn_dhandle_close","s+":"148"}}}
{"t":{"$date":"2021-12-27T07:37:04.719+03:00"},"s":"I",  "c":"CONTROL",  "id":31427,   "ctx":"thread535824","msg":"  Frame: {frame}","attr":{"frame":{"a":"55CA3627ECD6","b":"55CA351DF000","o":"109FCD6","s":"__sweep_server","s+":"246"}}}
{"t":{"$date":"2021-12-27T07:37:04.719+03:00"},"s":"I",  "c":"CONTROL",  "id":31427,   "ctx":"thread535824","msg":"  Frame: {frame}","attr":{"frame":{"a":"7FA4033A9EA5","b":"7FA4033A2000","o":"7EA5","s":"start_thread","s+":"C5"}}}
{"t":{"$date":"2021-12-27T07:37:04.719+03:00"},"s":"I",  "c":"CONTROL",  "id":31427,   "ctx":"thread535824","msg":"  Frame: {frame}","attr":{"frame":{"a":"7FA4030D28DD","b":"7FA402FD4000","o":"FE8DD","s":"clone","s+":"6D"}}}
{"t":{"$date":"2021-12-27T08:44:27.173+03:00"},"s":"I",  "c":"CONTROL",  "id":20698,   "ctx":"main","msg":"***** SERVER RESTARTED *****"}

 



 Comments   
Comment by Irwin Dolobowsky [ 22/May/23 ]

We were unable to reproduce this issue internally.  Please reopen if a consistent repro is available.

Comment by Haley Connelly [ 14/Feb/22 ]

Hi kkrasnov@gmail.com,

I have a couple questions. You mentioned you were trying to recreate a table on the primary.

(1) What steps did you take to recreate a table on the primary? Did you follow specific Mongodb documentation?
(2) There is CappedPositionLost when issuing a getMore. Do you know if the collection scan was over a special type of collection like the oplog, or a .system collection, etc?

Comment by Edwin Zhou [ 07/Jan/22 ]

Hi kkrasnov@gmail.com,

Thank you for uploading the diagnostic data. The operations that preceded the Invalid access segfault does suggest that it may have been related to the profiler. I'm passing this along to the appropriate team for further investigation.

Best,
Edwin

Comment by Konstantin Krasnov [ 06/Jan/22 ]

Hi Edwin,

It happened once. I have not tried repeating this action on this node.

I have uploaded mongod.log and the diagnostic.data files.

 

Best regards,

Konstantin

Comment by Edwin Zhou [ 31/Dec/21 ]

Hi kkrasnov@gmail.com,

Thank you for your report. Does this issue consistently occur whenever you modify the profiler settings or has this only been a one-time occurrence?

Would you please archive (tar or zip) the mongod.log files and the $dbpath/diagnostic.data directory (the contents are described here) and upload them to this support uploader location?

Files uploaded to this portal are visible only to MongoDB employees and are routinely deleted after some time.

Best,
Edwin

Generated at Thu Feb 08 05:54:40 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.