[SERVER-61546] MongoDB 5.1.0 Panic During Disable Encryption at Rest Created: 16/Nov/21  Updated: 27/Oct/23  Resolved: 21/Jan/22

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Eric Tsai Assignee: Spencer Jackson
Resolution: Gone away Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Problem/Incident
Operating System: ALL
Sprint: Security 2021-11-29, Security 2021-12-13, Security 2021-12-27, Security 2022-01-10, Security 2022-01-24
Participants:

 Description   

{"t":{"$date":"2021-11-16T17:50:12.491+00:00"},"s":"I",  "c":"STORAGE",  "id":22324,   "ctx":"conn325","msg":"Closing WiredTiger in preparation for reconfiguring","attr":{"closeConfig":"leak_memory=true,"}}
{"t":{"$date":"2021-11-16T17:50:12.493+00:00"},"s":"I",  "c":"STORAGE",  "id":22430,   "ctx":"conn325","msg":"WiredTiger message","attr":{"message":"[1637085012:493596][23161:0x7f776d11b700], close_ckpt: [WT_VERB_CHECKPOINT_PROGRESS] saving checkpoint snapshot min: 76, snapshot max: 76 snapshot count: 0, oldest timestamp: (1637084706, 1) , meta checkpoint timestamp: (1637085006, 1) base write gen: 123"}}
{"t":{"$date":"2021-11-16T17:50:12.520+00:00"},"s":"I",  "c":"STORAGE",  "id":4795905, "ctx":"conn325","msg":"WiredTiger closed","attr":{"durationMillis":29}}
{"t":{"$date":"2021-11-16T17:50:12.527+00:00"},"s":"W",  "c":"NETWORK",  "id":23190,   "ctx":"conn325","msg":"Failed to connect to remote host","attr":{"remoteSocketAddress":"127.0.0.1","remoteSocketAddressPort":35696,"call":"checking socket for error after poll","error":"Connection refused"}}
{"t":{"$date":"2021-11-16T17:50:12.527+00:00"},"s":"E",  "c":"STORAGE",  "id":24248,   "ctx":"conn325","msg":"Unable to retrieve key","attr":{"keyId":".system","error":{"code":2,"codeName":"BadValue","errmsg":"Could not connect to KMIP server 127.0.0.1:35696"}}}
{"t":{"$date":"2021-11-16T17:50:12.528+00:00"},"s":"F",  "c":"ASSERT",   "id":23083,   "ctx":"conn325","msg":"Invariant failure","attr":{"expr":"wiredtiger_open( _path.c_str(), _eventHandler.getWtEventHandler(), _wtOpenConfig.c_str(), &_conn)","error":"BadValue: 22: Invalid argument","file":"src/mongo/db/storage/wiredtiger/wiredtiger_kv_engine.cpp","line":787}}
{"t":{"$date":"2021-11-16T17:50:12.529+00:00"},"s":"F",  "c":"ASSERT",   "id":23084,   "ctx":"conn325","msg":"\n\n***aborting after invariant() failure\n\n"}
{"t":{"$date":"2021-11-16T17:50:12.529+00:00"},"s":"F",  "c":"CONTROL",  "id":4757800, "ctx":"conn325","msg":"Writing fatal message","attr":{"message":"Got signal: 6 (Aborted).\n"}}
{"t":{"$date":"2021-11-16T17:50:12.898+00:00"},"s":"I",  "c":"CONTROL",  "id":31380,   "ctx":"conn325","msg":"BACKTRACE","attr":{"bt":{"backtrace":[{"a":"5568175FFA2A","b":"5568135CE000","o":"4031A2A","s":"_ZN5mongo18stack_trace_detail12_GLOBAL__N_119printStackTraceImplERKNS1_7OptionsEPNS_14StackTraceSinkE.constprop.358","s+":"20A"},{"a":"556817602399","b":"5568135CE000","o":"4034399","s":"_ZN5mongo15printStackTraceEv","s+":"29"},{"a":"5568175FAE06","b":"5568135CE000","o":"402CE06","s":"abruptQuit","s+":"66"},{"a":"7F779FB51630","b":"7F779FB42000","o":"F630","s":"_L_unlock_13","s+":"34"},{"a":"7F779F7AA387","b":"7F779F774000","o":"36387","s":"gsignal","s+":"37"},{"a":"7F779F7ABA78","b":"7F779F774000","o":"37A78","s":"abort","s+":"148"},{"a":"556814A45A45","b":"5568135CE000","o":"1477A45","s":"_ZN5mongo17invariantOKFailedEPKcRKNS_6StatusES1_j","s+":"145"},{"a":"55681469C736","b":"5568135CE000","o":"10CE736","s":"_ZN5mongo18WiredTigerKVEngine13cleanShutdownEv.cold.1676","s+":"23A"},{"a":"556815AF04A1","b":"5568135CE000","o":"25224A1","s":"_ZN5mongo17StorageEngineImpl13cleanShutdownEv","s+":"E1"},{"a":"556815AFA9CB","b":"5568135CE000","o":"252C9CB","s":"_ZN5mongo12_GLOBAL__N_134shutdownGlobalStorageEngineCleanlyEPNS_14ServiceContextENS_6StatusE","s+":"2B"},{"a":"5568147603A0","b":"5568135CE000","o":"11923A0","s":"_ZN5mongo34shutdownGlobalStorageEngineCleanlyEPNS_14ServiceContextE","s+":"63"},{"a":"556814BD75D7","b":"5568135CE000","o":"16095D7","s":"_ZN5mongo12_GLOBAL__N_112shutdownTaskERKNS_16ShutdownTaskArgsE","s+":"1F47"},{"a":"5568175F6695","b":"5568135CE000","o":"4028695","s":"_ZN5mongo12_GLOBAL__N_18runTasksESt5stackINS_15unique_functionIFvRKNS_16ShutdownTaskArgsEEEESt5dequeIS7_SaIS7_EEES5_","s+":"A5"},{"a":"556814A47F0F","b":"5568135CE000","o":"1479F0F","s":"_ZN5mongo8shutdownENS_8ExitCodeERKNS_16ShutdownTaskArgsE","s+":"5F1"},{"a":"5568146F25AE","b":"5568135CE000","o":"11245AE","s":"_ZN5mongo15shutdown_detail14finishShutdownEbxNS_8DurationISt5ratioILl1ELl1000EEEE.cold.287","s+":"BA"},{"a":"556814FABA13","b":"5568135CE000","o":"19DDA13","s":"_ZN5mongo12TypedCommandINS_12_GLOBAL__N_117CmdShutdownMongoDEE14InvocationBase3runEPNS_16OperationContextEPNS_3rpc21ReplyBuilderInterfaceE","s+":"D3"},{"a":"55681633E0BF","b":"5568135CE000","o":"2D700BF","s":"_ZN5mongo14CommandHelpers20runCommandInvocationEPNS_16OperationContextERKNS_12OpMsgRequestEPNS_17CommandInvocationEPNS_3rpc21ReplyBuilderInterfaceE","s+":"7F"},{"a":"556816341FFE","b":"5568135CE000","o":"2D73FFE","s":"_ZN5mongo14CommandHelpers20runCommandInvocationESt10shared_ptrINS_23RequestExecutionContextEES1_INS_17CommandInvocationEENS_9transport15ServiceExecutor14ThreadingModelE","s+":"1AE"},{"a":"556814C0E4A7","b":"5568135CE000","o":"16404A7","s":"_ZN5mongo12_GLOBAL__N_120runCommandInvocationESt10shared_ptrINS_23RequestExecutionContextEES1_INS_17CommandInvocationEE","s+":"97"},{"a":"556814C1C956","b":"5568135CE000","o":"164E956","s":"_ZN5mongo12_GLOBAL__N_114RunCommandImpl11_runCommandEv","s+":"146"},{"a":"556814C1D97B","b":"5568135CE000","o":"164F97B","s":"_ZN5mongo12_GLOBAL__N_114RunCommandImpl8_runImplEv","s+":"8B"},{"a":"556814C0F45C","b":"5568135CE000","o":"164145C","s":"_ZN5mongo12_GLOBAL__N_114RunCommandImpl3runEv","s+":"13C"},{"a":"556814C10E81","b":"5568135CE000","o":"1642E81","s":"_ZN5mongo12_GLOBAL__N_119ExecCommandDatabase12_commandExecEv","s+":"331"},{"a":"556814C195AD","b":"5568135CE000","o":"164B5AD","s":"_ZZN5mongo12_GLOBAL__N_114executeCommandESt10shared_ptrINS0_13HandleRequest16ExecutionContextEEENUlvE0_clEv","s+":"45D"},{"a":"556814C1B7A0","b":"5568135CE000","o":"164D7A0","s":"_ZZN5mongo15unique_functionIFvPNS_14future_details15SharedStateBaseEEE8makeImplIZNS1_10FutureImplINS1_8FakeVoidEE16makeContinuationIvZZNOS9_4thenIZNS_12_GLOBAL__N_114executeCommandESt10shared_ptrINSC_13HandleRequest16ExecutionContextEEEUlvE0_EEDaOT_ENKUlvE1_clEvEUlPNS1_15SharedStateImplIS8_EESN_E_EENS7_ISI_EEOT0_EUlS3_E_EEDaSJ_EN12SpecificImpl4callEOS3_","s+":"80"},{"a":"556814BDC547","b":"5568135CE000","o":"160E547","s":"_ZN5mongo14future_details15SharedStateBase20transitionToFinishedEv","s+":"147"},{"a":"556814C1EFA2","b":"5568135CE000","o":"1650FA2","s":"_ZN5mongo14future_details10FutureImplINS0_8FakeVoidEE11generalImplIZNOS3_17propagateResultToEPNS0_15SharedStateImplIS2_EEEUlOS2_E_ZNOS3_17propagateResultToES7_EUlONS_6StatusEE0_ZNOS3_17propagateResultToES7_EUlvE1_EEDaOT_OT0_OT1_","s+":"152"},{"a":"556814C1372C","b":"5568135CE000","o":"164572C","s":"_ZZN5mongo15unique_functionIFvPNS_14future_details15SharedStateBaseEEE8makeImplIZNS1_10FutureImplINS1_8FakeVoidEE16makeContinuationIvZZNOS9_4thenIZNS_12_GLOBAL__N_114executeCommandESt10shared_ptrINSC_13HandleRequest16ExecutionContextEEEUlvE_EEDaOT_ENKUlvE1_clEvEUlPNS1_15SharedStateImplIS8_EESN_E_EENS7_ISI_EEOT0_EUlS3_E_EEDaSJ_EN12SpecificImpl4callEOS3_","s+":"AC"},{"a":"556814BDC547","b":"5568135CE000","o":"160E547","s":"_ZN5mongo14future_details15SharedStateBase20transitionToFinishedEv","s+":"147"},{"a":"556814C1A06A","b":"5568135CE000","o":"164C06A","s":"_ZN5mongo12_GLOBAL__N_114executeCommandESt10shared_ptrINS0_13HandleRequest16ExecutionContextEE","s+":"20A"},{"a":"556814C1ABD5","b":"5568135CE000","o":"164CBD5","s":"_ZN5mongo12_GLOBAL__N_116receivedCommandsESt10shared_ptrINS0_13HandleRequest16ExecutionContextEE","s+":"2F5"},{"a":"556814C1B6E6","b":"5568135CE000","o":"164D6E6","s":"_ZN5mongo12_GLOBAL__N_115CommandOpRunner3runEv","s+":"36"},{"a":"556814C15CAF","b":"5568135CE000","o":"1647CAF","s":"_ZN5mongo23ServiceEntryPointCommon13handleRequestEPNS_16OperationContextERKNS_7MessageESt10unique_ptrIKNS0_5HooksESt14default_deleteIS8_EE","s+":"39F"},{"a":"556814C0779F","b":"5568135CE000","o":"163979F","s":"_ZN5mongo23ServiceEntryPointMongod13handleRequestEPNS_16OperationContextERKNS_7MessageE","s+":"4F"},{"a":"556814C58167","b":"5568135CE000","o":"168A167","s":"_ZN5mongo9transport19ServiceStateMachine4Impl14processMessageEv","s+":"127"},{"a":"556814C5A2C6","b":"5568135CE000","o":"168C2C6","s":"_ZN5mongo9transport19ServiceStateMachine4Impl12startNewLoopERKNS_6StatusE","s+":"76"},{"a":"556814C5A70F","b":"5568135CE000","o":"168C70F","s":"_ZZN5mongo15unique_functionIFvNS_6StatusEEE8makeImplIZNS_9transport19ServiceStateMachine4Impl15scheduleNewLoopES1_EUlS1_E_EEDaOT_EN12SpecificImpl4callEOS1_","s+":"7F"},{"a":"556816E024DD","b":"5568135CE000","o":"38344DD","s":"_ZZN5mongo15unique_functionIFvNS_6StatusEEE8makeImplIZNS_9transport26ServiceExecutorSynchronous18runOnDataAvailableERKSt10shared_ptrINS5_7SessionEES3_EUlS1_E_EEDaOT_EN12SpecificImpl4callEOS1_","s+":"3D"},{"a":"5568151C2C73","b":"5568135CE000","o":"1BF4C73","s":"_ZZN5mongo15unique_functionIFvvEE8makeImplIZNS_9transport15ServiceExecutor8scheduleENS0_IFvNS_6StatusEEEEEUlvE_EEDaOT_EN12SpecificImpl4callEv","s+":"33"},{"a":"556816E025FB","b":"5568135CE000","o":"38345FB","s":"_ZZN5mongo15unique_functionIFvvEE8makeImplIZNS_9transport26ServiceExecutorSynchronous12scheduleTaskES2_NS4_15ServiceExecutor13ScheduleFlagsEEUlvE0_EEDaOT_EN12SpecificImpl4callEv","s+":"BB"},{"a":"556816E0643C","b":"5568135CE000","o":"383843C","s":"_ZZN5mongo15unique_functionIFvvEE8makeImplIZNS_25launchServiceWorkerThreadES2_EUlvE2_EEDaOT_EN12SpecificImpl4callEv","s+":"5C"},{"a":"556816E064AC","b":"5568135CE000","o":"38384AC","s":"_ZN5mongo12_GLOBAL__N_17runFuncEPv","s+":"1C"},{"a":"7F779FB49EA5","b":"7F779FB42000","o":"7EA5","s":"start_thread","s+":"C5"},{"a":"7F779F8729FD","b":"7F779F774000","o":"FE9FD","s":"clone","s+":"6D"}],"processInfo":{"mongodbVersion":"5.1.0","gitVersion":"2454bbc8c20c8d6d6d9590a66e939d568b9e7222","compiledModules":["enterprise"],"uname":{"sysname":"Linux","release":"3.10.0-1160.41.1.el7.x86_64","version":"#1 SMP Tue Aug 31 14:52:47 UTC 2021","machine":"x86_64"},"somap":[{"b":"5568135CE000","elfType":3,"buildId":"5B4CDDE1CA638903B7FC921C1A1871E1E2A96FC9"},{"b":"7F779FB42000","path":"/lib64/libpthread.so.0","elfType":3,"buildId":"E10CC8F2B932FC3DAEDA22F8DAC5EBB969524E5B"},{"b":"7F779F774000","path":"/lib64/libc.so.6","elfType":3,"buildId":"A317B42B15368ADCAE21C11107691A03EC91059D"}]}}}}
{"t":{"$date":"2021-11-16T17:50:12.898+00:00"},"s":"I",  "c":"CONTROL",  "id":31445,   "ctx":"conn325","msg":"Frame","attr":{"frame":{"a":"5568175FFA2A","b":"5568135CE000","o":"4031A2A","s":"_ZN5mongo18stack_trace_detail12_GLOBAL__N_119printStackTraceImplERKNS1_7OptionsEPNS_14StackTraceSinkE.constprop.358","s+":"20A"}}}

This test seems to be failing while trying to disable encryption at rest. While consulting with dennis.kuczynski he indicated that this would require server and agent jira tickets to investigate closer.

 



 Comments   
Comment by Spencer Jackson [ 21/Jan/22 ]

I'm going to resolve this as Gone Away. If it comes back, let's capture the logs and we can analyse this more carefully.

Comment by Eric Tsai [ 21/Jan/22 ]

Sorry, forgot to respond to this. I don't have the logs for this run unfortunately.

Comment by Dennis Kuczynski [ 05/Jan/22 ]

spencer.jackson The test was running this set of steps: https://github.com/10gen/mms/blob/dab5673881d9b82690b1b1bcb01f8b453189ee71/server/src/features/resources/nds/encryptionatrest.feature#L39

Each scenario only runs with a single version of MongoDB, so there shouldn't be data files from an earlier version of MongoDB. Although this test performed several agent storage engine changes, where it deletes the DB files and syncs from another node, before it eventually failed when trying to perform a storage engine change where it disables encryption.

Unfortunately it looks like the test log artifacts for these initial runs have expired, and it looks like most recent runs have been successful. eric.tsai do you happen to still have agent/MongoDB logs for this failure?

Comment by Eric Tsai [ 30/Dec/21 ]

For additional context:

We noticed this was failing when preparing for an MMS release. It failed 2x in a row before passing again as expected.
https://evergreen.mongodb.com/task/mms_v20211117_e2e_enterprise_E2E_NDS_ENCRYPTION_AT_REST_CD_Azure_eaf8fdc4fc772aa80b94420bba502df422608ec7_21_11_15_22_30_55/0

In both failures it was failing at the same step when trying to disable encryption at rest on our side. If you look at the cukes steps it's creating a CD (rapid release) cluster which at the time I believe it was just 5.1.x (unsure of the actual version).

As for the data files question I'll defer to dennis.kuczynski

Generated at Thu Feb 08 05:52:43 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.