[SERVER-54921] Crash during concurrent ordered bulk insert into timeseries collection Created: 03/Mar/21  Updated: 29/Oct/23  Resolved: 04/Mar/21

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 4.9.0

Type: Bug Priority: Major - P3
Reporter: Louis Williams Assignee: Louis Williams
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-54792 Improve test coverage for insertMany ... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: Execution Team 2021-03-08
Participants:

 Description   

I wrote an FSM workload to do bulk inserts on a time-series collection and hit a segfault:

[ReplicaSetFixture:job0:primary] 2021-03-03T17:11:50.770+0000 | 2021-03-03T17:11:50.764+00:00 F  CONTROL  4757800 [conn2601] "Writing fatal message","attr":{"message":"Invalid access at address: 0"}
[ReplicaSetFixture:job0:primary] 2021-03-03T17:11:51.932+0000 | 2021-03-03T17:11:51.932+00:00 I  CONTROL  31445   [conn2601] "Frame","attr":{"frame":{"a":"7F21924284C2","b":"7F2192239000","o":"1EF4C2","s":"_ZN5mongo18stack_trace_detail12_GLOBAL__N_119printStackTraceImplERKNS1_7OptionsEPNS_14StackTraceSinkE.constprop.630","s+":"212"}}
[ReplicaSetFixture:job0:primary] 2021-03-03T17:11:51.932+0000 | 2021-03-03T17:11:51.932+00:00 I  CONTROL  31445   [conn2601] "Frame","attr":{"frame":{"a":"7F2192429ED9","b":"7F2192239000","o":"1F0ED9","s":"_ZN5mongo15printStackTraceEv","s+":"29"}}
[ReplicaSetFixture:job0:primary] 2021-03-03T17:11:51.932+0000 | 2021-03-03T17:11:51.932+00:00 I  CONTROL  31445   [conn2601] "Frame","attr":{"frame":{"a":"7F21924255F3","b":"7F2192239000","o":"1EC5F3","s":"abruptQuitWithAddrSignal","s+":"F3"}}
[ReplicaSetFixture:job0:primary] 2021-03-03T17:11:51.932+0000 | 2021-03-03T17:11:51.932+00:00 I  CONTROL  31445   [conn2601] "Frame","attr":{"frame":{"a":"7F2186ABD6B6","b":"7F218603C000","o":"A816B6","s":"_ZL16WasmFaultHandleriP9siginfo_tPv","s+":"B6"}}
[ReplicaSetFixture:job0:primary] 2021-03-03T17:11:51.933+0000 | 2021-03-03T17:11:51.932+00:00 I  CONTROL  31445   [conn2601] "Frame","attr":{"frame":{"a":"7F219084AD80","b":"7F2190838000","o":"12D80","s":"funlockfile","s+":"50"}}
[ReplicaSetFixture:job0:primary] 2021-03-03T17:11:51.933+0000 | 2021-03-03T17:11:51.932+00:00 I  CONTROL  31445   [conn2601] "Frame","attr":{"frame":{"a":"7F21923C36A3","b":"7F2192239000","o":"18A6A3","s":"_ZN5mongo15BSONObjIteratorC2ERKNS_7BSONObjE","s+":"3"}}
[ReplicaSetFixture:job0:primary] 2021-03-03T17:11:51.933+0000 | 2021-03-03T17:11:51.932+00:00 I  CONTROL  31445   [conn2601] "Frame","attr":{"frame":{"a":"7F21923BF8E6","b":"7F2192239000","o":"1868E6","s":"_ZNK5mongo7BSONObj8getFieldENS_10StringDataE","s+":"36"}}
[ReplicaSetFixture:job0:primary] 2021-03-03T17:11:51.933+0000 | 2021-03-03T17:11:51.932+00:00 I  CONTROL  31445   [conn2601] "Frame","attr":{"frame":{"a":"7F2189A9268A","b":"7F2189A77000","o":"1B68A","s":"_ZN5mongo13BucketCatalog6insertEPNS_16OperationContextERKNS_15NamespaceStringERKNS_7BSONObjE","s+":"99A"}}
[ReplicaSetFixture:job0:primary] 2021-03-03T17:11:51.933+0000 | 2021-03-03T17:11:51.932+00:00 I  CONTROL  31445   [conn2601] "Frame","attr":{"frame":{"a":"7F2187422C1C","b":"7F2187316000","o":"10CC1C","s":"_ZZNK5mongo12_GLOBAL__N_19CmdInsert10Invocation33_performUnorderedTimeseriesWritesEPNS_16OperationContextEmmPSt6vectorINS_7BSONObjESaIS6_EEPN5boost8optionalINS_4repl6OpTimeEEEPNSB_INS_3OIDEEERKNSB_IS5_ImSaImEEEEENKUlmE_clEm","s+":"9C"}}
[ReplicaSetFixture:job0:primary] 2021-03-03T17:11:51.933+0000 | 2021-03-03T17:11:51.932+00:00 I  CONTROL  31445   [conn2601] "Frame","attr":{"frame":{"a":"7F218742382C","b":"7F2187316000","o":"10D82C","s":"_ZNK5mongo12_GLOBAL__N_19CmdInsert10Invocation33_performUnorderedTimeseriesWritesEPNS_16OperationContextEmmPSt6vectorINS_7BSONObjESaIS6_EEPN5boost8optionalINS_4repl6OpTimeEEEPNSB_INS_3OIDEEERKNSB_IS5_ImSaImEEEE","s+":"16C"}}
[ReplicaSetFixture:job0:primary] 2021-03-03T17:11:51.933+0000 | 2021-03-03T17:11:51.932+00:00 I  CONTROL  31445   [conn2601] "Frame","attr":{"frame":{"a":"7F2187426608","b":"7F2187316000","o":"110608","s":"_ZNK5mongo12_GLOBAL__N_19CmdInsert10Invocation30_performTimeseriesWritesSubsetEPNS_16OperationContextEmmPSt6vectorINS_7BSONObjESaIS6_EEPN5boost8optionalINS_4repl6OpTimeEEEPNSB_INS_3OIDEEE","s+":"188"}}
[ReplicaSetFixture:job0:primary] 2021-03-03T17:11:51.933+0000 | 2021-03-03T17:11:51.932+00:00 I  CONTROL  31445   [conn2601] "Frame","attr":{"frame":{"a":"7F2187426E10","b":"7F2187316000","o":"110E10","s":"_ZN5mongo12_GLOBAL__N_19CmdInsert10Invocation8typedRunEPNS_16OperationContextE","s+":"5D0"}}
[ReplicaSetFixture:job0:primary] 2021-03-03T17:11:51.933+0000 | 2021-03-03T17:11:51.932+00:00 I  CONTROL  31445   [conn2601] "Frame","attr":{"frame":{"a":"7F21874275AB","b":"7F2187316000","o":"1115AB","s":"_ZN5mongo12TypedCommandINS_12_GLOBAL__N_19CmdInsertEE14InvocationBase3runEPNS_16OperationContextEPNS_3rpc21ReplyBuilderInterfaceE","s+":"3B"}}
[ReplicaSetFixture:job0:primary] 2021-03-03T17:11:51.933+0000 | 2021-03-03T17:11:51.932+00:00 I  CONTROL  31445   [conn2601] "Frame","attr":{"frame":{"a":"7F218B9D5D7F","b":"7F218B999000","o":"3CD7F","s":"_ZN5mongo14CommandHelpers20runCommandInvocationEPNS_16OperationContextERKNS_12OpMsgRequestEPNS_17CommandInvocationEPNS_3rpc21ReplyBuilderInterfaceE","s+":"7F"}}
[ReplicaSetFixture:job0:primary] 2021-03-03T17:11:51.933+0000 | 2021-03-03T17:11:51.932+00:00 I  CONTROL  31445   [conn2601] "Frame","attr":{"frame":{"a":"7F218B9DB1EE","b":"7F218B999000","o":"421EE","s":"_ZN5mongo14CommandHelpers20runCommandInvocationESt10shared_ptrINS_23RequestExecutionContextEES1_INS_17CommandInvocationEENS_9transport15ServiceExecutor14ThreadingModelE","s+":"1BE"}}
[ReplicaSetFixture:job0:primary] 2021-03-03T17:11:51.933+0000 | 2021-03-03T17:11:51.932+00:00 I  CONTROL  31445   [conn2601] "Frame","attr":{"frame":{"a":"7F21899CA515","b":"7F218999C000","o":"2E515","s":"_ZN5mongo12_GLOBAL__N_120runCommandInvocationESt10shared_ptrINS_23RequestExecutionContextEES1_INS_17CommandInvocationEE","s+":"A5"}}
[ReplicaSetFixture:job0:primary] 2021-03-03T17:11:51.933+0000 | 2021-03-03T17:11:51.932+00:00 I  CONTROL  31445   [conn2601] "Frame","attr":{"frame":{"a":"7F21899D97B1","b":"7F218999C000","o":"3D7B1","s":"_ZN5mongo12_GLOBAL__N_114RunCommandImpl11_runCommandEv","s+":"161"}}



 Comments   
Comment by Githook User [ 04/Mar/21 ]

Author: Louis Williams <louis.williams@mongodb.com> (louiswilliams)

Message: SERVER-54921 Fix index out-of-bounds error for bulk ordered time-series inserts
Branch: master
https://github.com/mongodb/mongo/commit/68695f96302deff9157af0f822dfa0b48c0bcbc0

Comment by Louis Williams [ 03/Mar/21 ]

In the commit path, updates that need to be retried as inserts are identified by their absolute index into the set of documents to insert.

When the insert is later retried, that index is assumed to be relative to "start": https://github.com/mongodb/mongo/blob/37dc18b365d3279df10f812e0e388bf08acb8e60/src/mongo/db/commands/write_commands/write_commands.cpp#L716

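Treating an absolute index as an offset from "start" leads to an out-of-bounds lookup on the set of documents, which is consistent with the invalid access under BucketCatalog::insert in the stack trace above.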
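
The mismatch between absolute and relative indexing can be illustrated with a minimal sketch. This is not the server code and not the actual fix: the Docs alias and the functions lookupAssumingRelative and lookupAbsolute are hypothetical names invented for illustration, and the bounds checks return std::nullopt in the places where unchecked code would read past the end of the batch.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for the batch of documents passed to an insert.
using Docs = std::vector<std::string>;

// Mismatched lookup: `index` is already absolute, but it is treated as an
// offset from `start`. std::nullopt marks the cases where unchecked code
// would perform an out-of-bounds read.
std::optional<std::string> lookupAssumingRelative(const Docs& docs,
                                                  std::size_t start,
                                                  std::size_t index) {
    assert(start <= docs.size());
    const std::size_t pos = start + index;  // double-counts `start`
    if (pos >= docs.size()) {
        return std::nullopt;
    }
    return docs[pos];
}

// Consistent lookup: an absolute index is used directly.
std::optional<std::string> lookupAbsolute(const Docs& docs, std::size_t index) {
    if (index >= docs.size()) {
        return std::nullopt;
    }
    return docs[index];
}
```

With a four-document batch, a subset beginning at start = 2, and a retry recorded for absolute index 3, lookupAbsolute finds the last document, while lookupAssumingRelative computes position 5 and falls off the end. With a smaller "start" the mismatched lookup can stay in bounds and silently pick the wrong document instead.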
