[SERVER-71664] Creating columnstore index crashes the server when the document has duplicated fields Created: 29/Nov/22  Updated: 29/Oct/23  Resolved: 22/Dec/22

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 6.3.0-rc0

Type: Bug Priority: Major - P3
Reporter: Irina Yatsenko (Inactive) Assignee: Dianna Hohensee (Inactive)
Resolution: Fixed Votes: 0
Labels: pm2646-m4
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Assigned Teams:
Query Execution
Backwards Compatibility: Fully Compatible
Operating System: ALL
Steps To Reproduce:

Insert into a collection a document with duplicated fields like this:

{_id: 0, bad: {c: {obj: 42}, c: [42]}}

(note: I don't know how to do it from the mongo shell but it looks like mongocxx driver can do that, at least, this is what I'm observing when using Genny that uses mongocxx).

coll.createIndex({"$**": "columnstore"}) -> triggers an invariant in ColumnShredder::appendToArrayInfo (and, therefore, crashes the server).

 

invariant(oldIt < oldPosition.end());
oldPosition = {_data = 0x5651220464b0 "{", _size = 1}
newPosition = {_data = 0x7f44d33e5df0 "{[", _size = 2}
std::mismatch returns newIt set to "[" and oldIt set to oldPosition.end()

 

Sprint: QE 2022-12-12, QE 2022-12-26
Participants:

 Description   

 

 



 Comments   
Comment by Dianna Hohensee (Inactive) [ 22/Dec/22 ]

Here's what the commit msg was supposed to be: 

SERVER-71664 Column key generation, return found duplicate field when the arrayInfo of the latest
leaf node std::mismatch with the previous node's arrayInfo is at the beginning of an array.
 
Column key generation creates column values by building up arrayInfo information per leaf, and then
combining information per path. Two leaves with duplicate field names where one leaf is an object
and the other is an array must be caught.

Comment by Githook User [ 22/Dec/22 ]

Author:

{'name': 'Dianna Hohensee', 'email': 'dianna.hohensee@mongodb.com', 'username': 'DiannaHohensee'}

Message: SERVER-71664 Column key generation, return found duplicate field when the arrayInfo of the latest ...
Branch: master
https://github.com/mongodb/mongo/commit/1e666d5a6728c800873165d14fd49aa2ea421ce2

Comment by Dianna Hohensee (Inactive) [ 08/Dec/22 ]

The last thing worth doing is probably to explore the code directly, and see if it makes logical sense that there's an issue we want to correct. I'll do that next.

Comment by Dianna Hohensee (Inactive) [ 08/Dec/22 ]

Just tweaked my patch to run the exact example, without invariant. So definitely need another ingredient.

rs0:PRIMARY> db.mm.drop()
true
rs0:PRIMARY> db.mm.insert({ "_id" : 0})
WriteResult({ "nInserted" : 1 })
rs0:PRIMARY> db.mm.find()
{ "_id" : 0, "bad" : { "c" : { "obj" : 42 } }, "c" : [ 42 ] }
rs0:PRIMARY> db.mm.createIndex({"$**": "columnstore"})
{
	"numIndexesBefore" : 1,
	"numIndexesAfter" : 2,
	"createdCollectionAutomatically" : false,
	"commitQuorum" : "votingMembers",
	"ok" : 1,
	"$clusterTime" : {
		"clusterTime" : Timestamp(1670521968, 3),
		"signature" : {
			"hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
			"keyId" : NumberLong(0)
		}
	},
	"operationTime" : Timestamp(1670521968, 3)
}

Comment by Dianna Hohensee (Inactive) [ 08/Dec/22 ]

I've been trying to reproduce the invariant. However, a document with the same shape doesn't appear to trigger it, in my repro attempt. I opted to hack the insert path in order to get duplicate fields inserted; as well as adding logging of the document in the find command code so that I can see the duplicate fields, which may otherwise be eliminated in the shell results. This is my patch.

So I inserted a random document

rs0:PRIMARY> db.mm.insert({ "_id" : 25, "item" : "ABC3",  dimensions: [ [ "koed", 25 ] ] })
WriteResult({ "nInserted" : 1 })

And logs in the insert path show what I added via CPP code.

{"t":{"$date":"2022-12-08T17:29:35.382+00:00"},"s":"I",  "c":"-",        "id":0,       "ctx":"conn3","msg":"~~~fixDocumentForInsert start","attr":{"doc":"{ _id: 25.0, item: \"ABC3\", dimensions: [ [ \"koed\", 25.0 ] ] }"}}
{"t":{"$date":"2022-12-08T17:29:35.382+00:00"},"s":"I",  "c":"-",        "id":0,       "ctx":"conn3","msg":"~~~fixDocumentForInsert start","attr":{"newDoc":"{ _id: 25.0, item: \"ABC3\", dimensions: [ [ \"koed\", 25.0 ] ], dimensions: \"out-of-this-world\", bad: { c: { obj: 42 } }, c: [ 42 ] }"}}
{"t":{"$date":"2022-12-08T17:29:35.383+00:00"},"s":"I",  "c":"STORAGE",  "id":20320,   "ctx":"conn3","msg":"createCollection","attr":{"namespace":"test.mm","uuidDisposition":"generated","uuid":{"uuid":{"$uuid":"e627c770-eb34-4532-994b-1c28698d80db"}},"options":{}}}
{"t":{"$date":"2022-12-08T17:29:35.395+00:00"},"s":"I",  "c":"INDEX",    "id":20345,   "ctx":"conn3","msg":"Index build: done building","attr":{"buildUUID":null,"collectionUUID":{"uuid":{"$uuid":"e627c770-eb34-4532-994b-1c28698d80db"}},"namespace":"test.mm","index":"_id_","ident":"index-1-3867456837185354573","collectionIdent":"collection-0-3867456837185354573","commitTimestamp":{"$timestamp":{"t":1670520575,"i":1}}}}

Double checked with a find command

rs0:PRIMARY> db.mm.find()
{ "_id" : 25, "item" : "ABC3", "dimensions" : [ [ "koed", 25 ] ], "bad" : { "c" : { "obj" : 42 } }, "c" : [ 42 ] }

{"t":{"$date":"2022-12-08T17:29:41.232+00:00"},"s":"I",  "c":"QUERY",    "id":0,       "ctx":"conn3","msg":"~~~Find cmd: result->getBodyBuilder().asTempObj()","attr":{"result":{"cursor":{"firstBatch":[{"_id":25,"item":"ABC3","dimensions":[["koed",25]],"dimensions":"out-of-this-world","bad":{"c":{"obj":42}},"c":[42]}],"id":0,"ns":"test.mm"}}}}

Then I created the column store index, successfully without invariant.

rs0:PRIMARY> db.mm.createIndex({"$**": "columnstore"})
{
	"numIndexesBefore" : 1,
	"numIndexesAfter" : 2,
	"createdCollectionAutomatically" : false,
	"commitQuorum" : "votingMembers",
	"ok" : 1,
	"$clusterTime" : {
		"clusterTime" : Timestamp(1670520603, 5),
		"signature" : {
			"hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
			"keyId" : NumberLong(0)
		}
	},
	"operationTime" : Timestamp(1670520603, 5)
}

{"t":{"$date":"2022-12-08T17:30:03.192+00:00"},"s":"I",  "c":"INDEX",    "id":20438,   "ctx":"conn3","msg":"Index build: registering","attr":{"buildUUID":{"uuid":{"$uuid":"e8cecae6-55ef-4b61-b084-1e0356391483"}},"namespace":"test.mm","collectionUUID":{"uuid":{"$uuid":"e627c770-eb34-4532-994b-1c28698d80db"}},"indexes":1,"firstIndex":{"name":"$**_columnstore"},"command":{"createIndexes":"mm","v":2,"indexes":[{"key":{"$**":"columnstore"},"name":"$**_columnstore"}],"ignoreUnknownIndexOptions":false}}}
{"t":{"$date":"2022-12-08T17:30:03.208+00:00"},"s":"I",  "c":"INDEX",    "id":20384,   "ctx":"IndexBuildsCoordinatorMongod-0","msg":"Index build: starting","attr":{"buildUUID":{"uuid":{"$uuid":"e8cecae6-55ef-4b61-b084-1e0356391483"}},"collectionUUID":{"uuid":{"$uuid":"e627c770-eb34-4532-994b-1c28698d80db"}},"namespace":"test.mm","properties":{"v":2,"key":{"$**":"columnstore"},"name":"$**_columnstore"},"specIndex":0,"numSpecs":1,"method":"Hybrid","ident":"index-2-3867456837185354573","collectionIdent":"collection-0-3867456837185354573","maxTemporaryMemoryUsageMB":200}}
{"t":{"$date":"2022-12-08T17:30:03.208+00:00"},"s":"I",  "c":"INDEX",    "id":20346,   "ctx":"IndexBuildsCoordinatorMongod-0","msg":"Index build: initialized","attr":{"buildUUID":{"uuid":{"$uuid":"e8cecae6-55ef-4b61-b084-1e0356391483"}},"collectionUUID":{"uuid":{"$uuid":"e627c770-eb34-4532-994b-1c28698d80db"}},"namespace":"test.mm","initializationTimestamp":{"$timestamp":{"t":1670520603,"i":2}}}}
{"t":{"$date":"2022-12-08T17:30:03.209+00:00"},"s":"I",  "c":"INDEX",    "id":20440,   "ctx":"conn3","msg":"Index build: waiting for index build to complete","attr":{"buildUUID":{"uuid":{"$uuid":"e8cecae6-55ef-4b61-b084-1e0356391483"}},"deadline":{"$date":{"$numberLong":"9223372036854775807"}}}}
{"t":{"$date":"2022-12-08T17:30:03.209+00:00"},"s":"I",  "c":"STORAGE",  "id":4847600, "ctx":"IndexBuildsCoordinatorMongod-0","msg":"Index build: waiting for last optime before interceptors to be majority committed","attr":{"buildUUID":{"uuid":{"$uuid":"e8cecae6-55ef-4b61-b084-1e0356391483"}},"collectionUUID":{"uuid":{"$uuid":"e627c770-eb34-4532-994b-1c28698d80db"}},"deadline":{"$date":"2022-12-08T17:30:13.208Z"},"timeoutMillis":10000,"lastOpTime":{"ts":{"$timestamp":{"t":1670520603,"i":2}},"t":21}}}
{"t":{"$date":"2022-12-08T17:30:03.306+00:00"},"s":"I",  "c":"INDEX",    "id":20391,   "ctx":"IndexBuildsCoordinatorMongod-0","msg":"Index build: collection scan done","attr":{"buildUUID":{"uuid":{"$uuid":"e8cecae6-55ef-4b61-b084-1e0356391483"}},"collectionUUID":{"uuid":{"$uuid":"e627c770-eb34-4532-994b-1c28698d80db"}},"namespace":"test.mm","totalRecords":1,"readSource":"kMajorityCommitted","durationMillis":0}}
{"t":{"$date":"2022-12-08T17:30:03.307+00:00"},"s":"I",  "c":"COMMAND",  "id":20685,   "ctx":"IndexBuildsCoordinatorMongod-0","msg":"Index build: inserted keys from external sorter into index","attr":{"namespace":"test.mm","index":"$**_columnstore","keysInserted":8,"durationMillis":0}}
{"t":{"$date":"2022-12-08T17:30:03.313+00:00"},"s":"I",  "c":"CONNPOOL", "id":22576,   "ctx":"ReplNetwork","msg":"Connecting","attr":{"hostAndPort":"localhost:50000"}}
{"t":{"$date":"2022-12-08T17:30:03.314+00:00"},"s":"I",  "c":"NETWORK",  "id":22943,   "ctx":"listener","msg":"Connection accepted","attr":{"remote":"127.0.0.1:34834","uuid":{"uuid":{"$uuid":"bdda1cdc-4634-4518-acd2-a9f4e7f53a2a"}},"connectionId":5,"connectionCount":2}}
{"t":{"$date":"2022-12-08T17:30:03.315+00:00"},"s":"I",  "c":"NETWORK",  "id":51800,   "ctx":"conn5","msg":"client metadata","attr":{"remote":"127.0.0.1:34834","client":"conn5","doc":{"driver":{"name":"NetworkInterfaceTL-ReplNetwork","version":"6.2.0"},"os":{"type":"Linux","name":"Ubuntu","architecture":"x86_64","version":"18.04"}}}}
{"t":{"$date":"2022-12-08T17:30:03.318+00:00"},"s":"I",  "c":"STORAGE",  "id":3856201, "ctx":"conn5","msg":"Index build: commit quorum satisfied","attr":{"indexBuildEntry":{"_id":{"$uuid":"e8cecae6-55ef-4b61-b084-1e0356391483"},"collectionUUID":{"$uuid":"e627c770-eb34-4532-994b-1c28698d80db"},"commitQuorum":"votingMembers","indexNames":["$**_columnstore"],"commitReadyMembers":["localhost:50000"]}}}
{"t":{"$date":"2022-12-08T17:30:03.319+00:00"},"s":"I",  "c":"CONNPOOL", "id":22566,   "ctx":"ReplNetwork","msg":"Ending connection due to bad connection status","attr":{"hostAndPort":"localhost:50000","error":"CallbackCanceled: onInvoke :: caused by :: Callback was canceled","numOpenConns":0}}
{"t":{"$date":"2022-12-08T17:30:03.319+00:00"},"s":"I",  "c":"STORAGE",  "id":3856203, "ctx":"IndexBuildsCoordinatorMongod-0","msg":"Index build: waiting for next action before completing final phase","attr":{"buildUUID":{"uuid":{"$uuid":"e8cecae6-55ef-4b61-b084-1e0356391483"}}}}
{"t":{"$date":"2022-12-08T17:30:03.319+00:00"},"s":"I",  "c":"STORAGE",  "id":3856204, "ctx":"IndexBuildsCoordinatorMongod-0","msg":"Index build: received signal","attr":{"buildUUID":{"uuid":{"$uuid":"e8cecae6-55ef-4b61-b084-1e0356391483"}},"action":"Commit quorum Satisfied"}}
{"t":{"$date":"2022-12-08T17:30:03.320+00:00"},"s":"I",  "c":"NETWORK",  "id":22943,   "ctx":"listener","msg":"Connection accepted","attr":{"remote":"127.0.0.1:34840","uuid":{"uuid":{"$uuid":"546bfe2d-f6fd-4aa7-9149-a98b726b1a23"}},"connectionId":7,"connectionCount":3}}
{"t":{"$date":"2022-12-08T17:30:03.320+00:00"},"s":"I",  "c":"INDEX",    "id":20345,   "ctx":"IndexBuildsCoordinatorMongod-0","msg":"Index build: done building","attr":{"buildUUID":{"uuid":{"$uuid":"e8cecae6-55ef-4b61-b084-1e0356391483"}},"collectionUUID":{"uuid":{"$uuid":"e627c770-eb34-4532-994b-1c28698d80db"}},"namespace":"test.mm","index":"$**_columnstore","ident":"index-2-3867456837185354573","collectionIdent":"collection-0-3867456837185354573","commitTimestamp":{"$timestamp":{"t":1670520603,"i":4}}}}
{"t":{"$date":"2022-12-08T17:30:03.321+00:00"},"s":"I",  "c":"NETWORK",  "id":51800,   "ctx":"conn7","msg":"client metadata","attr":{"remote":"127.0.0.1:34840","client":"conn7","doc":{"driver":{"name":"NetworkInterfaceTL-ReplNetwork","version":"6.2.0"},"os":{"type":"Linux","name":"Ubuntu","architecture":"x86_64","version":"18.04"}}}}
{"t":{"$date":"2022-12-08T17:30:03.321+00:00"},"s":"I",  "c":"STORAGE",  "id":20663,   "ctx":"IndexBuildsCoordinatorMongod-0","msg":"Index build: completed successfully","attr":{"buildUUID":{"uuid":{"$uuid":"e8cecae6-55ef-4b61-b084-1e0356391483"}},"collectionUUID":{"uuid":{"$uuid":"e627c770-eb34-4532-994b-1c28698d80db"}},"namespace":"test.mm","indexesBuilt":["$**_columnstore"],"numIndexesBefore":1,"numIndexesAfter":2}}
{"t":{"$date":"2022-12-08T17:30:03.321+00:00"},"s":"I",  "c":"INDEX",    "id":20447,   "ctx":"conn3","msg":"Index build: completed","attr":{"buildUUID":{"uuid":{"$uuid":"e8cecae6-55ef-4b61-b084-1e0356391483"}}}}
{"t":{"$date":"2022-12-08T17:30:03.323+00:00"},"s":"I",  "c":"NETWORK",  "id":22944,   "ctx":"conn5","msg":"Connection ended","attr":{"remote":"127.0.0.1:34834","uuid":{"uuid":{"$uuid":"bdda1cdc-4634-4518-acd2-a9f4e7f53a2a"}},"connectionId":5,"connectionCount":2}}
{"t":{"$date":"2022-12-08T17:30:03.325+00:00"},"s":"I",  "c":"COMMAND",  "id":51803,   "ctx":"conn3","msg":"Slow query","attr":{"type":"command","ns":"test.mm","appName":"MongoDB Shell","command":{"createIndexes":"mm","indexes":[{"key":{"$**":"columnstore"},"name":"$**_columnstore"}],"lsid":{"id":{"$uuid":"0d2f3210-e9b4-4c79-98b3-4cf7dfa5fbcd"}},"$clusterTime":{"clusterTime":{"$timestamp":{"t":1670520575,"i":2}},"signature":{"hash":{"$binary":{"base64":"AAAAAAAAAAAAAAAAAAAAAAAAAAA=","subType":"0"}},"keyId":0}},"$db":"test"},"numYields":0,"reslen":271,"locks":{"ParallelBatchWriterMode":{"acquireCount":{"r":5}},"FeatureCompatibilityVersion":{"acquireCount":{"r":4,"w":4}},"ReplicationStateTransition":{"acquireCount":{"w":8}},"Global":{"acquireCount":{"r":4,"w":4}},"Database":{"acquireCount":{"r":2,"w":3}},"Collection":{"acquireCount":{"r":2,"w":2,"W":1}},"Mutex":{"acquireCount":{"r":8}}},"flowControl":{"acquireCount":3,"timeAcquiringMicros":16},"readConcern":{"level":"local","provenance":"implicitDefault"},"writeConcern":{"w":"majority","wtimeout":0,"provenance":"implicitDefault"},"storage":{},"remote":"127.0.0.1:49200","protocol":"op_msg","durationMillis":133}}
{"t":{"$date":"2022-12-08T17:30:03.325+00:00"},"s":"I",  "c":"EXECUTOR", "id":6983000, "ctx":"conn3","msg":"Slow SessionWorkflow loop","attr":{"elapsed":{"totalMillis":22078,"activeMillis":134,"receiveWorkMillis":21943,"processWorkMillis":134,"sendResponseMillis":0,"finalizeMillis":0}}}
{"t":{"$date":"2022-12-08T17:30:04.250+00:00"},"s":"I",  "c":"STORAGE",  "id":22260,   "ctx":"TimestampMonitor","msg":"Removing drop-pending idents with drop timestamps before timestamp","attr":{"timestamp":{"$timestamp":{"t":1670520303,"i":5}}}}
{"t":{"$date":"2022-12-08T17:30:04.250+00:00"},"s":"I",  "c":"STORAGE",  "id":22237,   "ctx":"TimestampMonitor","msg":"Completing drop for ident","attr":{"ident":"internal-3-3867456837185354573","dropTimestamp":{"$timestamp":{"t":0,"i":0}}}}
{"t":{"$date":"2022-12-08T17:30:04.251+00:00"},"s":"I",  "c":"STORAGE",  "id":6776600, "ctx":"TimestampMonitor","msg":"The ident was successfully dropped","attr":{"ident":"internal-3-3867456837185354573","dropTimestamp":{"$timestamp":{"t":0,"i":0}}}}
{"t":{"$date":"2022-12-08T17:30:36.002+00:00"},"s":"W",  "c":"COMMAND",  "id":7024600, "ctx":"ftdc","msg":"The collStats command is deprecated. For more information, see https://dochub.mongodb.org/core/collStats-deprecated"}

Then just sanity checked with a find command again.

rs0:PRIMARY> db.mm.find()
{ "_id" : 25, "item" : "ABC3", "dimensions" : [ [ "koed", 25 ] ], "bad" : { "c" : { "obj" : 42 } }, "c" : [ 42 ] }

{"t":{"$date":"2022-12-08T17:32:34.681+00:00"},"s":"I",  "c":"QUERY",    "id":0,       "ctx":"conn3","msg":"~~~Find cmd: result->getBodyBuilder().asTempObj()","attr":{"result":{"cursor":{"firstBatch":[{"_id":25,"item":"ABC3","dimensions":[["koed",25]],"dimensions":"out-of-this-world","bad":{"c":{"obj":42}},"c":[42]}],"id":0,"ns":"test.mm"}}}}

So it looks like there might be something more at play to hit that invariant. I'm not sure what it is, though.

Generated at Thu Feb 08 06:19:39 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.