[SERVER-38669] $text index defined on positional array element does not return all results Created: 17/Dec/18  Updated: 27/Dec/23

Status: Backlog
Project: Core Server
Component/s: Text Search
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Chris Harris Assignee: Backlog - Query Integration
Resolution: Unresolved Votes: 0
Labels: mql-semantics, qi-text-search, query-44-grooming
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Assigned Teams:
Query Integration
Operating System: ALL
Steps To Reproduce:

Consider the following two documents:

> db.c.find({'arr.0.str':'abc'}).pretty()
{
	"_id" : 1,
	"arr" : [
		{
			"num" : 123,
			"str" : "abc"
		},
		{
			"num" : 789,
			"str" : "xyz"
		}
	]
}
{
	"_id" : 3,
	"arr" : {
		"0" : {
			"num" : 123,
			"str" : "abc"
		},
		"1" : {
			"num" : 789,
			"str" : "xyz"
		}
	}
} 

In the first document, arr is an array with two entries. In the second, arr is an object whose fields are named "0" and "1". As the query shows, the path component "0" is ambiguous (it can be read as an array index or as a field name), so both documents match.
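
The ambiguity can be illustrated outside the server with a minimal plain-JavaScript sketch. The helper resolvePath is hypothetical and is not MongoDB's matcher; it only shows that the component "0" resolves successfully against both an array and an object with a field named "0". It does not model MongoDB's implicit array traversal.

```javascript
// Hypothetical helper for illustration only; this is not MongoDB's matcher.
// It resolves a dotted path one component at a time.
function resolvePath(value, path) {
  let current = value;
  for (const part of path.split(".")) {
    if (current === null || typeof current !== "object") {
      return undefined;
    }
    // On an array, "0" selects element 0; on an object, "0" is a field name.
    // JavaScript property lookup treats both cases the same way.
    current = current[part];
  }
  return current;
}

// The two documents from the reproduction steps.
const arrayDoc = {
  _id: 1,
  arr: [{ num: 123, str: "abc" }, { num: 789, str: "xyz" }],
};
const objectDoc = {
  _id: 3,
  arr: { "0": { num: 123, str: "abc" }, "1": { num: 789, str: "xyz" } },
};

// Collect the _id of every document whose 'arr.0.str' resolves to "abc".
const matches = [arrayDoc, objectDoc]
  .filter((doc) => resolvePath(doc, "arr.0.str") === "abc")
  .map((doc) => doc._id);

console.log(matches); // both documents resolve the path
```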

After creating a text index on 'arr.0.str', a $text search returns only the document in which arr is an embedded object (_id 3):

> db.c.createIndex({'arr.0.str':'text'})
{
	"createdCollectionAutomatically" : false,
	"numIndexesBefore" : 2,
	"numIndexesAfter" : 3,
	"ok" : 1
}
> db.c.find({$text:{$search:'abc'}}).pretty()
{
	"_id" : 3,
	"arr" : {
		"0" : {
			"num" : 123,
			"str" : "abc"
		},
		"1" : {
			"num" : 789,
			"str" : "xyz"
		}
	}
} 
>

Removing the positional component from the indexed path (indexing 'arr.str' instead) results in the array document (_id 1) being returned as expected:

> db.c.dropIndex("arr.0.str_text")
{ "nIndexesWas" : 3, "ok" : 1 }
> db.c.createIndex({'arr.str':'text'})
{
	"createdCollectionAutomatically" : false,
	"numIndexesBefore" : 2,
	"numIndexesAfter" : 3,
	"ok" : 1
}
> db.c.find({$text:{$search:'abc'}}).pretty()
...
{
	"_id" : 1,
	"arr" : [
		{
			"num" : 123,
			"str" : "abc"
		},
		{
			"num" : 789,
			"str" : "xyz"
		}
	]
}
>

Participants:

 Description   

A text index defined on 'arr.0.str' does not return the following document for the query find({$text: {$search: 'abc'}}):

{
	"_id" : 1,
	"arr" : [
		{
			"num" : 123,
			"str" : "abc"
		},
		{
			"num" : 789,
			"str" : "xyz"
		}
	]
} 



 Comments   
Comment by Jacob Evans [ 22/Dec/18 ]

When full text search moved from version 1 to version 2, the implementation of FTSSpec::scoreDocument() changed. The newer implementation uses a custom BSON iterator, FTSElementIterator. This iterator considers array paths only by their base names: https://github.com/mongodb/mongo/blob/bdac7ced24f9ad8f9afac9c57e7184b1f2bf61b2/src/mongo/db/fts/fts_element_iterator.cpp#L117 As a result, the comparison incorrectly occurs between a path base (such as "arr") and a full path (such as "arr.0.str"). In addition, the iterator appears unable to descend into an array at all in order to compare subobjects here. As christopher.harris suggested, the bad iterator behavior happens when the index is built. The result is that the index does not contain keys for the documents selected by explicit array traversal.
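
The effect described in this comment can be approximated with a toy model in plain JavaScript. This is a sketch under stated assumptions, not the FTSElementIterator code: collectText, indexedIds, and the includeArrayIndex flag are hypothetical names. The assumption is that elements reached through an array keep the array's base path (for example "arr") rather than a positional path (for example "arr.0").

```javascript
// Toy model only; all names are hypothetical and this is not server code.
// Walks a document and records each string value with the path it was found at.
function collectText(value, path, includeArrayIndex, out) {
  if (Array.isArray(value)) {
    value.forEach((element, i) => {
      // Base-name-only mode: array elements keep the array's own path ("arr").
      const childPath = includeArrayIndex ? `${path}.${i}` : path;
      collectText(element, childPath, includeArrayIndex, out);
    });
  } else if (value !== null && typeof value === "object") {
    for (const [key, child] of Object.entries(value)) {
      collectText(child, path ? `${path}.${key}` : key, includeArrayIndex, out);
    }
  } else if (typeof value === "string") {
    out.push({ path, text: value });
  }
  return out;
}

// Returns the _id of each document that would get an index key for `term`
// when the text index is defined on `indexedPath`.
function indexedIds(docs, indexedPath, term, includeArrayIndex) {
  return docs
    .filter((doc) =>
      collectText(doc, "", includeArrayIndex, []).some(
        (entry) => entry.path === indexedPath && entry.text === term
      )
    )
    .map((doc) => doc._id);
}

const docs = [
  { _id: 1, arr: [{ num: 123, str: "abc" }, { num: 789, str: "xyz" }] },
  { _id: 3, arr: { "0": { num: 123, str: "abc" }, "1": { num: 789, str: "xyz" } } },
];

// Base-name-only paths: the positional index misses the array document.
const positionalBaseOnly = indexedIds(docs, "arr.0.str", "abc", false);
// Base-name-only paths: the non-positional index finds the array document.
const nonPositionalBaseOnly = indexedIds(docs, "arr.str", "abc", false);
// Positional paths tracked: the positional index covers both documents.
const positionalWithIndex = indexedIds(docs, "arr.0.str", "abc", true);

console.log(positionalBaseOnly, nonPositionalBaseOnly, positionalWithIndex);
```

Under this assumption, the model reproduces the behavior in the reproduction steps: an index on 'arr.0.str' covers only the embedded-object document, while an index on 'arr.str' covers the array document.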

Comment by David Storch [ 18/Dec/18 ]

jacob.evans, could you take a look at what's going on here and post any findings in a public comment on this ticket? Thanks!
