[SERVER-17643] Text indexes should work with standard indexes (instead of slowing them down) Created: 18/Mar/15  Updated: 18/Mar/15  Resolved: 18/Mar/15

Status: Closed
Project: Core Server
Component/s: Querying, Text Search
Affects Version/s: 2.6.8, 3.0.0
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Charlie Stigler Assignee: David Storch
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
is related to SERVER-17648 Implement matcher for text predicates Backlog
Operating System: ALL
Steps To Reproduce:

1. Add a compound text index and standard ObjectId index into a collection with 1 million objects.
2. Query by standard index, note normal speed.
3. Query by compound text index, note speed (slower but still usable).
4. Query by both at once, note that it is slower than either and will timeout on large enough result sets.

Participants:

 Description   

Currently, text indexes seem to interact very poorly with other indexes. In many cases with low result counts, adding a normally-indexed field will actually slow the search down significantly (which is counterintuitive).

Example in our app:

db.items.find({ indexedField: ObjectId('523ec53087d4200202000006') })
completes in ~1 ms with ~400 results, using the indexedField index

db.items.find({ compoundTextIndex: 'grade' })
completes in ~180 ms with ~70,000 results, using the compoundTextIndex index

db.items.find({ indexedField: ObjectId('523ec53087d4200202000006'), compoundTextIndex: 'grade' })
completes in ~250 ms with ~20 results, using the compoundTextIndex index

This third example clearly doesn't make sense. It would be faster in this case to perform the indexed query and then filter down the results by text at the application level! It should be using the indexedField index instead of compoundTextIndex.

This actually crashes our app sometimes, taking over 30 seconds in real-life queries. So it's pretty major for us.



 Comments   
Comment by David Storch [ 18/Mar/15 ]

Glad I could help. Please watch SERVER-17648 for further updates.

Comment by Charlie Stigler [ 18/Mar/15 ]

Dave, thanks so much, SERVER-17648 is exactly what I was trying to report with this ticket. Sorry I made it tough to understand!

Comment by David Storch [ 18/Mar/15 ]

Hi cstigler,

A query using the $text operator will always execute by looking up matching entries in the collection's text index. The system will never use a non-text index in order to perform a full text search query.

The query

db.items.find({ indexedField: ObjectId('523ec53087d4200202000006'),  $text: { $search: 'grade' } });

will therefore execute by looking up all occurrences of the term "grade", fetching the associated documents, and filtering out the documents with the correct ObjectId for indexedField. This is why adding the $text predicate to the query slows down execution dramatically compared to the apparently similar query

db.items.find({ indexedField: ObjectId('523ec53087d4200202000006') });

The latter query can just look up a small number of values of the ObjectId in the indexedField index.

I have filed a feature request on your behalf to allow $text queries to use non-text indices: see SERVER-17648. For further information on MongoDB's full text search mechanism, please refer to the docs.

Best,
Dave

Comment by Charlie Stigler [ 18/Mar/15 ]

@Jason Sorry about that. There's no simplifications except for changing the property names, which I did a bit overaggressively in that case. Actual queries were:

db.items.find({ $text: { $search: 'grade' } });

db.items.find({ indexedField: ObjectId('523ec53087d4200202000006') });

db.items.find({ indexedField: ObjectId('523ec53087d4200202000006'),  $text: { $search: 'grade' } });

Besides the property name indexedField being changed, this is the exact query with no simplification. This is the mongod slow query log:

query testdb.items query: { $text: { $search: "grade" }, indexedField: ObjectId('523ec53087d4200202000006') } planSummary: TEXT {} ntoreturn:0 ntoskip:0 nscanned:69425 nscannedObjects:138850 keyUpdates:0 writeConflicts:0 numYields:1088 nreturned:23 reslen:312645 locks:{} 1920ms

Comment by J Rassi [ 18/Mar/15 ]

I'd like to clarify exactly what query you're asking about performance for, as I don't see any index defined on the field "compoundTextIndex" from the getIndexes() output provided. I imagine you're trying to provide a simplified view into your application, though now I'm having trouble understanding exactly what the problematic query looks like.

It would be helpful to see an actual unmodified getIndexes() output, along with an actual log excerpt from the mongod slow query log showing the queries in question being run. Could you post these, please?

Comment by Charlie Stigler [ 18/Mar/15 ]

David, thanks so much for the fast reply. Here's the output of db.items.getIndexes(). Note that there were 9 other non-text indexes shown, all standard or compound non-text indexes that aren't relevant to the query I'm testing with. I removed those but am happy to list them if they might be relevant.

[
	{
		"v" : 1,
		"key" : {
			"_id" : 1
		},
		"name" : "_id_",
		"ns" : "testdb.items"
	},
	{
		"v" : 1,
		"key" : {
			"indexedField" : 1
		},
		"name" : "indexedField_1",
		"ns" : "testdb.items",
		"background" : true
	},
	{
		"v" : 1,
		"key" : {
			"_fts" : "text",
			"_ftsx" : 1
		},
		"name" : "prop1_text_prop2_text_prop3_text_prop4_text",
		"ns" : "testdb.items",
		"weights" : {
			"prop3" : 1,
			"prop1" : 5,
			"prop2" : 4,
			"prop4" : 3
		},
		"background" : true,
		"default_language" : "english",
		"language_override" : "language",
		"textIndexVersion" : 2
	}
]

Comment by David Storch [ 18/Mar/15 ]

Hi cstigler,

Thanks for reporting this issue. We will need to gather a bit more information in order to diagnose. Could you please provide the output from running the following:

db.items.getIndexes();

Best,
Dave

Generated at Thu Feb 08 03:45:08 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.