[SERVER-12449] Query planner generates incorrect plan for predicate on prefix field of compound text index
Created: 23/Jan/14  Updated: 11/Jul/16  Resolved: 10/Feb/14

Status: Closed
Project: Core Server
Component/s: Querying, Text Search
Affects Version/s: 2.5.4
Fix Version/s: 2.6.0-rc0

Type: Bug
Priority: Major - P3
Reporter: J Rassi
Assignee: hari.khalsa@10gen.com
Resolution: Done
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related: DOCS-2689 pending Text Search features (Closed)
Operating System: ALL

 Description   

The query planner selects a compound text index for queries that have a predicate on the index's prefix field but no text predicate. Because text indexes generate no index keys for documents without text data, the results of these queries incorrectly omit documents that have no indexed text data.

> db.foo.insert({a:17})
Insert WriteResult({ "ok" : 1, "n" : 1 })
> db.foo.ensureIndex({a:1,b:"text"})
> db.foo.count({a:17})
0
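
To illustrate why the count above comes back as 0, here is a toy model in plain JavaScript. It is only a sketch: textIndexKeys, countViaIndex and countViaCollscan are hypothetical helpers, not server code, and the tokenization is a crude stand-in for the real text analysis. The one property it is meant to show is that a document with no text in the indexed field produces no index keys, so a scan of the text index can never return it.

```javascript
// Toy model, NOT the server's key generator: one key per term found in the
// text field, each key carrying the prefix field's value.
function textIndexKeys(doc, prefixField, textField) {
  const text = typeof doc[textField] === "string" ? doc[textField] : "";
  const terms = text.toLowerCase().split(/\W+/).filter((t) => t.length > 0);
  return terms.map((term) => [doc[prefixField], term]);
}

// Count documents reachable through the (toy) index for prefixField == value.
function countViaIndex(docs, prefixField, textField, value) {
  const matched = new Set();
  docs.forEach((doc, i) => {
    for (const [prefix] of textIndexKeys(doc, prefixField, textField)) {
      if (prefix === value) matched.add(i);
    }
  });
  return matched.size;
}

// Count documents by examining every document, as a collection scan would.
function countViaCollscan(docs, prefixField, value) {
  return docs.filter((doc) => doc[prefixField] === value).length;
}

const docs = [{ a: 17 }, { a: 17, b: "hello world" }];
console.log(countViaIndex(docs, "a", "b", 17)); // 1: {a: 17} has no keys
console.log(countViaCollscan(docs, "a", 17));   // 2
```

In this model the index-based count and the collection-scan count disagree as soon as one matching document lacks text data, which is the discrepancy reported here.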

Explain output below:

> db.foo.find({a:17}).explain()
{
	"cursor" : "BtreeCursor a_1_b_text",
	"isMultiKey" : false,
	"n" : 0,
	"nscannedObjects" : 0,
	"nscanned" : 0,
	"nscannedObjectsAllPlans" : 0,
	"nscannedAllPlans" : 0,
	"scanAndOrder" : false,
	"indexOnly" : false,
	"nYields" : 0,
	"nChunkSkips" : 0,
	"millis" : 1,
	"indexBounds" : {
		"a" : [
			[
				17,
				17
			]
		],
		"_fts" : [
			[
				{
					"$minElement" : 1
				},
				{
					"$maxElement" : 1
				}
			]
		],
		"_ftsx" : [
			[
				{
					"$minElement" : 1
				},
				{
					"$maxElement" : 1
				}
			]
		]
	},
	"server" : "Rassi-MacBook-Pro.local:27017",
	"stats" : {
		"type" : "FETCH",
		"works" : 1,
		"yields" : 0,
		"unyields" : 0,
		"invalidates" : 0,
		"advanced" : 0,
		"needTime" : 0,
		"needFetch" : 0,
		"isEOF" : 1,
		"alreadyHasObj" : 0,
		"forcedFetches" : 0,
		"matchTested" : 0,
		"children" : [
			{
				"type" : "IXSCAN",
				"works" : 1,
				"yields" : 0,
				"unyields" : 0,
				"invalidates" : 0,
				"advanced" : 0,
				"needTime" : 0,
				"needFetch" : 0,
				"isEOF" : 1,
				"keyPattern" : "{ a: 1.0, _fts: \"text\", _ftsx: 1 }",
				"bounds" : {
					"a" : [
						[
							17,
							17
						]
					],
					"_fts" : [
						[
							{
								"$minElement" : 1
							},
							{
								"$maxElement" : 1
							}
						]
					],
					"_ftsx" : [
						[
							{
								"$minElement" : 1
							},
							{
								"$maxElement" : 1
							}
						]
					]
				},
				"isMultiKey" : 0,
				"yieldMovedCursor" : 0,
				"dupsTested" : 0,
				"dupsDropped" : 0,
				"seenInvalidated" : 0,
				"matchTested" : 0,
				"keysExamined" : 0,
				"children" : [ ]
			}
		]
	}
}
>

Verbose query log output below:

2014-01-22T20:56:33.476-0500 [conn1] Running query on new system: ns=test.foo limit=0 skip=0
Tree: a == 17.0
Sort: {}
Proj: {}
2014-01-22T20:56:33.476-0500 [conn1] =============================
Beginning planning, options = INCLUDE_COLLSCAN
Canonical query:
ns=test.foo limit=0 skip=0
Tree: a == 17.0
Sort: {}
Proj: {}
 
=============================
2014-01-22T20:56:33.476-0500 [conn1] idx 0 is kp: { _id: 1 } io: { v: 1, key: { _id: 1 }, name: "_id_", ns: "test.foo" }
2014-01-22T20:56:33.476-0500 [conn1] idx 1 is kp: { a: 1.0, _fts: "text", _ftsx: 1 } io: { v: 1, key: { a: 1.0, _fts: "text", _ftsx: 1 }, name: "a_1_b_text", ns: "test.foo", weights: { b: 1 }, default_language: "english", language_override: "language", textIndexVersion: 2 }
2014-01-22T20:56:33.476-0500 [conn1] predicate over field a
2014-01-22T20:56:33.476-0500 [conn1] relevant idx 0 is kp: { a: 1.0, _fts: "text", _ftsx: 1 } io: { v: 1, key: { a: 1.0, _fts: "text", _ftsx: 1 }, name: "a_1_b_text", ns: "test.foo", weights: { b: 1 }, default_language: "english", language_override: "language", textIndexVersion: 2 }
2014-01-22T20:56:33.476-0500 [conn1] rated tree
2014-01-22T20:56:33.476-0500 [conn1] a == 17.0 First: 0 notFirst: full path: a
 
2014-01-22T20:56:33.476-0500 [conn1] enumerator received root:
a == 17.0 First: 0 notFirst: full path: a
 
2014-01-22T20:56:33.476-0500 [conn1] Tagging memoID 0
2014-01-22T20:56:33.476-0500 [conn1] Enumerator: memo right before moving:
2014-01-22T20:56:33.476-0500 [conn1] [Node #0]: predicate, first indices: [0], pred: a == 17.0
 indexToAssign: 0
2014-01-22T20:56:33.476-0500 [conn1] Enumerator: memo right after moving:
2014-01-22T20:56:33.476-0500 [conn1] [Node #0]: predicate, first indices: [0], pred: a == 17.0
 indexToAssign: 0
2014-01-22T20:56:33.476-0500 [conn1] about to build solntree from tagged tree:
a == 17.0  || Selected Index #0 pos 0
 
2014-01-22T20:56:33.477-0500 [conn1] Planner: adding solution:
FETCH
---fetched = 1
---sortedByDiskLoc = 0
---getSort = [{ _fts: 1, _ftsx: 1 }, { a: 1 }, { a: 1, _fts: 1 }, { a: 1, _fts: 1, _ftsx: 1 }, ]
---Child:
------IXSCAN
---------keyPattern = { a: 1.0, _fts: "text", _ftsx: 1 }
---------direction = 1
---------bounds = field #0['a']: [17.0, 17.0], field #1['_fts']: [MinKey, MaxKey], field #2['_ftsx']: [MinKey, MaxKey]
---------fetched = 0
---------fetched = 0
---------sortedByDiskLoc = 0
---------getSort = [{ _fts: 1, _ftsx: 1 }, { a: 1 }, { a: 1, _fts: 1 }, { a: 1, _fts: 1, _ftsx: 1 }, ]
 
2014-01-22T20:56:33.477-0500 [conn1] Planner: outputted 1 indexed solutions.
2014-01-22T20:56:33.477-0500 [conn1] Planner: outputting a collscan:
2014-01-22T20:56:33.477-0500 [conn1] COLLSCAN
---ns = test.foo
--- filter = a == 17.0
---fetched = 1
---sortedByDiskLoc = 0
---getSort = []
 
2014-01-22T20:56:33.477-0500 [conn1] scoring plan 0:
FETCH
---fetched = 1
---sortedByDiskLoc = 0
---getSort = [{ _fts: 1, _ftsx: 1 }, { a: 1 }, { a: 1, _fts: 1 }, { a: 1, _fts: 1, _ftsx: 1 }, ]
---Child:
------IXSCAN
---------keyPattern = { a: 1.0, _fts: "text", _ftsx: 1 }
---------direction = 1
---------bounds = field #0['a']: [17.0, 17.0], field #1['_fts']: [MinKey, MaxKey], field #2['_ftsx']: [MinKey, MaxKey]
---------fetched = 0
---------fetched = 0
---------sortedByDiskLoc = 0
---------getSort = [{ _fts: 1, _ftsx: 1 }, { a: 1 }, { a: 1, _fts: 1 }, { a: 1, _fts: 1, _ftsx: 1 }, ]
2014-01-22T20:56:33.478-0500 [conn1] score (2) = baseScore (1) + productivity(0) + noFetchBonus(1)
2014-01-22T20:56:33.478-0500 [conn1] score = 2
2014-01-22T20:56:33.478-0500 [conn1] scoring plan 1:
COLLSCAN
---ns = test.foo
--- filter = a == 17.0
---fetched = 1
---sortedByDiskLoc = 0
---getSort = []
2014-01-22T20:56:33.478-0500 [conn1] score (2) = baseScore (1) + productivity(0) + noFetchBonus(1)
2014-01-22T20:56:33.478-0500 [conn1] score = 2
2014-01-22T20:56:33.478-0500 [conn1] Winning solution:
FETCH
---fetched = 1
---sortedByDiskLoc = 0
---getSort = [{ _fts: 1, _ftsx: 1 }, { a: 1 }, { a: 1, _fts: 1 }, { a: 1, _fts: 1, _ftsx: 1 }, ]
---Child:
------IXSCAN
---------keyPattern = { a: 1.0, _fts: "text", _ftsx: 1 }
---------direction = 1
---------bounds = field #0['a']: [17.0, 17.0], field #1['_fts']: [MinKey, MaxKey], field #2['_ftsx']: [MinKey, MaxKey]
---------fetched = 0
---------fetched = 0
---------sortedByDiskLoc = 0
---------getSort = [{ _fts: 1, _ftsx: 1 }, { a: 1 }, { a: 1, _fts: 1 }, { a: 1, _fts: 1, _ftsx: 1 }, ]
 
2014-01-22T20:56:33.478-0500 [conn1] not caching runner but returning 0 results



 Comments   
Comment by Githook User [ 10/Feb/14 ]

Author: Hari Khalsa (hkhalsa) <hkhalsa@10gen.com>

Message: SERVER-12449 check correct container's .end()
Branch: master
https://github.com/mongodb/mongo/commit/cc12c45a1053ab03db2ecc04c6396514a829733b

Comment by Githook User [ 10/Feb/14 ]

Author: Hari Khalsa (hkhalsa) <hkhalsa@10gen.com>

Message: SERVER-12449 text indices require all prefix fields (which must be EQ)
Branch: master
https://github.com/mongodb/mongo/commit/b572251d0452e6d3f9f8ed309b09a99e58b3bf5c

Comment by J Rassi [ 23/Jan/14 ]

As in the example, the index {a:1, b: "text"} generates no index keys for the document {a:17}. The same holds for the geo index {b: "2d", a: 1} – it's unclear to me whether this is desirable behavior (fixing this for geo indexes was proposed in SERVER-11149; perhaps we should consider it for text indexes too?).

In any case, this fact imposes the restriction that the plan enumerator cannot enumerate any plan that selects a text index {a_1: 1, a_2: 1, ..., a_N: 1, b: "text", ...} unless all of the following conditions hold:

  1. the match expression contains a TEXT node
  2. the TEXT node is a child of an AND node
  3. for all elements a_i, the AND node must have a child EQ node on the path a_i
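
The three conditions above can be sketched as a check over a simplified match-expression tree. This is an illustration only, written in plain JavaScript: the node shape ({type, path, children}) and the functions textIndexPrefixFields and canUseTextIndex are hypothetical and do not correspond to the server's C++ planner code. The sketch applies the conditions literally, so a TEXT node with no enclosing AND is rejected; the actual planner may treat some cases differently.

```javascript
// Hypothetical node shape: { type: "AND" | "EQ" | "TEXT" | ..., path, children }.

// Fields that precede the first "text" field in an index key pattern.
// Relies on insertion order of non-numeric keys, e.g. { a: 1, b: "text" }.
function textIndexPrefixFields(keyPattern) {
  const prefix = [];
  for (const [field, kind] of Object.entries(keyPattern)) {
    if (kind === "text") break;
    prefix.push(field);
  }
  return prefix;
}

// True if some AND node has a TEXT child (conditions 1 and 2) and an EQ child
// on every prefix field of the index (condition 3).
function canUseTextIndex(root, keyPattern) {
  const prefix = textIndexPrefixFields(keyPattern);
  const stack = [root];
  while (stack.length > 0) {
    const node = stack.pop();
    const children = node.children || [];
    if (node.type === "AND") {
      const hasText = children.some((c) => c.type === "TEXT");
      const eqPaths = new Set(
        children.filter((c) => c.type === "EQ").map((c) => c.path)
      );
      if (hasText && prefix.every((f) => eqPaths.has(f))) return true;
    }
    stack.push(...children);
  }
  return false;
}

const index = { a: 1, b: "text" };
// Shape of the query in this ticket, {a: 17}: no TEXT node at all.
const prefixOnly = { type: "EQ", path: "a" };
// An equality on the prefix field combined with a text predicate.
const withText = {
  type: "AND",
  children: [{ type: "EQ", path: "a" }, { type: "TEXT" }],
};
console.log(canUseTextIndex(prefixOnly, index)); // false
console.log(canUseTextIndex(withText, index));   // true
```

Under this check the query from the description is not eligible for the text index, which would leave the collection scan as the only candidate plan in the log above.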