Type: Bug
Resolution: Done
Priority: Major - P3
Fix Version/s: 3.2.0-rc6
Affects Version/s: 3.2.0-rc4
Component/s: Text Search
Labels:
None

Backwards Compatibility:
Fully Compatible
Operating System:
ALL
Steps To Reproduce:
See above
Sprint:
QuInt D (12/14/15)
CAR Domain/s:
None

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name(s):
None
Goal Link:
None

MongoDB 3.2.0 RC4 appears to have a substantial performance regression in full-text search.

Test Data
3000 books obtained from Project Gutenberg (http://web.eecs.umich.edu/~lahiri/gutenberg_dataset.html) stored in MongoDB as follows:

    {
      author : "Abraham Lincoln",
      title : "Letters",
      body : "<full text of book>"
    }

This data was then indexed using an "all fields" index:

db.books.createIndex( { "$**" : "text" } );

This produces a test dataset of around 1.1GB with a text index of 155MB (measured with WiredTiger, abbreviated WT below).

Test process
This data was loaded into different versions of MongoDB, and various simple searches were run using words and phrases with different occurrence frequencies in the dataset. Each search used the following simple query shape in an aggregation pipeline, with the ultimate goal of reporting the number of books per author that contain the search word:

db.books.aggregate([
{ 
	$match : { 
		$text : { 
			$search : "house" 
		} 
	} 
},
{ 
	$group : { 
		_id : { 
			author : "$author" 
		}, 
		count : { 
			$sum : 1 
		} 
	} 
}, 
{ 
	$sort : { 
		count : -1 
	} 
} ]);
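A minimal sketch of how the same query shape could be parameterised and timed for each search term. This is not the attached testQuery_all.js; `buildPipeline`, `timeQuery`, and `searchTerms` are hypothetical names introduced here for illustration. Note that the phrase search keeps its inner double quotes, so they have to be escaped inside the JavaScript string.

```javascript
// Hypothetical helper (not the attached script): builds the pipeline from
// this report for an arbitrary search term.
function buildPipeline(searchTerm) {
  return [
    { $match: { $text: { $search: searchTerm } } },
    { $group: { _id: { author: "$author" }, count: { $sum: 1 } } },
    { $sort: { count: -1 } }
  ];
}

// Hypothetical timing wrapper. runQuery is any function that executes the
// pipeline and exhausts the cursor; the return value is elapsed wall time in ms.
function timeQuery(runQuery, searchTerm) {
  var start = Date.now();
  runQuery(buildPipeline(searchTerm));
  return Date.now() - start;
}

// The search terms listed in this report. The phrase keeps its double quotes
// so that $text treats it as a phrase rather than two separate words.
var searchTerms = ["slaveholder", "hound", "\"gigantic hound\"", "cheese", "house"];

// In the mongo shell (assumes `db` points at the database holding `books`):
// searchTerms.forEach(function (term) {
//   var ms = timeQuery(function (p) { db.books.aggregate(p).toArray(); }, term);
//   print(term + ": " + ms + " ms");
// });
```

Passing the query runner in as a function keeps the helpers independent of any particular shell or driver; the commented loop shows one assumed way to wire it to the mongo shell.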

The words used are as follows:

slaveholder
hound
"gigantic hound"
cheese
house

A simple test script ("testQuery_all.js") is attached to automate this process.

Test results
All of these results were taken on the third run, to ensure that the data was as warm as possible. In the case of the 3.2 results, mongod ran one core flat-out for the entire query duration.

Version	Engine	Total Query Duration (ms)
2.6.11	MMAPv1	5308
3.0.7	MMAPv1	5306
3.0.7	WT Snappy	6625
3.2.0 RC4	MMAPv1	26157
3.2.0 RC4	WT Snappy	639862
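
To put the table in relative terms, the following sketch computes the slowdown of 3.2.0 RC4 against 3.0.7 on the same storage engine. The durations are copied from the table above; the `slowdown` helper and the choice of 3.0.7 as the baseline are assumptions made here for illustration.

```javascript
// Total query durations (ms) copied from the results table in this report.
var results = [
  { version: "2.6.11",    engine: "MMAPv1",    ms: 5308 },
  { version: "3.0.7",     engine: "MMAPv1",    ms: 5306 },
  { version: "3.0.7",     engine: "WT Snappy", ms: 6625 },
  { version: "3.2.0 RC4", engine: "MMAPv1",    ms: 26157 },
  { version: "3.2.0 RC4", engine: "WT Snappy", ms: 639862 }
];

// Hypothetical helper: ratio of candidate duration to baseline duration
// for one storage engine. Throws if either row is missing.
function slowdown(rows, engine, baselineVersion, candidateVersion) {
  function find(version) {
    for (var i = 0; i < rows.length; i++) {
      if (rows[i].engine === engine && rows[i].version === version) {
        return rows[i];
      }
    }
    throw new Error("no result for " + version + " on " + engine);
  }
  return find(candidateVersion).ms / find(baselineVersion).ms;
}

var mmapSlowdown = slowdown(results, "MMAPv1", "3.0.7", "3.2.0 RC4");
var wtSlowdown = slowdown(results, "WT Snappy", "3.0.7", "3.2.0 RC4");
```

With these figures, `mmapSlowdown` comes out at roughly 4.9 and `wtSlowdown` at roughly 96.6, i.e. about 5x slower on MMAPv1 and nearly 100x slower on WT Snappy.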

Full results are available here:
https://goo.gl/s4pU9j

Source data is here:
https://dl.dropboxusercontent.com/u/6076108/books.bson.gz
Note: the text index is not included in this data and must be created manually after restoring it:

db.books.createIndex( { "$**" : "text" } );


createIndex.js (0.0 kB, Nov 30 2015 10:09:07 AM UTC)
testQuery_all.js (1.0 kB, Nov 30 2015 10:09:07 AM UTC)

is related to SERVER-19936 Performance pass on unicode-aware text processing logic (text index v3) (Closed)

Assignee: J Rassi (Inactive)
Reporter: Stuart Hall
Participants: Githook User, J Rassi, Martin Bligh, Ramon Fernandez, Stuart Hall
Votes: 0
Watchers: 12

Created: Nov 30 2015 10:09:06 AM UTC
Updated: Dec 03 2015 10:29:31 PM UTC
Resolved: Dec 01 2015 10:41:46 PM UTC
