[SERVER-13987] Access an Index as a collection. Created: 19/May/14  Updated: 06/Dec/22

Status: Open
Project: Core Server
Component/s: Querying
Affects Version/s: None
Fix Version/s: features we're not sure of

Type: New Feature
Priority: Minor - P4
Reporter: John Page
Assignee: Backlog - Query Execution
Resolution: Unresolved
Votes: 3
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Assigned Teams:
Query Execution
Participants:

 Description   

I would like the ability to query an index, specifically FTS indexes but also normal ones.

Whilst covered indexes resolve some use cases, I'd also like the ability to run a regular expression query over an index and get a set of unique values back. For example, query for all words in the FTS index matching a regex, skipping over index entries with the same value: a regex of /.one./ on the index would return Jones, Bones and Phones, and that result can then become an FTS query to enable fuzzy search.
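As a rough illustration of the request, here is a minimal Python sketch that simulates it outside the server. The `index_entries` list and the `distinct_keys_matching` helper are hypothetical stand-ins for a sorted index and the proposed scan, not an existing MongoDB API; only the final `$text` / `$search` query shape is real MongoDB syntax, where space-separated terms are ORed together.

```python
import re
from itertools import groupby

# Hypothetical stand-in for a sorted index: one entry per (key, document id).
index_entries = sorted([
    ("Bones", 3), ("Jones", 1), ("Jones", 7), ("Phones", 2),
    ("Phones", 5), ("Smith", 4), ("Stone", 6),
])


def distinct_keys_matching(entries, pattern):
    """Walk the sorted entries once, testing each distinct key only once."""
    regex = re.compile(pattern)
    return [
        key
        for key, _run in groupby(entries, key=lambda entry: entry[0])
        if regex.search(key)
    ]


# The regex from the description: any character, "one", any character.
terms = distinct_keys_matching(index_entries, r".one.")

# Feed the matched terms back into a full text search filter.
text_query = {"$text": {"$search": " ".join(terms)}}
print(terms)
print(text_query)
```

Because the entries are sorted, equal keys are adjacent, so duplicates are skipped without building a separate set.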



 Comments   
Comment by Laszlo Balogh [ 09/Sep/14 ]

From the UX perspective, a "Did you mean"/fuzzy search feature is an absolute must-have. One way to make a user leave a website is to show her no results on the search results page because of a spelling error (according to Greg Nudelman's book, http://www.amazon.co.uk/Designing-Search-Strategies-Ecommerce-UXmatters/dp/0470942231).

We are considering dropping MongoDB for full text search and adding ElasticSearch to our stack, just because of its fuzzy search capability. Obviously this move would add extra complexity to our system: we would have to store data in two places, deal with network latency, etc.

However, with access to the terms in the inverted index, one could easily implement fuzzy search with MongoDB.
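One possible shape for this, sketched in Python under the assumption that the index terms could be read out as a plain list (which is exactly what this ticket asks for and is not available today). The `index_terms` list and `did_you_mean` helper are hypothetical; `difflib.get_close_matches` is from the Python standard library and is used here as a simple similarity measure, not as the method any search engine actually uses.

```python
from difflib import get_close_matches

# Hypothetical list of distinct terms read from the inverted index.
index_terms = ["bones", "jones", "phones", "smith"]


def did_you_mean(word, terms, limit=3, cutoff=0.75):
    """Return up to `limit` index terms that look similar to `word`."""
    return get_close_matches(word.lower(), terms, n=limit, cutoff=cutoff)


# A misspelled search word is mapped to the closest known terms,
# which could then be used in a regular $text query.
suggestions = did_you_mean("Jonse", index_terms)
text_query = {"$text": {"$search": " ".join(suggestions)}}
print(suggestions, text_query)
```

The cutoff is a tuning knob: a lower value suggests more terms at the cost of weaker matches.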

Comment by John Page [ 20/May/14 ]

Pluggable indexing would be good for this too - although that's probably a big job.

Comment by John Page [ 20/May/14 ]

You could also think of it as a poor man's column store, especially if we had some way of run-length encoding key runs.

So if the first node in the tree with value X has a count (counting indexes) and a pointer to the last node with that value, we can walk the tree way faster. We could also compact the indexes by not storing the key value where it's identical, and instead have a special "use previous" value that is only a byte long.
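To make the run-length idea concrete, here is a small Python sketch over a flat sorted key list. It is only an analogy for the tree layout described above: the `run_length_encode` and `expand` helpers are hypothetical and say nothing about how the server stores index nodes.

```python
from itertools import groupby


def run_length_encode(sorted_keys):
    """Collapse each run of identical keys into one (key, count) pair."""
    return [(key, sum(1 for _ in run)) for key, run in groupby(sorted_keys)]


def expand(runs):
    """Rebuild the original key sequence from the (key, count) pairs."""
    return [key for key, count in runs for _ in range(count)]


keys = ["Bones", "Jones", "Jones", "Jones", "Phones", "Phones"]
runs = run_length_encode(keys)

# Distinct values and per-value counts come straight from the runs,
# without visiting every individual entry again.
distinct_values = [key for key, _count in runs]
counts = dict(runs)
print(runs)
```

With the counts stored alongside the first entry of each run, a scan for distinct values touches one entry per key rather than one per document.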

Generated at Thu Feb 08 03:33:30 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.