[SERVER-8962] increasing Chinese support of text index Created: 13/Mar/13  Updated: 25/Jun/15  Resolved: 29/Apr/15

Status: Closed
Project: Core Server
Component/s: Text Search
Affects Version/s: 2.4.0-rc2
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: jude chang Assignee: Unassigned
Resolution: Duplicate Votes: 3
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
depends on SERVER-17620 RLP Tokenizer (includes C++ unit tests) Closed
Backwards Compatibility: Fully Compatible
Participants:

 Description   

My product needs this support now. In fact, we already implement it ourselves using PaodingTokenizer and the BM25 algorithm.
But that approach only covers simple cases. Doing the text search in the application is inefficient, because it has to call db.**.find(

{documentid:*}

) for each document, one by one.
I am sure this feature would become very popular in China if MongoDB supported it.
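As a rough illustration of the application-side workaround the reporter describes, here is a minimal sketch of Okapi BM25 ranking over pre-segmented (tokenized) Chinese documents. This is not MongoDB code and not the reporter's actual implementation; the function name, the default parameters k1=1.5 and b=0.75, and the toy documents are all assumptions for illustration.

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Score each tokenized document against query_terms using Okapi BM25.

    docs is a list of documents, each already segmented into a list of terms
    (e.g. by a Chinese word segmenter such as Paoding).
    """
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n  # average document length
    # document frequency of each distinct query term
    df = {t: sum(1 for d in docs if t in d) for t in set(query_terms)}
    scores = []
    for doc in docs:
        tf = Counter(doc)  # term frequencies within this document
        score = 0.0
        for t in query_terms:
            idf = math.log((n - df[t] + 0.5) / (df[t] + 0.5) + 1)
            denom = tf[t] + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * tf[t] * (k1 + 1) / denom
        scores.append(score)
    return scores

# Toy example: three pre-segmented documents, query for "搜索" (search)
docs = [["搜索", "引擎"], ["搜索"], ["数据库"]]
print(bm25_scores(["搜索"], docs))
```

Because each document must be fetched, segmented, and scored in the application, the cost grows linearly with collection size, which is the "one by one" inefficiency the description complains about.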



 Comments   
Comment by maojie chen [ 08/May/13 ]

The following is from a reference:
While researching the Chinese word segmentation algorithms available online, I found it hard to find a method that is both fast and accurate. After a few days of study, I finally found a fast and accurate Chinese word segmentation method. Looking back, the problem is not that complicated; for general applications this algorithm should be sufficient, and the same approach applies even if 100% segmentation accuracy is required. In practice, one often has to trade accuracy against efficiency. The pursuit of better technology is endless, and I will continue working to improve accuracy in subsequent versions while maintaining the current efficiency.

Comment by maojie chen [ 08/May/13 ]

Paoding ("Paoding Jie Niu" segmenter) is an open-source, Java-based Chinese word segmentation component.
Would the underlying code need to be rewritten to use it?
When might a new version support Chinese search?

Comment by maojie chen [ 08/May/13 ]

Hello,
MongoDB usage in China is becoming increasingly widespread,
but once the data volume exceeds about one million documents, keyword text search becomes too slow.
For large amounts of data we have to set up a separate search engine,
which is very inconvenient: the development cycle is long and the results are not ideal.

Regarding the claim that putting this in the application causes low efficiency:
What causes it?
Is it the index construction on insert or update?
In which areas does the low efficiency show up?

Generated at Thu Feb 08 03:18:55 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.