[SERVER-28880] Issue in Text Search Index
Created: 20/Apr/17  Updated: 27/Oct/23  Resolved: 26/Apr/17

Status: Closed
Project: Core Server
Component/s: Text Search
Affects Version/s: None
Fix Version/s: None

Type: Question
Priority: Major - P3
Reporter: sachin garg
Assignee: Mark Agarunov
Resolution: Works as Designed
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Participants:

 Description   

Hi there,

I am facing an issue with text search.

My text index is:

{
        "v" : 1,
        "key" : {
            "_fts" : "text",
            "_ftsx" : 1
        },
        "name" : "ms.products",
        "ns" : "rajeshtesh.ms.products",
        "weights" : {
            "brand" : 50,
            "categories" : 30,
            "collections" : 20,
            "description" : 4,
            "name" : 500
        },
        "default_language" : "english",
        "language_override" : "language",
        "textIndexVersion" : 3
 }

I am running the following query to search for the word "idli":

db.ms.products.find({ publish: '1', '$text': { '$search': "idli" }}, {name: 1, score: { '$meta': 'textScore' }}).sort({score : { '$meta': 'textScore' }})

Query result:

{ "_id" : ObjectId("58824b8e254a2a001496fb9b"), "name" : "ID WHEAT PAROTA", "score" : 388.3333333333333 }
{ "_id" : ObjectId("58843604254a2a0014988bcb"), "name" : "ID SPECIAL KERALA PAROTA", "score" : 367.5 }
{ "_id" : ObjectId("589c29b1f62aa8912582737c"), "name" : "ID SPECIAL IDLY/DOSA BATTER", "score" : 355 }
{ "_id" : ObjectId("58834a82254a2a001497bb14"), "name" : "VIJAY IDLI RAVA", "score" : 333.3333333333333 }
{ "_id" : ObjectId("58824c4a254a2a0014971373"), "name" : "BHAGYALAKSHMI IDLI SOOJI", "score" : 333.3333333333333 }
{ "_id" : ObjectId("58824cb8254a2a0014971ea8"), "name" : "24 MANTRA IDLI RAVA", "score" : 312.5 }
{ "_id" : ObjectId("588f08916a5bc29c3b0e9600"), "name" : "iFILL IDLI RICE REGULAR", "score" : 312.5 }

Full Content of First Record

{
    "_id" : ObjectId("58824b8e254a2a001496fb9b"),
    "name" : "ID WHEAT PAROTA",
    "price" : 69.3,
    "compare_price" : 70,
    "brand" : "ID",
    "sku" : "339286",
    "barcode" : "339286",
    "categories" : [
        "food-essentials",
        "ready-to-eat",
        "cook-eat-meals"
    ],
    "publish" : "1",
    "weight" : "1",
    "inventory_management" : "automatic",
    "product_has_multiple_variants" : "Product",
    "inventory_allow_out_of_stock" : "0",
    "inventory_quantity" : 0,
    "inventory_low_stock_quantity" : 0,
    "option_set" : "58c4f7e6a72619a53c3f0969",
    "_metadata" : {
        "option_set" : {
            "action" : "automatic"
        }
    },
    "images" : [
        {
            "image" : "ms.products/58824b8e254a2a001496fb9b/images/58824b8e254a2a001496fb9c/58824b8eac9b955813b4c873/58824b8eac9b955813b4c873.jpg",
            "caption" : "",
            "tags" : "",
            "_id" : "58824b8e254a2a001496fb9c",
            "_metadata" : {
                "image" : {
                    "_id" : "58824b8eac9b955813b4c873",
                    "name" : "339286.jpg",
                    "tmp_path" : "/tmp/tmp-4952tVZ6rI71862n.jpg",
                    "type" : "image/jpeg"
                }
            }
        }
    ],
    "variants" : [
        {
            "price" : "69.3",
            "compare_price" : "70",
            "sku" : "339286",
            "barcode" : "339286",
            "weight" : "1",
            "inventory_management" : "automatic",
            "inventory_allow_out_of_stock" : "0",
            "inventory_quantity" : "0",
            "options" : [
                {
                    "name" : "Weight",
                    "value" : "300 Grams"
                }
            ],
            "variant_id" : "300 Grams",
            "id" : 66336
        }
    ],
    "options" : [
        {
            "name" : "Weight",
            "values" : [
                "300 Grams"
            ],
            "_id" : "58f87e4ba0f774d3140d1947"
        }
    ],
    "default_variant" : {
        "price" : "69.3",
        "compare_price" : "70",
        "sku" : "339286",
        "barcode" : "339286",
        "weight" : "1",
        "inventory_management" : "automatic",
        "inventory_allow_out_of_stock" : "0",
        "inventory_quantity" : "0",
        "options" : [
            {
                "name" : "Weight",
                "value" : "300 Grams"
            }
        ],
        "variant_id" : "300 Grams",
        "id" : 66336
    },
    "available" : 0,
    "attributes" : [
        {
            "name" : "_brand",
            "value" : "ID",
            "group" : "default",
            "_id" : "58f87e4ba0f774d3140d1948"
        }
    ],
    "uniquesku" : [
        "339286"
    ],
    "approve" : "approved",
    "seller" : "58537f24bf9b36aa0700a022",
    "alias" : "id-wheat-parota",
    "sort_order" : 306,
    "created_on" : ISODate("2017-01-20T17:40:35.957Z"),
    "updated_on" : ISODate("2017-04-20T09:24:27.781Z"),
    "_updated_by" : "585b7f4ba7636cc77ca96861",
    "inventory_management_level" : "variant",
    "collections" : [ ],
    "SEO" : {
        
    },
    "features" : [ ],
    "files" : [ ],
    "description" : "",
    "Request" : "Monday",
    "metafields" : {
        "loyalty_points" : "1"
    }
}

My question: the first record does not contain the word "idli" anywhere, yet it is returned in first place in the results. Why is that?

Thanks
Sachin Garg



 Comments   
Comment by Mark Agarunov [ 26/Apr/17 ]

Hello sachin65IT,

Thank you for providing this output. Looking over this, it appears that word stemming is in fact the cause of the behavior. According to the output of explain(true), the term "idli" is being stemmed to "id":

.queryPlanner.winningPlan.inputStage.inputStage.inputStage.inputStage.parsedTextQuery {
  "terms": [
    "id"
  ],
  "negatedTerms": [],
  "phrases": [],
  "negatedPhrases": []
}
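
To illustrate how "idli" could reduce to "id": the Snowball English stemmer has a rule that deletes a trailing "li" when it follows certain consonants and falls in a region called R1. The sketch below implements only that single rule, on the assumption that it is the one at work here. The function names are hypothetical, and this is not the server's actual code or a full stemmer.

```javascript
// Simplified illustration of ONE Snowball English (Porter2) suffix rule.
// Assumption: this is the rule that turns "idli" into "id"; all other
// stemmer steps and special cases are deliberately left out.
const VOWELS = "aeiouy";
const VALID_LI_ENDINGS = "cdeghkmnrt";

// R1 starts after the first non-vowel that follows a vowel.
function r1Start(word) {
  for (let i = 1; i < word.length; i++) {
    if (!VOWELS.includes(word[i]) && VOWELS.includes(word[i - 1])) {
      return i + 1;
    }
  }
  return word.length;
}

// Delete a trailing "li" when it lies inside R1 and follows a valid li-ending.
function stripLiSuffix(term) {
  const word = term.toLowerCase();
  if (word.length <= 2 || !word.endsWith("li")) {
    return word; // very short words are left alone
  }
  const suffixStart = word.length - 2;
  const preceding = word[suffixStart - 1];
  if (suffixStart >= r1Start(word) && VALID_LI_ENDINGS.includes(preceding)) {
    return word.slice(0, suffixStart);
  }
  return word;
}

console.log(stripLiSuffix("idli")); // id
console.log(stripLiSuffix("ID"));   // id
```

Under this sketch the search term and the brand value "ID" end up as the same token, which would explain why documents that contain "ID" but not "idli" match the query.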

A workaround is to wrap the term in escaped double quotes inside the search string - "\"idli\"" instead of "idli" - which causes the term to be treated as a phrase, so it will not be stemmed.
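
The workaround can be sketched as follows. The filter and projection are copied from the original query; `phraseSearch` is a hypothetical helper added only for illustration, and the `db` call is left as a comment because it needs a live mongo shell session.

```javascript
// Hypothetical helper: wrap a term in double quotes so that $text treats it
// as a phrase. Embedded quotes are dropped to keep the phrase well formed.
function phraseSearch(term) {
  return '"' + term.replace(/"/g, "") + '"';
}

// Same filter and projection as the original query, with a phrase search.
const filter = { publish: "1", $text: { $search: phraseSearch("idli") } };
const projection = { name: 1, score: { $meta: "textScore" } };

console.log(filter.$text.$search); // "idli"

// In the mongo shell:
// db.ms.products.find(filter, projection).sort({ score: { $meta: "textScore" } })
```

Because a phrase has to appear literally, spelling variants such as "IDLY" would presumably no longer match with this form of the query.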

Thanks,
Mark

Comment by sachin garg [ 21/Apr/17 ]

Hi Mark,

Thanks for your reply.

Here is the output of the same query with explain(true):

{
    "queryPlanner" : {
        "plannerVersion" : 1,
        "namespace" : "rajeshtesh.ms.products",
        "indexFilterSet" : false,
        "parsedQuery" : {
            "$and" : [
                {
                    "publish" : {
                        "$eq" : "1"
                    }
                },
                {
                    "$text" : {
                        "$search" : "idli",
                        "$language" : "english",
                        "$caseSensitive" : false,
                        "$diacriticSensitive" : false
                    }
                }
            ]
        },
        "winningPlan" : {
            "stage" : "PROJECTION",
            "transformBy" : {
                "name" : 1,
                "score" : {
                    "$meta" : "textScore"
                }
            },
            "inputStage" : {
                "stage" : "SORT",
                "sortPattern" : {
                    "score" : {
                        "$meta" : "textScore"
                    }
                },
                "inputStage" : {
                    "stage" : "SORT_KEY_GENERATOR",
                    "inputStage" : {
                        "stage" : "FETCH",
                        "filter" : {
                            "publish" : {
                                "$eq" : "1"
                            }
                        },
                        "inputStage" : {
                            "stage" : "TEXT",
                            "indexPrefix" : {
                                
                            },
                            "indexName" : "ms.products",
                            "parsedTextQuery" : {
                                "terms" : [
                                    "id"
                                ],
                                "negatedTerms" : [ ],
                                "phrases" : [ ],
                                "negatedPhrases" : [ ]
                            },
                            "inputStage" : {
                                "stage" : "TEXT_MATCH",
                                "inputStage" : {
                                    "stage" : "TEXT_OR",
                                    "inputStage" : {
                                        "stage" : "IXSCAN",
                                        "keyPattern" : {
                                            "_fts" : "text",
                                            "_ftsx" : 1
                                        },
                                        "indexName" : "ms.products",
                                        "isMultiKey" : true,
                                        "isUnique" : false,
                                        "isSparse" : false,
                                        "isPartial" : false,
                                        "indexVersion" : 1,
                                        "direction" : "backward",
                                        "indexBounds" : {
                                            
                                        }
                                    }
                                }
                            }
                        }
                    }
                }
            }
        },
        "rejectedPlans" : [ ]
    },
    "serverInfo" : {
        "host" : "trial-db2",
        "port" : 27017,
        "version" : "3.2.11",
        "gitVersion" : "009580ad490190ba33d1c6253ebd8d91808923e4"
    },
    "ok" : 1
}

Thanks
Sachin

Comment by Mark Agarunov [ 20/Apr/17 ]

Hello sachin65IT,

Thank you for the report. Looking over the output you've provided, I believe this may be due to word stemming in the text search. To get some more information, could you execute the find query with explain(true) appended and provide the output? For example:

db.ms.products.find({ publish: '1', '$text': { '$search': "idli" }}, {name: 1, score: { '$meta': 'textScore' }}).explain(true)

This should provide some more insight into how the query is being performed.
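
When reading the explain output, the part of interest is the parsedTextQuery field of the TEXT stage. A minimal sketch for pulling it out is shown here; `findParsedTextQuery` is a hypothetical helper, and it assumes every stage has at most one inputStage (plans that use an inputStages array are not handled). The sample plan is a trimmed copy of the plan shape reported in this ticket.

```javascript
// Hypothetical helper: walk down the inputStage chain of a winning plan and
// return the first parsedTextQuery found, or null if there is none.
function findParsedTextQuery(stage) {
  while (stage) {
    if (stage.parsedTextQuery) {
      return stage.parsedTextQuery;
    }
    stage = stage.inputStage;
  }
  return null;
}

// Trimmed copy of the winning plan from this ticket's explain output.
const winningPlan = {
  stage: "PROJECTION",
  inputStage: {
    stage: "SORT",
    inputStage: {
      stage: "SORT_KEY_GENERATOR",
      inputStage: {
        stage: "FETCH",
        inputStage: {
          stage: "TEXT",
          indexName: "ms.products",
          parsedTextQuery: {
            terms: ["id"],
            negatedTerms: [],
            phrases: [],
            negatedPhrases: []
          }
        }
      }
    }
  }
};

console.log(findParsedTextQuery(winningPlan).terms); // [ 'id' ]
```

In the mongo shell the same helper could be applied to the queryPlanner.winningPlan field of the document returned by explain(true).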

Thanks,
Mark

Generated at Thu Feb 08 04:19:19 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.