Core Server / SERVER-8423

Text search case folding needs utf-8 support


    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: 2.3.2
    • Fix Version/s: 3.1.7
    • Component/s: Text Search
    • Labels:
      None
    • Backwards Compatibility:
      Major Change
    • Sprint:
      Platform 6 07/17/15, Platform 8 08/28/15, Platform 7 08/10/15

      Description

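      Text search case folding does not handle non-ASCII (UTF-8) characters. For example, in Russian queries "Как" currently lowercases to itself, whereas it should lowercase to "как".

      Correct case folding is needed for stopword removal, term matching, etc.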
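
      The difference between the two behaviours can be sketched in plain JavaScript. This is only an illustration, not the server's implementation: `asciiToLower` is a hypothetical stand-in for a folding routine that maps only A-Z, and the built-in `String.prototype.toLowerCase` stands in for Unicode-aware folding.

```javascript
// Hypothetical ASCII-only folding: only the letters A-Z are mapped,
// every other character (including Cyrillic) passes through unchanged.
function asciiToLower(s) {
  return s.replace(/[A-Z]/g, (c) => String.fromCharCode(c.charCodeAt(0) + 32));
}

// Unicode-aware folding via the built-in string method.
function unicodeToLower(s) {
  return s.toLowerCase();
}

const phrase = "Как дела";
console.log(asciiToLower(phrase));   // the capital "К" is left as-is
console.log(unicodeToLower(phrase)); // the capital "К" becomes "к"
```

      With ASCII-only folding, the query phrase and the indexed text agree only when their original capitalisation happens to match.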

      > db.foo.insert({content:"Как дела?"})
      > db.foo.ensureIndex({content:"text"},{default_language:"russian"})
      > db.foo.runCommand("text",{search:"\"как дела\""})
      {
      	"queryDebugString" : "дел||||как дела||",
      	"language" : "russian",
      	"results" : [ ],
      	"stats" : {
      		"nscanned" : 0,
      		"nscannedObjects" : 0,
      		"n" : 0,
      		"nfound" : 0,
      		"timeMicros" : 104
      	},
      	"ok" : 1
      }
      > db.foo.runCommand("text",{search:"\"Как дела\""})
      {
      	"queryDebugString" : "Как|дел||||Как дела||",
      	"language" : "russian",
      	"results" : [
      		{
      			"score" : 1,
      			"obj" : {
      				"_id" : ObjectId("510aa82ddb47733460b47eff"),
      				"content" : "Как дела?"
      			}
      		}
      	],
      	"stats" : {
      		"nscanned" : 1,
      		"nscannedObjects" : 0,
      		"n" : 1,
      		"nfound" : 1,
      		"timeMicros" : 118
      	},
      	"ok" : 1
      }
      > 
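
      In the transcript above, the lowercase query yields the single term "дел" while the capitalised query keeps "Как" as a term, which suggests that stopword removal only catches the already-lowercase form. A minimal sketch of that interaction follows, under loud assumptions: `STOPWORDS`, `tokenize`, and `searchTerms` are hypothetical names, the stopword list is reduced to one word, and stemming (which turns "дела" into "дел" in the real output) is left out.

```javascript
// Hypothetical one-word stopword list; a real Russian list is much longer.
const STOPWORDS = new Set(["как"]);

// Naive tokenizer: split on whitespace and a few punctuation marks.
function tokenize(text) {
  return text.split(/[\s?!.,]+/).filter((t) => t.length > 0);
}

// Fold each token with the supplied function, then drop stopwords.
function searchTerms(text, fold) {
  return tokenize(text).map(fold).filter((t) => !STOPWORDS.has(t));
}

// ASCII-only folding (assumed current behaviour) vs. Unicode-aware folding.
const asciiFold = (s) => s.replace(/[A-Z]/g, (c) => c.toLowerCase());
const unicodeFold = (s) => s.toLowerCase();

console.log(searchTerms("Как дела?", asciiFold));   // "Как" survives the stopword filter
console.log(searchTerms("Как дела?", unicodeFold)); // "как" is removed as a stopword
```

      Because the stopword set holds only lowercase forms, any token that is not folded correctly slips past the filter and ends up as a search term.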

              People

              Votes: 18
              Watchers: 24
