Text search: Dutch stemmer not working?


      Hi all,

      I'm using MongoDB text search and I'd like to give some feedback. I wasn't sure of the best way to do that, so I've filed this report. If there is a preferred channel, please let me know so I can use it in the future.

      Based on this document: http://docs.mongodb.org/manual/tutorial/create-text-index-on-multi-language-collection/, I've put together a test case, and I don't understand the results.

      This is my test data:

      { "_id" : 1, "language" : "portuguese", "quote" : "A sorte protege os audazes" }
      { "_id" : 2, "language" : "spanish", "quote" : "Nada hay más surreal que la realidad." }
      { "_id" : 3, "language" : "english", "quote" : "is this a dagger which I see before me" }
      { "_id" : 4, "language" : "dutch", "quote" : "is dit een dolk die ik voor mij zie" }
      { "_id" : 5, "language" : "dutch", "quote" : "vol verbijstering zaten de dames naar de twee honden te kijken" }
      

      Right now I'm mostly interested in finding the Dutch results.

      It seems like the stemmer is not working for some words:

      > db.quotes.runCommand( "text", { search: "honden", language:"dutch" } )
      Correct result: 1 (queryDebugString: 'hond')
      > db.quotes.runCommand( "text", { search: "hond", language:"dutch" } )
      Correct result: 1 (queryDebugString: 'hond')
      > db.quotes.runCommand( "text", { search: "dames", language:"dutch" } )
      Correct result: 1 (queryDebugString: 'dames')
      > db.quotes.runCommand( "text", { search: "dame", language:"dutch" } )
      Incorrect result: 0 (queryDebugString: 'dam')
      

      Note that the plural of hond ('dog') is honden ('dogs').
      The plural of dame ('lady') is dames ('ladies').

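      However, MongoDB text search does not seem to reduce both forms of this word to the same stem. According to queryDebugString, "dame" is stemmed to 'dam' while "dames" stays 'dames', so a search for "dame" returns nothing even though document 5 contains "dames". This looks like a bug to me.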
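
      The mismatch can be sketched in plain JavaScript. This is not MongoDB code and does not call the real stemmer: the table below only copies the stems shown in the queryDebugString values above, and the helper stemsMatch is a hypothetical name introduced for illustration. It also assumes that a word is stemmed the same way at index time as at query time, which this report does not confirm.

```javascript
// Stems copied from the queryDebugString values reported above.
// Assumption: indexed words are stemmed the same way as query terms.
const observedStems = {
  honden: "hond",
  hond: "hond",
  dames: "dames",
  dame: "dam",
};

// Hypothetical helper: a search term can only match an indexed word
// when both reduce to exactly the same stem.
function stemsMatch(indexedWord, searchTerm) {
  return observedStems[indexedWord] === observedStems[searchTerm];
}

console.log(stemsMatch("honden", "hond")); // true: both stem to 'hond'
console.log(stemsMatch("dames", "dame"));  // false: 'dames' vs 'dam'
```

      Under that assumption, singular and plural only find each other when the stemmer maps both to one stem, which happens for hond/honden but not for dame/dames.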

              Assignee:
              [DO NOT USE] Backlog - Query Integration
              Reporter:
              Erik Pragt