[SERVER-52829] Allow multiple text indices on one collection Created: 12/Nov/20  Updated: 19/Nov/20  Resolved: 16/Nov/20

Status: Closed
Project: Core Server
Component/s: Text Search
Affects Version/s: None
Fix Version/s: None

Type: New Feature Priority: Major - P3
Reporter: NOVALUE Mitar Assignee: Edwin Zhou
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates SERVER-17536 Allow multiple "text" indices on a co... Backlog

 Description   

Currently a collection is limited to one text index. I would like to ask that this limitation be relaxed.

My use case is that I have documents with fields whose content is in different languages (there are also fields that are not translated, such as the creation timestamp). I would like to have a separate text index over the text content for each language.

My current workaround is to create an additional collection per language, copy only the language-specific fields into it, and index that collection. When querying, I first query the per-language collection, then use $lookup to join with the main collection, and then run further queries there. This is cumbersome and, I suspect, less efficient.
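A minimal sketch of the two halves of this workaround, written as plain JavaScript so it runs without a server. The function names splitByLanguage and buildSearchPipeline, the parentId back-reference, and the assumption that translated fields are objects keyed by language code are all hypothetical, not taken from the reporter's code. The returned pipeline could be passed to aggregate() on one per-language collection; a $match that contains $text has to be the first stage.

```javascript
// Hypothetical sketch of the per-language-collection workaround.
// Assumptions: translated fields are objects keyed by language code
// (e.g. description: { en, de }), and each side document points back to
// its main document through a field named parentId.

// Copy only the translated fields of one main document into one side
// document per language code; untranslated fields (e.g. created) stay behind.
function splitByLanguage(doc, translatedFields) {
  const perLanguage = {};
  for (const field of translatedFields) {
    const translations = doc[field] || {};
    for (const [lang, text] of Object.entries(translations)) {
      if (!perLanguage[lang]) {
        perLanguage[lang] = { parentId: doc._id };
      }
      perLanguage[lang][field] = text;
    }
  }
  return perLanguage;
}

// Query side: text search in one per-language collection first (the $text
// $match must be the first stage), then join back to the main collection.
function buildSearchPipeline(mainCollection, searchTerm) {
  return [
    { $match: { $text: { $search: searchTerm } } },
    {
      $lookup: {
        from: mainCollection,
        localField: "parentId",
        foreignField: "_id",
        as: "parent",
      },
    },
    { $unwind: "$parent" },
  ];
}

const doc = {
  _id: 1,
  created: "2020-11-12",
  description: { en: "English description", de: "German description" },
  title: { en: "English title" },
};

console.log(splitByLanguage(doc, ["description", "title"]));
console.log(JSON.stringify(buildSearchPipeline("c", "description")));
```

Each per-language collection would then carry its own text index (for example with default_language set to that language), which is what keeps the searches separate, at the cost of keeping the copies in sync with the main collection.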



 Comments   
Comment by NOVALUE Mitar [ 17/Nov/20 ]

Thanks. This is useful.

Comment by Edwin Zhou [ 16/Nov/20 ]

Hi mitar,

Here is a workaround aggregation that projects your current schema into one better suited to a text index over multiple languages. If it helps, you can store the result in a new collection and create the index on that collection. You would also need to populate langCodes and language to fit your use case.

db.c.insertOne({
  id: 0,
  description: { en: "English description", de: "German description" },
});
 
db.c.aggregate([
  {
    $match: { id: 0 },
  },
  {
    $project: {
      description: {
        $map: {
          input: { $objectToArray: "$description" },
          as: "d",
          in: {
            language: "$$d.k",
            content: "$$d.v",
          },
        },
      },
    },
  },
]);

which outputs:

{
  _id: ObjectId("5fb2a7eec419ead2d9b1386e"),
  description: [
    {
      language: "en",
      content: "English description",
    },
    {
      language: "de",
      content: "German description",
    }
  ]
}
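For readers who want to check the reshaping without a running server, the following plain JavaScript sketch reproduces what the $objectToArray and $map stages do to one document. It assumes description is a plain object keyed by language code; the function names are hypothetical, and a string stands in for the ObjectId.

```javascript
// Client-side sketch of the $project/$map/$objectToArray stage above.
// $objectToArray turns { en: ..., de: ... } into [{ k, v }, ...] in field
// order; $map then renames k/v to language/content.
function toLanguageArray(translations) {
  return Object.entries(translations).map(([k, v]) => ({
    language: k,
    content: v,
  }));
}

// Mirrors the $project stage: _id is kept by default and only description
// is projected, so other fields such as id are dropped.
function projectDescription(doc) {
  return { _id: doc._id, description: toLanguageArray(doc.description) };
}

const original = {
  _id: "5fb2a7eec419ead2d9b1386e", // stands in for the ObjectId
  id: 0,
  description: { en: "English description", de: "German description" },
};

console.log(JSON.stringify(projectDescription(original), null, 2));
```

If the reshaped documents are written to a new collection (for example with an $out stage), a text index on description.content should honor the per-element language field, because language_override defaults to "language" and MongoDB accepts ISO 639-1 codes such as en and de there. All languages still share that single index.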

Best,

Edwin

Comment by Edwin Zhou [ 16/Nov/20 ]

mitar,

Thanks for further describing and clarifying your use case. I'll go ahead and close this as a duplicate of SERVER-17536. Updates will be posted on that ticket as they happen.

Best,

Edwin

 

Comment by NOVALUE Mitar [ 14/Nov/20 ]

It does look like a duplicate. I could not find it before.

I do not think the solution in the documentation works for me, because I would like to keep search separate per language, rather than having all words from all languages put into the same index. Moreover, a sibling language field does not match our schema, where we have fields like:

 

description: { en: "English description", de: "German description" }

Comment by Edwin Zhou [ 13/Nov/20 ]

Hi david.storch, I agree that this is a duplicate of SERVER-17536, but I think the use case is covered in the MongoDB documentation.

mitar, you can create a text index on a collection whose content is in multiple languages by adding a language field that specifies the language of each document or embedded document.

Please let me know if you find success with that.

Best,

Edwin

Comment by David Storch [ 13/Nov/20 ]

Hey edwin.zhou and mitar, I believe this is a duplicate of SERVER-17536?

Generated at Thu Feb 08 05:29:08 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.