Core Server / SERVER-25682

Relax collation locale string validation

    • Query Execution

      I started playing around with the locale support in MongoDB 3.3.11 and ran into a few things that I had not expected. In most places, you specify a locale with an ICU locale string of the form "language_COUNTRYCODE". The ICU documentation (http://userguide.icu-project.org/locale) is full of such examples. I had therefore expected all of the following to work:

      > db.test.createIndex( { a: 1 }, { collation: { locale: 'fr_FR', caseLevel: true, strength: 4 } } );
      {
      	"ok" : 0,
      	"errmsg" : "Field 'locale' is invalid in: { locale: \"fr_FR\", caseLevel: true, strength: 4.0 }. Did you mean 'fr'?",
      	"code" : 2
      }
      > db.test.createIndex( { a: 1 }, { collation: { locale: 'fr_CA', caseLevel: true, strength: 4 } } );
      {
      	"createdCollectionAutomatically" : false,
      	"numIndexesBefore" : 1,
      	"numIndexesAfter" : 2,
      	"ok" : 1
      }
      > db.test.createIndex( { a: 1 }, { collation: { locale: 'nl_BE', caseLevel: true, strength: 4 } } );
      {
      	"ok" : 0,
      	"errmsg" : "Field 'locale' is invalid in: { locale: \"nl_BE\", caseLevel: true, strength: 4.0 }",
      	"code" : 2
      }
      > db.test.createIndex( { a: 1 }, { collation: { locale: 'nl_NL', caseLevel: true, strength: 4 } } );
      {
      	"ok" : 0,
      	"errmsg" : "Field 'locale' is invalid in: { locale: \"nl_NL\", caseLevel: true, strength: 4.0 }",
      	"code" : 2
      }
      > db.test.createIndex( { a: 1 }, { collation: { locale: 'nn_NO', caseLevel: true, strength: 4 } } );
      {
      	"ok" : 0,
      	"errmsg" : "Field 'locale' is invalid in: { locale: \"nn_NO\", caseLevel: true, strength: 4.0 }. Did you mean 'nn'?",
      	"code" : 2
      }
      > db.test.createIndex( { a: 1 }, { collation: { locale: 'nb_NO', caseLevel: true, strength: 4 } } );
      {
      	"ok" : 0,
      	"errmsg" : "Field 'locale' is invalid in: { locale: \"nb_NO\", caseLevel: true, strength: 4.0 }",
      	"code" : 2
      }
      > db.test.createIndex( { a: 1 }, { name: 'a_nl_simple', collation: { locale: 'nl' } } );
      {
      	"ok" : 0,
      	"errmsg" : "Field 'locale' is invalid in: { locale: \"nl\" }",
      	"code" : 2
      }
      

      As you can see, the only language_COUNTRYCODE combination that worked was "fr_CA". Sometimes the error message suggested an alternative ("fr_FR" -> "fr", "nn_NO" -> "nn"), although IMO fr_FR and nn_NO should also have been accepted.
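Until validation is relaxed, a client-side fallback along the following lines might serve as a stopgap. This is only a minimal sketch: `localeCandidates` and `createIndexWithFallback` are hypothetical helper names, not part of MongoDB or the shell, and it assumes `createIndex()` returns a document with an "ok" field (as in the output above) rather than throwing.

```javascript
// Hypothetical helper: list the locale strings to try, most specific first.
// "fr_FR" -> ["fr_FR", "fr"]; "nl" -> ["nl"].
function localeCandidates(locale) {
    var parts = locale.split('_');
    var candidates = [locale];
    if (parts.length > 1) {
        candidates.push(parts[0]); // bare language code, e.g. "fr"
    }
    return candidates;
}

// Hypothetical helper: try each candidate until one is accepted.
// Assumes coll.createIndex() returns { ok: 1 } on success and { ok: 0, ... }
// on failure, as in the shell output above.
// Returns the accepted locale string, or null if every candidate was rejected.
function createIndexWithFallback(coll, keys, locale, collationOptions) {
    var candidates = localeCandidates(locale);
    for (var i = 0; i < candidates.length; i++) {
        var collation = { locale: candidates[i] };
        for (var key in collationOptions) {
            if (collationOptions.hasOwnProperty(key) && key !== 'locale') {
                collation[key] = collationOptions[key];
            }
        }
        var res = coll.createIndex(keys, { collation: collation });
        if (res.ok === 1) {
            return candidates[i];
        }
    }
    return null;
}
```

For example, `createIndexWithFallback(db.test, { a: 1 }, 'fr_FR', { caseLevel: true, strength: 4 })` would retry with "fr" after "fr_FR" is rejected. It does not help for "nl_BE" or "nl_NL", since the bare "nl" is rejected as well.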

      Additionally, is the locale "nl" not supported at all? Dutch has several interesting sorting issues revolving around "ij" (it sorts between "i" and "j"): http://demo.icu-project.org/icu-bin/locexp?_=nl

            Assignee:
            backlog-query-execution [DO NOT USE] Backlog - Query Execution
            Reporter:
            derick Derick Rethans