[SERVER-23924] Make _id index inherit the collection's default collation Created: 25/Apr/16  Updated: 22/Jun/17  Resolved: 08/Jul/16

Status: Closed
Project: Core Server
Component/s: Querying, Replication
Affects Version/s: None
Fix Version/s: 3.3.10

Type: Task
Priority: Major - P3
Reporter: J Rassi
Assignee: Tess Avitabile (Inactive)
Resolution: Done
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
depends on SERVER-23611 Query planner should set collation fr... Closed
is depended on by SERVER-24715 Secondary crash with duplicate key er... Closed
Duplicate
is duplicated by SERVER-24271 Extend idhack to support queries with... Closed
Backwards Compatibility: Fully Compatible
Sprint: Query 16 (06/24/16), Query 17 (07/15/16)
Participants:

 Description   
  • If the _id index is created at collection creation time, it should take on the collection's default collation.
  • If the _id index is not created at collection creation time, the user will not be able to specify a collation when building an index with the {_id: 1} key pattern. Such an index build must always use the collection's default collation.
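
The two rules above can be sketched as a small, self-contained Python model. This is only an illustration of the behavior the ticket describes, not the server's implementation: the names `resolve_index_collation`, `is_id_index`, `InvalidIndexSpec`, and `SIMPLE` are hypothetical. Two things are assumptions rather than statements from this ticket: that an explicitly requested collation equal to the collection default is accepted for the _id index, and that other indexes fall back to the collection default when no collation is given.

```python
from typing import Optional

# Hypothetical stand-in for the default (simple binary) collation.
SIMPLE = {"locale": "simple"}


class InvalidIndexSpec(ValueError):
    """Raised when an index spec requests a collation it may not have."""


def is_id_index(key_pattern: dict) -> bool:
    # The _id index is the one whose key pattern is exactly {_id: 1}.
    return list(key_pattern.items()) == [("_id", 1)]


def resolve_index_collation(
    key_pattern: dict,
    requested: Optional[dict] = None,
    collection_default: Optional[dict] = None,
) -> dict:
    """Return the collation an index would be built with under the rules above."""
    default = collection_default if collection_default is not None else SIMPLE
    if is_id_index(key_pattern):
        # The _id index always uses the collection default. Accepting a
        # requested collation that equals the default is an assumption here.
        if requested is not None and requested != default:
            raise InvalidIndexSpec(
                "the _id index must use the collection's default collation"
            )
        return default
    # Assumption: any other index may override the collection default.
    return requested if requested is not None else default
```

For example, `resolve_index_collation({"_id": 1}, None, {"locale": "fr"})` returns the collection's French collation, while passing `{"locale": "de"}` as the requested collation for the same key pattern raises `InvalidIndexSpec`.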


 Comments   
Comment by Githook User [ 08/Jul/16 ]

Author: Tess Avitabile (tessavitabile) <tess.avitabile@mongodb.com>

Message: SERVER-23924 Make _id index inherit the collection's default collation
Branch: master
https://github.com/mongodb/mongo/commit/99d405ae814d9840c029bcb6916cc94aa03b9b68

Comment by J Rassi [ 26/Apr/16 ]

> I'm not sure what you mean by "_id index to have a non-default collation." Is "non-default collation" the same thing as "not the collation on the collection"?

Ah, by "a non-default collation" I meant "a collation other than the default binary collation". I've edited my comment above to clarify.

> The spec says:
> > The _id index is created automatically when the collection is created, and therefore will always inherit the collection’s default collation. At collection create time, there will be no way for the user to specify a collation on the _id index other than the collection default.
> While this does not conflict with this ticket, it does seem to reduce the utility of the feature. Can you expound on the motivation?

Prohibiting users from being able to specify a collation on the _id index other than the collection default saves some amount of development time and presents a simpler user interface, and we didn't come up with good use cases for it.

That said, even having the _id index take on a collation other than the simple binary collation isn't going to be easy, so this ticket proposes one alternative of cutting scope where the _id index always takes on the simple binary collation, regardless of the collection default.

I think we could discuss this more efficiently in person, so I'd rather meet up to decide instead of continuing to hash this out in ticket comments.

Comment by Eric Milkie [ 26/Apr/16 ]

I'm not sure what you mean by "_id index to have a non-default collation." Is "non-default collation" the same thing as "not the collation on the collection"?
The spec says:
The _id index is created automatically when the collection is created, and therefore will always inherit the collection’s default collation. At collection create time, there will be no way for the user to specify a collation on the _id index other than the collection default.
While this does not conflict with this ticket, it does seem to reduce the utility of the feature. Can you expound on the motivation?

Comment by J Rassi [ 26/Apr/16 ]

We're currently thinking about making the _id index take on the collection default collation, without exposing any new _id-specific collection options or allowing the _id index to be rebuilt. We do understand that this may require some difficult replication work, and we could consider scrapping this idea and forcing the _id index to always have the default binary collation.

Let's talk in person sometime in the next week or two with other query / distributed systems folks to make a decision on this.

Comment by Scott Hernandez (Inactive) [ 26/Apr/16 ]

Would this setting be part of the collection creation options? Or applied after the _id index is built? Will it allow rebuilding the _id index with a new collation, or just setting it once?

Replication will require some extra code to handle this, because the cloner and datareplicator currently use the default _id index spec, which is a static constant and is not collation-aware.

Comment by J Rassi [ 25/Apr/16 ]

This ticket is currently serving as a placeholder for figuring out whether allowing the _id index to have a collation other than the default binary collation would have any negative implications for the replication or sharding subsystems.

If we decide to proceed with collation support for the _id index, we may or may not want to expose a mechanism for directly setting a collation on the _id index (separate from the mechanism of setting the default collation for a collection).
