Priority: Minor - P4
Affects Version/s: 2.2.3
Fix Version/s: None
Component/s: Index Maintenance
This is similar to SERVER-2193, but I think the use case is compelling and the existing semantics are unintuitive.
Given the following GridFS-like schema:
1) Metadata collection for file metadata
2) Chunks collection for file chunks
Files are large, and we support multiple versions of each file. We therefore split files into chunks and SHA-hash the chunks, so that we save disk space when most of the file hasn't changed.
The chunks collection has a trivial unique index:
'symbol' - file name; can be used for sharding (not strictly necessary)
'parents' - array of parent metadata documents representing multiple versions
'chunk' - chunk number for a given version of the file
For every version of a file, ('parent', 'chunk') must be unique.
This works great: it's fast, you can easily slice out ranges of the file, and it provides version control and space savings when most of the data stays the same between versions.
However, a problem arises when you try to delete. If the parents array becomes empty for more than one chunk-version, the unique constraint is violated, as (null, 'chunk') can result in duplicates.
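To make the collision concrete, here is a rough pure-Python model of how a compound multi-key index could expand documents into keys. It is only a sketch and not MongoDB code: the field order in FIELDS, the "<undefined>" placeholder, the sample documents, and the helper names index_keys and build_unique_index are all assumptions made for illustration.

```python
from itertools import product

# Stand-in for the special key value the report says an empty array receives.
UNDEFINED = "<undefined>"

# Assumed index field order; the report does not show the index definition.
FIELDS = ("symbol", "parents", "chunk")


def index_keys(doc, fields=FIELDS):
    """Expand one document into its compound index keys (one per array element)."""
    per_field = []
    for name in fields:
        value = doc.get(name)
        if isinstance(value, list):
            # Modelled behaviour: an empty array still yields one key.
            per_field.append(value if value else [UNDEFINED])
        else:
            per_field.append([value])
    return list(product(*per_field))


def build_unique_index(docs, fields=FIELDS):
    """Return {key: _id}, raising ValueError on the first duplicate key."""
    index = {}
    for doc in docs:
        for key in index_keys(doc, fields):
            if key in index:
                raise ValueError(f"duplicate key {key!r}")
            index[key] = doc["_id"]
    return index


# Two stored chunks with different content, both chunk 0 of the same file.
before = [
    {"_id": "sha-a", "symbol": "f", "parents": ["v1"], "chunk": 0},
    {"_id": "sha-b", "symbol": "f", "parents": ["v2"], "chunk": 0},
]
build_unique_index(before)  # no conflict: the parents differ

# Deleting both versions empties both parents arrays.
after = [dict(doc, parents=[]) for doc in before]
try:
    build_unique_index(after)
except ValueError as exc:
    print(exc)  # both documents now map to the same key
```

In this model the two documents only collide once their parents arrays are both empty, which matches the delete scenario described above.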
It's great that arrays work as multi-key indexes. However, it's less great that the empty array is given a special 'undefined' value.
I can't see how it's useful for documents in a compound unique index, which contains a multi-key field, to be included when that multi-key field is empty. Certainly a sparse index could reasonably ignore documents with an empty multi-key field in compound indexes.
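A minimal sketch of the semantics being requested, again as a pure-Python model rather than anything MongoDB implements today: a document whose multi-key field is empty produces no index keys at all, so it can never trip the unique constraint. The function name index_keys_skipping_empty and the sample values are hypothetical.

```python
from itertools import product


def index_keys_skipping_empty(doc, fields=("symbol", "parents", "chunk")):
    """Expand a document into compound index keys, skipping empty arrays.

    Proposed behaviour (an assumption, not current server behaviour):
    if any indexed array field is empty, the document is left out of the index.
    """
    per_field = []
    for name in fields:
        value = doc.get(name)
        if isinstance(value, list):
            if not value:
                return []  # empty multi-key field: no index entries
            per_field.append(value)
        else:
            per_field.append([value])
    return list(product(*per_field))


# Two orphaned chunks with the same chunk number no longer share a key.
orphan_a = {"_id": "sha-a", "symbol": "f", "parents": [], "chunk": 0}
orphan_b = {"_id": "sha-b", "symbol": "f", "parents": [], "chunk": 0}
print(index_keys_skipping_empty(orphan_a), index_keys_skipping_empty(orphan_b))
```

Documents that still have parents are indexed exactly as before, one key per parent, so uniqueness of (parent, chunk) is unaffected for live versions.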