[SERVER-53541] Support DBRefs in wildcard indexes Created: 30/Dec/20 Updated: 06/Dec/22 |
|
| Status: | Backlog |
| Project: | Core Server |
| Component/s: | Index Maintenance |
| Affects Version/s: | 4.4.2 |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Boris Petrov | Assignee: | Backlog - Query Execution |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Assigned Teams: |
Query Execution
|
| Sprint: | Query 2021-01-25, Query Execution 2021-02-22 |
| Participants: |
| Description |
The third line blows up with:
While the fourth one works fine. However, they mean the same thing so both should work. |
| Comments |
| Comment by Boris Petrov [ 28/Feb/21 ] | ||
|
Hi Bernard, Thanks for the detailed answer and sorry for the late response! I understand all your concerns and agree with what you say. I'm not familiar with Mongo's internals at all so feel free to ignore the next idea as impossible/stupid but isn't it possible for the server to "rewrite" such a query "db.coll.find({ a: { b: 5, c: 3 } }" to "db.coll.find({ 'a.b': 5, 'a.c': 3 }" and add a filter that only/exactly these fields exist in the resulting documents - and do this only when there is a wildcard index that could run the query? This would be a subset of full-blown support for queries on a full subdocument and would help use cases like mine and is probably easier to implement. Just an idea if you haven't thought about something similar.}} Best regard, Boris | ||
| Comment by Bernard Gorman [ 24/Feb/21 ] | ||
|
Hi alien, Thank you for filing this ticket, and for bringing this issue to our attention. As Eric has already outlined, DBRef is a convention for referring to foreign documents, internally resolves to an object with a particular schema, and is not itself a specific BSON type. The server does not attempt to parse or recognise a DBRef object when it encounters one, and treats them no differently than any other subdocument; only the client-side drivers have special APIs to recognise and interpret them. As such, we are unlikely to introduce special-case handling for DBRef-format subdocuments in wildcard indexes. We have previously considered some approaches to a general solution - that is, allowing wildcard indexes to answer a query on a full subdocument, e.g. db.coll.find({a: { b: 5, c: 3 } }), which would naturally include DBRef-format subdocuments. This use-case is outside the original scope of wildcard indexes, which are intended to descend through hierarchies of subdocuments rather than indexing them. However, supporting full-subdocument queries would require either storing all potential subdocuments in the index (thereby blowing out the index size, especially for deeply-nested documents, and in turn impacting both read and write latencies) or reconstituting the original document from the individual index keys (which would be a fundamental change in how we interact with the index, requiring significant engineering work). While we do desire to return to this problem at some point, we do not consider it a high priority at present. So to summarise: DBRef is an entirely client-side convention which is not recognised by the server. Querying for complete sub-documents, such as DBRefs, goes against the original use-case for wildcard indexes, which are intended to descend through subdocuments and index the terminal field(s) along each path. While we may consider extending wildcard indexes to support this additional use-case at some point, it would not be a trivial task, and we are therefore unlikely to schedule this work in the near future - particularly when we believe there is more valuable wildcard index work to be done first, e.g. Hope this helps to clarify the issue for you! Best regards, | ||
| Comment by Boris Petrov [ 08/Jan/21 ] | ||
|
I knew that sentence would provoke clarification requests. We generate queries dynamically based on user input - that's what I mean that we don't know if a particular query would match an index. Up to now we were generating queries using DBRefs (using the Java driver). But them being unsupported in wildcard indexes makes us wonder whether to continue to use them or just convert all their usages to documents with two fields. Unfortunately that means more code and more complexity, that is why I was hoping for DB support. Thank you for looking in more detail if that would be possible! Please keep us posted as soon as there is a decision because we also want to figure our way forward. Thanks! | ||
| Comment by Eric Sedor [ 08/Jan/21 ] | ||
|
DBRefs are indeed a convention rather than a specific BSON type; they are stored in BSON as documents. For this reason, it's possible this will be a difficult exception to make for wildcard indexes, but we are going to take a look in more detail. Currently, the alternative query you mentioned db.collection.find({ 'a.b.$ref': '1', 'a.b.$id': '2' }) is needed to use the wildcard index to match individual DBRef fields (but .hint() should not be needed). Can you clarify what you mean by not knowing "which query could use such an index"? | ||
| Comment by Boris Petrov [ 08/Jan/21 ] | ||
|
Hi, Thanks for the answer. It's still a bit unclear to me whether for MongoDB a DBRef is an "atomic" value or if it is just a shorthand name for a document with a couple of special keys in it. If it's the former, then this is a bug which should be fixed. If it is the latter, then yes, you're correct that according to the documentation this should be unsupported. Note, however, that this means that we (and everyone else with a similar use-case), for example, would have to rewrite all our code that uses DBRefs to actually use "manual" expansion to a document with some fields in order to be able to use wildcard indexes (because we don't know which query could use such an index). I would prefer that not to be the case. | ||
| Comment by Eric Sedor [ 07/Jan/21 ] | ||
|
Hi alien, I'll pass this on to an appropriate team as an improvement request. Initially this seems to fall under the currently Unsupported Query Patterns as it is checking "if a field is or is not equal to a document". It's worth noting that the following two queries are not identical from a semantic perspective:
The second query matches any document where the a.b subdocument at least contains those two fields and values, while the first query only matches documents where a.b is exactly the DBRef('1', '2'). |