[SERVER-75993] Change Streams: Document key is not fully captured Created: 12/Apr/23 Updated: 30/Jun/23 |
|
| Status: | Backlog |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | 6.0.5 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Ajay Mathias | Assignee: | Backlog - Query Execution |
| Resolution: | Unresolved | Votes: | 1 |
| Labels: | changestreams | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||
| Assigned Teams: |
Query Execution
|
||||||||
| Operating System: | ALL | ||||||||
| Steps To Reproduce: | Environment used: MongoDB Atlas M30 cluster, version: 6.0.5 |
||||||||
| Participants: | |||||||||
| Description |
|
The DocumentKey field of the change event does not capture a shard key if the value of the shard key is null. Consider a collection emp with a shard key name. Inserting a document with no value for name, like Similarly, updating the name field does not have the previous value of the shard key in the document key. That is, , truncatedArrays=[]}{}}} In both these operations, the DocumentKey field should contain a field for the name as well, like documentKey={"_id": "1001", "name": null}
In our use case, we rely on the DocumentKey field to know the shard keys and their previous values. As a workaround we couldn't depend on retrieving the shard key from the config database when Change Streams is run as there is a possibility shard keys may have been refined between when the operation was performed and Change Streams was run. Expected behavior: |
| Comments |
| Comment by Bernard Gorman [ 18/May/23 ] | ||||||||||||||||||||||||||||||||||||||||||
|
Thank you for raising this issue to our attention. The behaviour you're observing is due to the semantics of null and missing, and the fact that our query language treats a query for null as matching both documents where a field is explicitly null and where that field is absent. When you first insert the document {"_id":"1001"}, the value of "name" is NOT actually null in this case, it is missing. When we started allowing shard keys to be omitted, we took the approach of treating them as-if those keys were null for data-distribution and query-routing purposes, so documents where a shard key field is missing clump together with documents where the field is null. We did this, as mentioned above, because queries for null match documents where that field is explicitly null or where it is missing, which is a special behaviour of our query language. This is why your update on {"_id":"1001", "name":null} matches this document; if you were to instead use a query that actually does match only null:
... then it would not match or update this document. But a document with an explicitly null field and one in which the field is missing are still in two distinct states, which is what the documentKey in the change stream output is indicating. We cannot simply report a non-existent shard key field as null, because then use-cases which rely upon distinguishing between shard keys that are present or absent would have no way to differentiate between these two states. As shown below, if the name field is set to explicitly null then subsequent updates will indeed show that field in the documentKey. Incidentally, the reason that the name field is still absent in the update documentKey even though you set {name: "Adam"} is because the documentKey is from before the update occurred - otherwise, there would be no way to tell which document the update applied to.
I hope this helps to clarify the behaviour you're seeing. We plan to revisit our null and missing semantics in the future, so I'll add this ticket to the set of issues for consideration as part of that effort. |