[SERVER-75607] Change Stream: Shard keys not captured in documentKey on delete operations performed directly on mongo instances
Created: 03/Apr/23  Updated: 15/Nov/23  Resolved: 15/Nov/23

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 6.0.4
Fix Version/s: None

Type: Bug
Priority: Major - P3
Reporter: Ajay Mathias
Assignee: Sebastien Mendez
Resolution: Cannot Reproduce
Votes: 0
Labels: changestreams

Issue Links:
Related
related to SERVER-75993 Change Streams: Document key is not f... Backlog
Assigned Teams:
Query Execution
Operating System: ALL
Steps To Reproduce:

Case 1: Connect using mongodb+srv URL
mongosh "mongodb+srv://xxxx.yyyy.mongodb.net"  

Perform a delete operation on a collection with shard key "name".

The captured change stream event has documentKey={"name": "1", "_id": 1}.

Both _id and shard key value are present.

Case 2: Connect to mongos
mongosh "mongodb://xxxx-shard-00-00.yyyy.mongodb.net:27016,xxxx-shard-00-01.yyyy.mongodb.net:27016,xxxx-shard-00-02.yyyy.mongodb.net:27016" --tls
Perform a delete operation on a collection with shard key "name".

The captured change stream event has documentKey={"name": "6", "_id": 6}.

Both _id and shard key value are present.

Case 3: Connect to mongod instances
mongosh "mongodb://xxxx-shard-00-00.yyyy.mongodb.net:27017,xxxx-shard-00-01.yyyy.mongodb.net:27017,xxxx-shard-00-02.yyyy.mongodb.net:27017" --tls
Perform a delete operation on a collection with shard key "name".

The captured change stream event has documentKey={"_id": 2}.

Only _id is present, although the document had a non-empty shard key value.

Sprint: QE 2023-09-18, QE 2023-10-02, QE 2023-10-16, QE 2023-10-30, QE 2023-11-13, QE 2023-11-27

 Description   

Change Stream: Shard keys are captured in documentKey for delete operations performed through mongos or a mongodb+srv connection, but not for delete operations performed while connected directly to the mongod instances.

Perform the operations listed in the Steps To Reproduce section while reading from a change stream.

The change stream captures the shard keys and their values when the delete operations are performed from mongosh sessions connected to mongos or through the mongodb+srv connection string (Cases 1 and 2), but not from sessions connected directly to the mongod instances (Case 3). This behavior occurs only for delete operations: update change events included the shard key in their documentKey in all three scenarios.
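As a rough illustration of the difference, a change stream consumer that relies on documentKey carrying the full shard key could flag events like the one in Case 3. The helper `missing_shard_key_fields` is hypothetical (it is not part of any MongoDB driver), and the sketch assumes top-level, non-dotted shard key fields and documentKey values already decoded into plain dicts; the values are the ones reported in Cases 1 to 3.

```python
from typing import Any, List, Mapping, Sequence


def missing_shard_key_fields(document_key: Mapping[str, Any],
                             shard_key: Sequence[str]) -> List[str]:
    """Return the shard key fields absent from a change event's documentKey.

    Hypothetical helper: assumes top-level (non-dotted) shard key fields.
    """
    return [field for field in shard_key if field not in document_key]


# documentKey values as reported in the three cases (shard key: "name").
observed = {
    "Case 1 (mongodb+srv)": {"name": "1", "_id": 1},
    "Case 2 (mongos)": {"name": "6", "_id": 6},
    "Case 3 (direct mongod)": {"_id": 2},
}

for case, document_key in observed.items():
    missing = missing_shard_key_fields(document_key, ["name"])
    print(f"{case}: missing shard key fields = {missing}")
```

Only the Case 3 event reports `name` as missing; the other two carry both the shard key and `_id`.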



 Comments   
Comment by Sebastien Mendez [ 15/Nov/23 ]

Based on my tests and investigation, the shard key is never added to the documentKey when commands are sent directly to the shards, and this is true not only for deletes but also for inserts and updates.

The behavior is therefore consistent and will not be changed. It is crucial not to bypass mongos by sending commands directly to mongod: such commands are not routed to the correct shard, which results in "orphaned" documents and documents that are not actually deleted, introducing data inconsistencies.

Comment by Matt Panton [ 29/Sep/23 ]

ajay2589@gmail.com - Could you please add some context around connecting to a shard directly (mongosh "mongodb://xxxx-shard-00-00.yyyy.mongodb.net:27017,xxxx-shard-00-01.yyyy.mongodb.net:27017,xxxx-shard-00-02.yyyy.mongodb.net:27017") and performing a delete?

I would like to understand whether this is your typical workflow or whether you found the behavior difference while testing. Thanks!

Comment by Sebastien Mendez [ 28/Sep/23 ]

If you attempt to insert a document into a specific shard that should belong to another shard, MongoDB will not automatically route the document to the correct shard. Instead, it will be stored on the shard where you attempted the insertion, effectively becoming an "orphan" document.
These orphaned documents will not be visible through the mongos router, and may be deleted by a background job.
The same applies to deletes: the document on the correct shard will not be deleted either.

To ensure proper data distribution, it's crucial to insert/update/delete documents using the mongos router, which will correctly route the command to the appropriate shard.
Note that documents that are not owned by that shard but are affected by the insert/update/delete do not generate any change stream events.
Acting on documents directly on a specific shard is not a recommended practice in a sharded MongoDB cluster, as it can lead to data inconsistencies.
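To make the routing argument in this comment concrete, here is a toy Python model of a range-sharded collection keyed on "name". It is a deliberate simplification and not MongoDB's chunk routing or orphan filtering: the class `ToyCluster`, its method names, the shard names, and the single split point "m" are all hypothetical. It shows a directly inserted document landing on a shard that does not own its key (an orphan the router never reads), and a direct delete sent to the wrong shard leaving the owned copy untouched.

```python
import bisect
from typing import Any, Dict, List, Tuple


class ToyCluster:
    """Toy range-sharded collection keyed on "name" (hypothetical, not MongoDB internals)."""

    def __init__(self, split_points: List[str], shard_names: List[str]) -> None:
        # split_points must be sorted; shard_names[i + 1] owns keys >= split_points[i].
        if len(shard_names) != len(split_points) + 1:
            raise ValueError("need exactly one more shard than split points")
        self.split_points = split_points
        self.shard_names = shard_names
        self.shards: Dict[str, Dict[Any, dict]] = {name: {} for name in shard_names}

    def owner(self, key: str) -> str:
        """Shard that owns a shard key value, i.e. where the router sends it."""
        return self.shard_names[bisect.bisect_right(self.split_points, key)]

    # Operations sent through the router are directed to the owning shard.
    def routed_insert(self, doc: dict) -> None:
        self.shards[self.owner(doc["name"])][doc["_id"]] = doc

    def routed_delete(self, name: str, _id: Any) -> None:
        self.shards[self.owner(name)].pop(_id, None)

    def routed_find(self, name: str) -> List[dict]:
        # The router only reads the owning shard, so orphans stay invisible.
        return [d for d in self.shards[self.owner(name)].values() if d["name"] == name]

    # Operations sent directly to a shard act only on that shard's local data.
    def direct_insert(self, shard: str, doc: dict) -> None:
        self.shards[shard][doc["_id"]] = doc

    def direct_delete(self, shard: str, _id: Any) -> None:
        self.shards[shard].pop(_id, None)

    def orphans(self) -> List[Tuple[str, dict]]:
        """Documents stored on a shard that does not own their key."""
        return [(shard, doc)
                for shard, docs in self.shards.items()
                for doc in docs.values()
                if self.owner(doc["name"]) != shard]


if __name__ == "__main__":
    # Keys < "m" belong to shardA, keys >= "m" to shardB.
    cluster = ToyCluster(["m"], ["shardA", "shardB"])
    cluster.routed_insert({"_id": 1, "name": "alice"})          # lands on shardA
    cluster.direct_insert("shardB", {"_id": 2, "name": "bob"})  # "bob" belongs to shardA
    print(cluster.routed_find("bob"))    # the orphan is not visible through the router
    print(cluster.orphans())
    cluster.direct_delete("shardB", 1)   # doc 1 lives on shardA, so nothing is removed
    print(cluster.routed_find("alice"))
```

Under these assumptions, `routed_find("bob")` returns an empty list even though the document exists on shardB, and `routed_find("alice")` still finds its document after the misdirected delete.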

Could you please clarify your requirement for performing document insertion/updating/deletion within a sharded collection directly on the individual shards?

Generated at Thu Feb 08 06:30:37 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.