[SERVER-27590] Duplicate documents in multiple shards Created: 05/Jan/17  Updated: 27/Oct/23  Resolved: 06/Jan/17

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: None

Type: Question Priority: Major - P3
Reporter: Dharshan Rangegowda Assignee: Unassigned
Resolution: Works as Designed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
is related to SERVER-17013 Add 'dry run' mode for cleanupOrphaned Closed
Participants:

 Description   

Hi,

We have a sharded collection that uses a hashed index on "_id" as the shard key. We started with 2 shards and added one more.

However, we are finding duplicate objects with the same _id on both shard-0 and shard-2. We identified this by connecting directly to the primary of each shard.

A few other observations:
1. Running a find from the mongos does not return the duplicate documents, probably because the query goes to the right shard.

Are these duplicates from a failed migration? If so, how come mongod does not clean it up?



 Comments   
Comment by Kelsey Schubert [ 11/Jan/17 ]

Hi dharshanr@scalegrid.net,

Please take a look at SERVER-17013, which provides the functionality you've described.

For MongoDB-related support discussion, please post on the mongodb-user group or Stack Overflow with the mongodb tag. If you have a recommendation to improve our documentation, please feel free to open a DOCS ticket describing the change here or by clicking the "report a problem" link on the lower right of any manual page.

Kind regards,
Thomas

Comment by Dharshan Rangegowda [ 06/Jan/17 ]

Hi Kal,

Does the cleanupOrphaned command work with hashed sharding? The documentation does not say either way; it would be good to call it out.

Also, is there an equivalent method to display the orphaned documents before we run the cleanupOrphaned command? If not, I would like to request one.

Comment by Kaloian Manassiev [ 06/Jan/17 ]

Hi dharshanr@scalegrid.net,

As you correctly point out, these orphaned documents must have come from a failed migration (or a failed cleanup). Queries routed through mongos filter them out because mongos transmits additional information that lets each shard know which document ranges it owns. This does not happen if you connect directly to the shard or if you use a secondary read preference.
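
To make the filtering described above concrete, here is a toy model in Python. It is only a sketch and not how the server is implemented: the names `owns` and `shard_read`, the precomputed hash field `h`, and the integer ranges are all hypothetical and chosen for illustration.

```python
from typing import Dict, Iterable, List, Tuple

# A chunk range over the (already hashed) shard key: [min_key, max_key).
Range = Tuple[int, int]


def owns(owned_ranges: Iterable[Range], hashed_key: int) -> bool:
    """Return True if the hashed key falls inside a range this shard owns."""
    return any(lo <= hashed_key < hi for lo, hi in owned_ranges)


def shard_read(docs: List[Dict], owned_ranges: List[Range],
               apply_filter: bool) -> List[Dict]:
    """Model a read on one shard.

    apply_filter=True  models a read that carries ownership information
                       (in this thread: a read routed through mongos).
    apply_filter=False models a read without it (in this thread: a direct
                       connection, or a secondary read preference).
    """
    if not apply_filter:
        return list(docs)
    return [d for d in docs if owns(owned_ranges, d["h"])]


# Hypothetical state: the range [50, 100) was migrated away from shard-0,
# but the copy of document "b" was never deleted there, so it is an orphan.
shard0_docs = [{"_id": "a", "h": 10}, {"_id": "b", "h": 60}]
shard0_owned = [(0, 50)]

filtered = shard_read(shard0_docs, shard0_owned, apply_filter=True)
unfiltered = shard_read(shard0_docs, shard0_owned, apply_filter=False)
```

In this model the filtered read returns only document "a", while the unfiltered read also returns the orphaned document "b", which matches what was observed when connecting directly to the shard.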

... how come mongod does not clean it up?

Unfortunately, the shards currently have no way of resuming a failed cleanup, which is something we are aware of. MongoDB supports the cleanupOrphaned command, which can be run manually to delete these orphaned documents.
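
As a rough sketch of what running cleanupOrphaned manually can look like, the Python loop below repeats the command until the shard reports no further range to process. It assumes the documented `startingFromKey` / `stoppedAtKey` behavior of server releases from this period. The helper name `cleanup_all_orphans`, the `run_admin_command` callable, and the namespace `test.user` are hypothetical; the callable stands in for whatever runs a command against the admin database of the shard primary.

```python
from typing import Any, Callable, Dict, Optional

Command = Dict[str, Any]


def cleanup_all_orphans(run_admin_command: Callable[[Command], Dict[str, Any]],
                        namespace: str,
                        max_rounds: int = 10000) -> int:
    """Run cleanupOrphaned repeatedly and return the number of rounds executed.

    Each round cleans one range of orphaned documents. The response carries
    `stoppedAtKey` while more ranges remain; it is passed back as
    `startingFromKey` on the next round.
    """
    next_key: Optional[Dict[str, Any]] = None
    rounds = 0
    while rounds < max_rounds:
        # The command name must be the first key; dicts keep insertion order.
        cmd: Command = {"cleanupOrphaned": namespace}
        if next_key is not None:
            cmd["startingFromKey"] = next_key
        result = run_admin_command(cmd)
        rounds += 1
        if not result.get("ok"):
            raise RuntimeError(f"cleanupOrphaned failed: {result!r}")
        next_key = result.get("stoppedAtKey")
        if next_key is None:
            break  # no more ranges to clean on this shard
    return rounds
```

With PyMongo, one option would be to pass `client.admin.command` as `run_admin_command`, where `client` is connected directly to the primary of the affected shard rather than to mongos. The `max_rounds` guard is an addition of this sketch so that a misbehaving response cannot loop forever.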

Hope this helps.

Best regards,
-Kal.

Comment by Dharshan Rangegowda [ 05/Jan/17 ]

One more observation:

1. If we run an aggregation on the shard (on the primary), it doesn't find the duplicates. But if we run the same aggregation on the shard with read preference secondary, it finds these duplicate documents, so this might be another issue.
