We are having a problem with duplicate documents (orphans) being returned from secondary nodes. I saw that this problem was addressed in https://jira.mongodb.org/browse/SERVER-5931; however, we currently have a cluster with 9 shards, each backed by a 3-member replica set, all nodes running version 3.6.8, and the problem persists.
The collection has 2.2 billion documents with a hashed shard key.
I have been periodically removing orphans, and even moments after that process I find duplicates. I ran the aggregation below as a way of debugging and got the output that follows:
db.investigation_cards.aggregate([
  {$match: {_id: {$gt: ObjectId("5e43dab00000000000000000")}}},
  {$group: {"_id": "$_id", "count": {"$sum": 1}}},
  {$match: {count: {$gt: 1}}}
])
{"_id":{"$oid":"5e43f3a8ae813900169e6156"},"count":2}
{"_id":{"$oid":"5e43f3a8ae813900169e6155"},"count":2}
{"_id":{"$oid":"5e43f08b96d31b0015e73a00"},"count":2}
{"_id":{"$oid":"5e43e506d914da00158f1cc3"},"count":2}
{"_id":{"$oid":"5e43e508e1a3d00016b249d9"},"count":2}
{"_id":{"$oid":"5e43e5bfc048ba0015973c58"},"count":2}
{"_id":{"$oid":"5e43e25f5eea640015f12ea1"},"count":2}
{"_id":{"$oid":"5e4400a82656d10015a9397d"},"count":2}
{"_id":{"$oid":"5e43e5bfc048ba0015973c5a"},"count":2}
{"_id":{"$oid":"5e43e508e1a3d00016b249da"},"count":2}
{"_id":{"$oid":"5e43dbfa7e15b900156d0f9f"},"count":2}
{"_id":{"$oid":"5e43dbfa7e15b900156d0f9b"},"count":2}
{"_id":{"$oid":"5e43e5bfc048ba0015973c5b"},"count":2}
{"_id":{"$oid":"5e43e9f464e8b30015fc7f24"},"count":2}
{"_id":{"$oid":"5e43eb009c38d3103c3bcce4"},"count":2}
{"_id":{"$oid":"5e43e7e364e8b30015fc7906"},"count":2}
{"_id":{"$oid":"5e43f3a8ae813900169e6153"},"count":2}
{"_id":{"$oid":"5e43e508e1a3d00016b249dc"},"count":2}
{"_id":{"$oid":"5e43e7e364e8b30015fc790a"},"count":2}
{"_id":{"$oid":"5e43dbfa7e15b900156d0f9d"},"count":2}
If I run a find() on any of these _id values, I get two identical documents in response.
Is there any other solution to this problem?
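The duplicate-detection pipeline above can be modeled in plain Python for anyone following along: group the returned documents by _id and keep only the ids seen more than once. The documents below are synthetic stand-ins, not data from the ticket.

```python
from collections import Counter

# Synthetic stand-in for what the sharded query returns: two shards each
# contribute their local copy of _id "b", mimicking an orphaned document.
returned_docs = [
    {"_id": "a", "pid": "1"},
    {"_id": "b", "pid": "2"},  # legitimate copy on one shard
    {"_id": "b", "pid": "2"},  # orphaned copy on another shard
    {"_id": "c", "pid": "3"},
]

# Equivalent of {$group: {_id: "$_id", count: {$sum: 1}}} followed by
# {$match: {count: {$gt: 1}}}
counts = Counter(doc["_id"] for doc in returned_docs)
duplicates = {oid: n for oid, n in counts.items() if n > 1}
print(duplicates)  # only the ids that were returned more than once
```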
---
Hi Cintia, thank you for the attachments; however, point 3 appears to be missing (the verbose MongoDB logs capturing the aggregation).
According to the explain output, the orphaned document on EndpoingGoogleShard_6 has been correctly filtered out, and only the legitimate document on EndpoingGoogleShard_4 should have been returned. The log files are key to progressing the analysis.
If the 31 March verbose logs (point 3) are still available and were only omitted from the attachments, please upload them. If the verbose logs are not available, please re-run the instructions in my previous comment starting from point 1. Our analysis requires all 4 steps to be performed within the same period.
---
The requested files were sent
---
Hi Cintia, thank you for the attachments. The sharding metadata appears consistent, at least at the time the dump was taken. We will now focus on one shard returning one orphaned document.
- Disable the autosplitter and the balancer as before.
- Choose one document that returns a duplicate. Aggregate on its _id with secondary read preference and local read concern, and attach the output to this ticket.
mongos> db.getMongo().setReadPref("secondary")
mongos> db.getSiblingDB('shipyard').investigation_cards.explain("executionStats").aggregate([{$match: {_id: ObjectId(<duplicate here>)}}], {readConcern: {level: "local"}})
- Identify the shards that have executionStats.nReturned greater than zero.
- For each of these shards, connect directly to the secondaries and run db.getSiblingDB("admin").setProfilingLevel(0, -1) so that every operation is logged (a slowms of -1 means all operations are written to the log). Remember to set the profiling level back to its default (0, 100) afterwards.
- From mongos, re-run the setReadPref and aggregate commands above, this time running aggregate() without explain(). Attach the output.
- Attach the mongod logs for the secondaries of each shard previously identified, covering the time both aggregates ran.
- Attach a fresh dump of the config database via the mongos.
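For the step that identifies shards with executionStats.nReturned greater than zero, the explain output can be scanned programmatically. The sketch below runs against a minimal, made-up explain document; real MongoDB 3.6 output has many more fields and its exact shape may differ, so treat this as an illustration only.

```python
# Minimal, made-up shape of a sharded explain("executionStats") result.
# Real output is richer; only the per-shard nReturned nesting matters here.
explain_output = {
    "shards": {
        "EndpoingGoogleShard_4": {"executionStats": {"nReturned": 1}},
        "EndpoingGoogleShard_6": {"executionStats": {"nReturned": 1}},
        "EndpoingGoogleShard_1": {"executionStats": {"nReturned": 0}},
    }
}

def shards_returning_docs(explain: dict) -> list:
    """Return the shard names whose executionStats.nReturned > 0."""
    return sorted(
        name
        for name, info in explain.get("shards", {}).items()
        if info.get("executionStats", {}).get("nReturned", 0) > 0
    )

print(shards_returning_docs(explain_output))
```

With the sample above, both shards that returned a copy of the document are listed, which is exactly the set whose secondaries need verbose logging.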
---
I have sent the original files. The information security team had made some changes to the earlier files.
---
cintyfmoura@yahoo.com.br unfortunately restoring the dumps returns BSON and demultiplexing errors. Could you re-attempt the dumps, following the instructions above?
---
Hello, I sent the requested files in the attachment.
Database: shipyard
Collection: investigation_cards
Some duplicate pids:
[{"pid":"1223203089928290304"},
{"pid":"1217150675483791361"},
{"pid":"1214120659103686656"},
{"pid":"1217371434126843906"},
{"pid":"1222825232358170625"},
{"pid":"1216433939746824194"},
{"pid":"1220050955766575106"},
{"pid":"1215620540188282881"},
{"pid":"1215002543799095297"},
{"pid":"1216728683651780608"},
{"pid":"1215732961104080897"},
{"pid":"1214199271085490177"},
{"pid":"1218215549760458752"},
{"pid":"1215688826083053570"},
{"pid":"1213622938475401221"},
{"pid":"1219943313266089985"},
{"pid":"1214606205781495809"},
{"pid":"1217843334846259200"},
{"pid":"1214969449402585089"},
{"pid":"1217061651633131521"},
{"pid":"1213692044515909632"},
{"pid":"1219627243699392515"},
{"pid":"1220032228799000577"},
{"pid":"1214229325291110400"},
{"pid":"1213883846342725632"},
{"pid":"1215667764813275136"},
{"pid":"1217143384281878529"},
{"pid":"1217105376384110593"},
{"pid":"1217087430261690369"},
{"pid":"1215619338293645312"}]
---
We are keeping the balancer and autosplit off
---
Thank you Cintia. We need to analyse the query routing metadata. Can you please run the instructions below in order and upload the attachments to our support uploader? Please note that only MongoDB engineers will be able to read the files, and they will be automatically deleted after 180 days.
- Share the name of the database and the collection.
- Connect to a mongos
- Run sh.stopBalancer() and wait until sh.isBalancerRunning() returns false.
- Run sh.disableAutoSplit().
- Run the aggregation with secondary read preference and local read concern and verify that it currently returns duplicate documents.
- Share with us one "pid" value of a duplicate document returned by the aggregation above.
- Share the output of db.getSiblingDB('admin').runCommand({getShardVersion : 'MYDB.MYCOLL'}), replacing MYDB.MYCOLL with the actual database.collection.
- Attach a dump of the config database.
mongodump --uri="mongodb://PATH_TO_MONGOS" --gzip --archive=config.gz -d config
- For each of the 9 shards, attach a dump of its local config database, taken from the shard primary.
mongodump --uri="mongodb://PATH_TO_SHARD1_PRIMARY" --gzip --archive=config-shard1.gz -d config
---
Hi Josef.
1. My shard key is a hashed index on a custom string field named "pid". This field is basically an object identifier, but one value can repeat up to a thousand times, for example.
2. I see duplicate documents independently of chunk migrations; we remove orphans periodically and they keep coming back.
3. The duplicates appear only with secondary read preference.
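Since the shard key is a hashed index on "pid", every copy of a given pid routes to the same chunk, and therefore to the same owning shard. The sketch below illustrates the idea with a toy 64-bit hash; MongoDB derives its hashed-index value differently (from the BSON encoding of the field), so this mimics the concept, not the exact bytes.

```python
import hashlib

def toy_hash(pid: str) -> int:
    """Illustrative 64-bit hash of a string shard-key value.

    Not MongoDB's actual hashed-index function; it only demonstrates
    that equal inputs always map to the same hash value.
    """
    digest = hashlib.md5(pid.encode()).digest()
    return int.from_bytes(digest[:8], "little", signed=True)

# Two pid values from the ticket: each hashes deterministically, so all
# documents sharing a pid land in the same chunk range on one shard.
for pid in ["1223203089928290304", "1217150675483791361"]:
    print(pid, toy_hash(pid))
```

The consequence for this ticket: a duplicate of the same document appearing on two shards cannot be explained by routing, only by an orphaned copy left behind (for example, after a migration).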
---
Hi cintyfmoura@yahoo.com.br, thank you for the output. To progress the diagnosis, please confirm:
- Is the collection sharded on _id: "hashed", or another field?
- Do you see duplicate documents while the collection is running chunk migrations, or also at a time when no chunk migration occurs?
- Do you see duplicate documents when using primary read preference and local read concern, or are the duplicates only observed with secondary read preference?
---
Hi Kelsey,
The aggregation:
db.investigation_cards.aggregate([
  {$match: {_id: {$gt: ObjectId("5e4a64200000000000000000")}}},
  {$group: {"_id": "$_id", "count": {"$sum": 1}}},
  {$match: {count: {$gt: 1}}}
], {readConcern: {level: "local"}})
The output:
{"_id":{"$oid":"5e4a8a5f197a810015c8128a"},"count":2}
{"_id":{"$oid":"5e4a8a5f197a810015c81284"},"count":2}
{"_id":{"$oid":"5e4a86049c38d3103c0be74c"},"count":2}
{"_id":{"$oid":"5e4a679c46501000155776c5"},"count":2}
{"_id":{"$oid":"5e4a7b59b413bd00153307bf"},"count":2}
{"_id":{"$oid":"5e4a679c46501000155776c6"},"count":2}
{"_id":{"$oid":"5e4a780d89cca20015647953"},"count":2}
{"_id":{"$oid":"5e4a90c24d1cd8001595db3f"},"count":2}
{"_id":{"$oid":"5e4a8dbd46501000155895b2"},"count":2}
{"_id":{"$oid":"5e4a8df1465010001558978c"},"count":2}
{"_id":{"$oid":"5e4a90c24ae8a800153624a6"},"count":2}
{"_id":{"$oid":"5e4a90c24ae8a800153624a9"},"count":2}
{"_id":{"$oid":"5e4a90c24d1cd8001595db3e"},"count":2}
{"_id":{"$oid":"5e4a6b62b413bd0015329179"},"count":2}
{"_id":{"$oid":"5e4a691dde09dc001687ce26"},"count":2}
{"_id":{"$oid":"5e4a6641d3a1630015745662"},"count":2}
{"_id":{"$oid":"5e4a90c24d1cd8001595db3b"},"count":2}
{"_id":{"$oid":"5e4a90bfb6d62e00155d9c9e"},"count":2}
{"_id":{"$oid":"5e4a8dbd46501000155895ae"},"count":2}
{"_id":{"$oid":"5e4a90c24ae8a800153624a5"},"count":2}
{"_id":{"$oid":"5e4a68cc9c38d3103cc56341"},"count":2}
{"_id":{"$oid":"5e4a8dbd46501000155895b0"},"count":2}
{"_id":{"$oid":"5e4a8a5f197a810015c81286"},"count":2}
{"_id":{"$oid":"5e4a8df14650100015589789"},"count":2}
{"_id":{"$oid":"5e4a7a69faa3bb001592c82b"},"count":2}
{"_id":{"$oid":"5e4a8a5f197a810015c81282"},"count":2}
{"_id":{"$oid":"5e4a90c24ae8a800153624a7"},"count":2}
{"_id":{"$oid":"5e4a8a5f197a810015c8127f"},"count":2}
{"_id":{"$oid":"5e4a7124465010001557b82b"},"count":2}
{"_id":{"$oid":"5e4a79419c38d3103cec427a"},"count":2}
{"_id":{"$oid":"5e4a7d49b413bd0015331511"},"count":2}
{"_id":{"$oid":"5e4a8a57197a810015c81238"},"count":2}
{"_id":{"$oid":"5e4a90c24d1cd8001595db3a"},"count":2}
{"_id":{"$oid":"5e4a8012871ed70016ace226"},"count":2}
{"_id":{"$oid":"5e4a679c46501000155776c8"},"count":2}
{"_id":{"$oid":"5e4a679c46501000155776c7"},"count":2}
{"_id":{"$oid":"5e4a8dbd46501000155895ac"},"count":2}
{"_id":{"$oid":"5e4a90c24ae8a800153624a4"},"count":2}
{"_id":{"$oid":"5e4a90c24d1cd8001595db3c"},"count":2}
---
Hi cintyfmoura@yahoo.com.br,
So that we can continue to investigate, would you please provide the aggregation command with readConcern local set, and its output?
Thank you,
Kelsey
---
Using readConcern: local, find() does not return the duplicate document, but with aggregation the duplicate remains.
---
Can you try using readConcern: local? I suspect your query is running with readConcern: available.
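Why the read concern matters can be illustrated with a simplified model (a sketch only, not MongoDB's implementation): a shard that still holds an orphaned copy of a document only drops it when the read path applies ownership filtering; a read that skips that filter returns both copies to the merging mongos. The shard names and document shapes below are illustrative.

```python
# Simplified model of shard-side orphan filtering. Each shard holds some
# documents, and "owned" records which _ids the shard actually owns
# according to the routing metadata. shard6 holds only an orphan.
shards = {
    "shard4": {"owned": {"b"}, "docs": [{"_id": "b"}]},
    "shard6": {"owned": set(), "docs": [{"_id": "b"}]},
}

def scatter_gather(shards: dict, filter_orphans: bool) -> list:
    """Gather documents from every shard, optionally dropping orphans."""
    results = []
    for shard in shards.values():
        for doc in shard["docs"]:
            # With ownership filtering applied, a shard suppresses
            # documents it does not own; without it, orphans leak through.
            if filter_orphans and doc["_id"] not in shard["owned"]:
                continue
            results.append(doc)
    return results

print(len(scatter_gather(shards, filter_orphans=True)))   # filtered read
print(len(scatter_gather(shards, filter_orphans=False)))  # unfiltered read
```

In this model the filtered read returns one copy and the unfiltered read returns two, mirroring the duplicates the reporter sees only on certain read paths.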
Generated at Thu Feb 08 05:10:29 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.