[SERVER-44052] Inconsistencies in sharded collections  Created: 16/Oct/19  Updated: 25/Feb/20  Resolved: 25/Feb/20

| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | 4.0.2, 4.0.4, 4.0.5 |
| Fix Version/s: | None |
| Type: | Bug |
| Priority: | Major - P3 |
| Reporter: | Stephen Paul Adithela |
| Assignee: | Carl Champain (Inactive) |
| Resolution: | Incomplete |
| Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Attachments: | shardcolcreation.txt, alerts_20191030_stats.txt |
| Operating System: | ALL |
| Description |
|
Mongos is reporting inconsistent information for sharded collections.

Our setup: a DB cluster with three shards, each shard running a PSA (primary-secondary-arbiter) architecture, plus three config servers. MongoDB versions: 4.0.x.

Scenario: mongos reports that collection alerts_20191026 is not sharded, but config.chunks reports three chunks for that namespace, spread across different shards:

```
mongos> db.alerts_20191026.getShardDistribution()
Collection hitron.alerts_20191026 is not sharded.

mongos> use config
switched to db config
mongos> db.chunks.find({ns: "hitron.alerts_20191026"})
{ "_id" : "hitron.alerts_20191026-lineId_-3074457345618258602", "lastmod" : Timestamp(2, 0), "lastmodEpoch" : ObjectId("5da687c8a664d0a846cf713f"), "ns" : "hitron.alerts_20191026", "min" : { "lineId" : NumberLong("-3074457345618258602") }, "max" : { "lineId" : NumberLong("3074457345618258602") }, "shard" : "rs2", "history" : [ { "validAfter" : Timestamp(1571194825, 261), "shard" : "rs2" }, { "validAfter" : Timestamp(1571194824, 926), "shard" : "rs1" } ] }
{ "_id" : "hitron.alerts_20191026-lineId_3074457345618258602", "lastmod" : Timestamp(3, 0), "lastmodEpoch" : ObjectId("5da687c8a664d0a846cf713f"), "ns" : "hitron.alerts_20191026", "min" : { "lineId" : NumberLong("3074457345618258602") }, "max" : { "lineId" : { "$maxKey" : 1 } }, "shard" : "rs3", "history" : [ { "validAfter" : Timestamp(1571194825, 616), "shard" : "rs3" }, { "validAfter" : Timestamp(1571194824, 926), "shard" : "rs1" } ] }
{ "_id" : "hitron.alerts_20191026-lineId_MinKey", "lastmod" : Timestamp(3, 1), "lastmodEpoch" : ObjectId("5da687c8a664d0a846cf713f"), "ns" : "hitron.alerts_20191026", "min" : { "lineId" : { "$minKey" : 1 } }, "max" : { "lineId" : NumberLong("-3074457345618258602") }, "shard" : "rs1", "history" : [ { "validAfter" : Timestamp(1571194824, 926), "shard" : "rs1" } ] }

mongos> db.alerts_20191026.stats().nchunks
hitron.alerts_20191026
```
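Not part of the original report, but one way to cross-check the two views is to read the collection's own metadata document alongside the chunk count; in 4.0 the config.collections entry carries a dropped flag that mongos consults when deciding whether a namespace is sharded. A minimal diagnostic sketch, assuming the namespace from the report:

```javascript
// Hedged diagnostic sketch, run from the same mongos that reported
// "not sharded"; namespace taken from the report above.
var cfg = db.getSiblingDB("config");
// Authoritative shard-key metadata; in 4.0 a document with "dropped": true
// here would explain mongos treating the namespace as unsharded.
cfg.collections.find({ _id: "hitron.alerts_20191026" }).pretty();
// Should agree with getShardDistribution() and stats().nchunks (3 chunks above).
cfg.chunks.count({ ns: "hitron.alerts_20191026" });
```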
|
| Comments |
| Comment by Carl Champain (Inactive) [ 25/Feb/20 ] |
|
Hi, We haven’t heard back from you for some time, so I’m going to mark this ticket as resolved. If this is still an issue for you, please provide additional information and we will reopen the ticket. Thanks,
| Comment by Carl Champain (Inactive) [ 03/Dec/19 ] |
|
Hi sadithela@assia-inc.com, icruz, Thanks for the additional details on the topology of your sharded cluster.
| Comment by Isaac Cruz [ 25/Nov/19 ] |
|
Hi Carl, we have a sharded cluster with 4 shards, each consisting of a replica set with 2 replicas + 1 arbiter. We then have 2 separate app servers, with our Java app connecting to a local mongos on each of these app servers. From these Java apps we create, every day, some collections for the coming days (several collections per day), and this is where we shard the collections. When this issue happens, it happens for all collections created at that point. One thing that may or may not be relevant: we create these collections in parallel from the app servers (different collections, but calling shardCollection at the same time on different mongos instances).
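A minimal sketch of that creation pattern; the hostnames and the second collection name are placeholders, and the shell runs the two commands back to back, so the real concurrency would come from the two app servers issuing them simultaneously:

```javascript
// Hedged sketch of the setup code's effect: shardCollection for different
// collections issued through two different mongos instances.
// "app-server-1"/"app-server-2" and "stats_" are hypothetical names.
var m1 = new Mongo("app-server-1:27017"); // mongos on app server 1
var m2 = new Mongo("app-server-2:27017"); // mongos on app server 2
var day = "20191030";
m1.getDB("admin").runCommand({
  shardCollection: "hitron.alerts_" + day,
  key: { lineId: "hashed" },
  numInitialChunks: 3
});
m2.getDB("admin").runCommand({
  shardCollection: "hitron.stats_" + day, // hypothetical second collection
  key: { lineId: "hashed" },
  numInitialChunks: 3
});
```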
| Comment by Carl Champain (Inactive) [ 20/Nov/19 ] |
|
Hi sadithela@assia-inc.com, icruz, Thanks for your patience. We think that the issue explained in

We weren’t able to reproduce the issue that causes the mongod to have out-of-date metadata, but we'd be happy to try again. To do so, we would need a complete list of steps, plus the topology of your sharded cluster. Kind regards,
| Comment by Isaac Cruz [ 12/Nov/19 ] |
|
Sorry for the late reply; the response we get from Java is "ok: 1", no error.
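For context, a hedged sketch of the full response document a successful shardCollection returns; logging the whole document, rather than just the ok field, gives more to correlate with the server logs:

```javascript
// Hedged sketch: what a client receives back from shardCollection.
var res = db.getSiblingDB("admin").runCommand({
  shardCollection: "hitron.alerts_20191030",
  key: { lineId: "hashed" },
  numInitialChunks: 3
});
// On success the server names the namespace it sharded, e.g.
// { "collectionsharded" : "hitron.alerts_20191030", "ok" : 1 }
printjson(res);
```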
| Comment by Carl Champain (Inactive) [ 11/Nov/19 ] |
|
Have you found out what the Java responses are? Thanks,
| Comment by Stephen Paul Adithela [ 02/Nov/19 ] |
|
Hi Carl, I have uploaded a zip file called "alerts_20191111" to your secure uploader. It should have all the mongos, config server, and mongod logs. It should also have the shardVersion output. Regarding the Java responses, I have to follow up with my colleagues. Sorry for the late reply. Thanks, Stephen
| Comment by Carl Champain (Inactive) [ 29/Oct/19 ] |
|
Hi again sadithela@assia-inc.com, icruz, To help us look more into how the deployment reaches this state:

1. Can you please run getShardVersion() on the shards which have chunks for the alerts_20191030 collection (or any collection affected by the described behavior) and share the output? (A sketch of the invocation follows this comment.)
2. In the Java code, what response(s) are returned by the shardCollection commands?
3. Can you provide the logs from the mongos, shard primaries and config servers? Ideally, please provide all of these for the same improperly sharded collection.

Please upload your files to our secure uploader here. Only MongoDB engineers can view these files and they will expire after a period of time. Thank you,
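A minimal sketch of item 1, assuming you connect directly to each shard's primary mongod (fullMetadata adds the cached chunk metadata to the reply):

```javascript
// Hedged sketch: read the shard's cached sharding metadata for the
// namespace. Run against a shard primary, not against mongos.
db.adminCommand({
  getShardVersion: "hitron.alerts_20191030",
  fullMetadata: true
});
```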
| Comment by Carl Champain (Inactive) [ 28/Oct/19 ] |
|
Hi sadithela@assia-inc.com, icruz, Very sorry for the confusion about the Java code. I re-opened the ticket for additional investigation. You mentioned that you dropped and re-sharded the collection; I want to make sure that you are aware of

Back to your initial issue: we are currently investigating what the cause might be and are attempting to reproduce the described behavior. We will keep you updated and will reach out if questions come up. Kind regards,
| Comment by Isaac Cruz [ 24/Oct/19 ] |
|
That difference is due to the shardCollection command vs the sh.shardCollection helper, which have different syntaxes. The only differences between the Java code and the shell are:

Please reopen this ticket, because it is indeed a server bug: no matter what we do from a client, the DB should not end up with an inconsistent sharding configuration.
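For reference, a sketch of the two invocation styles being contrasted, with the namespace and key taken from this ticket; the helper takes the key document positionally, while the command nests it under a key field:

```javascript
// sh.shardCollection helper: key is the 2nd argument, unique the 3rd,
// and options such as numInitialChunks go in the 4th.
sh.shardCollection("hitron.alerts_20191030", { lineId: "hashed" }, false,
                   { numInitialChunks: 3 });

// shardCollection command (what drivers ultimately send): the key document
// is the value of "key", and numInitialChunks is a top-level field.
db.adminCommand({
  shardCollection: "hitron.alerts_20191030",
  key: { lineId: "hashed" },
  numInitialChunks: 3
});
```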
| Comment by Carl Champain (Inactive) [ 24/Oct/19 ] |
|
In the Java code, you are using key as the hashed key field:

This is different from the shell code, in which you are using lineId as the hashed key field:

It seems that the Java code should be:
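A hedged reconstruction of the difference, expressed as the equivalent shell commands (collection name taken from this ticket):

```javascript
// Hedged reconstruction, not the original snippets, expressed as the
// shardCollection command each piece of code would send.
// What the Java code effectively requested: hashed sharding on a field
// literally named "key":
db.adminCommand({ shardCollection: "hitron.alerts_20191030",
                  key: { key: "hashed" }, numInitialChunks: 3 });
// What the shell code requested, and what the corrected Java code should
// send: hashed sharding on "lineId":
db.adminCommand({ shardCollection: "hitron.alerts_20191030",
                  key: { lineId: "hashed" }, numInitialChunks: 3 });
```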
That said, the SERVER project is for bugs and feature suggestions for the MongoDB server. As this ticket does not appear to be a bug, I will now close it. If you need further assistance troubleshooting, I encourage you to ask our community by posting on the mongodb-user group or on Stack Overflow with the mongodb tag.
Kind regards,
| Comment by Stephen Paul Adithela [ 23/Oct/19 ] |
|
Hi Carl, From the mongo CLI:

```
sh.shardCollection('hitron.alerts_20191030', {lineId: 'hashed'}, false, {numInitialChunks: 3})
```

From the Java code: see the attached code snippet in the file shardcolcreation.txt.
| Comment by Carl Champain (Inactive) [ 23/Oct/19 ] |
|
Thanks for sharing the stats.
| Comment by Stephen Paul Adithela [ 22/Oct/19 ] |
|
Hi Carl, As this collection (alerts_20191026) has to be sharded for our production systems to work properly, we have dropped that collection and re-created it as a sharded collection from the mongo CLI. That collection was initially created by the Java driver in our software. Right now, that collection's shard distribution output is:

```
Shard rs1 at rs1/hitron-db-01a:27018,hitron-db-01b:27018
Shard rs2 at rs2/hitron-db-02b:27018,hitron-db-02c:27018
Shard rs3 at rs3/hitron-db-03a:27018,hitron-db-03b:27018
Totals
```

This is the output we expect when we create a sharded collection. The collections being created from our software using the Java driver are still facing the same issue as mentioned in the description of this ticket. Here are the results for a similar collection (alerts_20191030):

```
mongos> db.alerts_20191030.getShardDistribution()
mongos> db.chunks.find({ns: "hitron.alerts_20191030"})
```

Also, as you mentioned, I am attaching the collstats of the alerts_20191030 collection to this ticket: alerts_20191030_stats.txt. Please let me know if you need any other info. Thank you

Useful info: mongo-java-driver version 2.14.3, mongo-async-driver version 2.0.2
| Comment by Carl Champain (Inactive) [ 21/Oct/19 ] |
|
Thank you for the report. Can you please run the collection’s stats() in mongos and share the output here? This will help me better understand what is happening. Kind regards,
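A hedged sketch of that request, highlighting the fields in the collStats-through-mongos output that bear on the inconsistency described above:

```javascript
// Hedged sketch: collection stats as seen through mongos.
var s = db.alerts_20191026.stats();
print(s.sharded);   // does mongos consider the collection sharded?
print(s.nchunks);   // chunk count mongos sees (absent when unsharded);
                    // compare with config.chunks for the namespace
printjson(s.shards ? Object.keys(s.shards) : null); // shards holding data
```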