[SERVER-29397] Invariant failure on config server when inserting tag into config.tags
Created: 30/May/17 Updated: 30/Oct/23 Resolved: 28/Nov/17

| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | 3.4.4, 3.5.8 |
| Fix Version/s: | 3.4.11, 3.6.1, 3.7.1 |
| Type: | Bug |
| Priority: | Major - P3 |
| Reporter: | Clive Hill |
| Assignee: | Dianna Hohensee (Inactive) |
| Resolution: | Fixed |
| Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Backport Requested: | v3.6, v3.4 |
| Steps To Reproduce: | I have repeated the steps using 3.2 without any issues. Can someone please advise? I can provide a Java program that performs all of the manual steps above and reproduces the same error, if that would be helpful. |
| Sprint: | Sharding 2017-12-04 |
| Description |

I am writing a Java program to test various aspects of MongoDB so that we can upgrade with confidence in the future. As part of this, I test aspects of sharding and shard tags. In MongoDB 3.2 (3.2.9) the sharding steps work correctly. In MongoDB 3.4.4 (also seen in 3.4.0), after inserting the following document into the tags collection of the config database via mongos:

db.tags.insertOne({ "_id" : { "ns" : "ddp.datasources", "min" : { "location" : "LN", "shard" : "LN1" }}, "ns" : "ddp.datasources", "min" : { "location" : "LN", "shard" : "LN1" }, "max" : { "location" : "LM", "shard" : "LN1" }, "tag" : "LN1" })

I get the following from the config server (I have attached the mdmp):

| Comments |
| Comment by Githook User [ 06/Dec/17 ] |

Author: Dianna Hohensee (DiannaHohensee) <dianna.hohensee@10gen.com>
Message: (cherry picked from commit 1340d505df3eb777cbe1684d53c64848052b7151)
| Comment by Githook User [ 05/Dec/17 ] |

Author: Dianna Hohensee (DiannaHohensee) <dianna.hohensee@10gen.com>
Message: (cherry picked from commit 1340d505df3eb777cbe1684d53c64848052b7151)
| Comment by Githook User [ 28/Nov/17 ] |

Author: Dianna Hohensee (DiannaHohensee) <dianna.hohensee@10gen.com>
Message:
| Comment by Esha Maharishi (Inactive) [ 31/May/17 ] |

Ah, that makes sense. I'm marking this as affecting 3.4.4 and 3.5.8, setting it to Needs Triage, and putting it on the sharding backlog. Here's a JavaScript repro (the key was to sleep for a while after inserting the invalid config.tags entry, to give a balancer round the chance to try to use it):
| Comment by Clive Hill [ 31/May/17 ] |

Thanks Andy! I'll look into making code changes to run commands from Java. FYI, in 3.2 the sharding wasn't working correctly either, because max was smaller than min, but we hadn't noticed because it didn't crash. We're fixing the shard key now.
| Comment by Andy Schwerin [ 31/May/17 ] |

We'll look into improving the behavior when this happens; at the very least, it should not crash the server. It may be possible to use the new validation framework to prevent this kind of mistake.

While the Java driver cannot run the shell helper sh.updateZoneKeyRange, it can use the runCommand method to invoke the updateZoneKeyRange command directly against a mongos router. I'm not an expert on the Java driver, but to perform the equivalent of the following in the shell:
You need to construct a BSON document that looks as follows:
Then use the Java runCommand method to send that document as a command against the "admin" database.
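As a minimal sketch of the command document described above, assuming the documented updateZoneKeyRange shape ({ updateZoneKeyRange: <namespace>, min: <lower bound>, max: <upper bound>, zone: <zone name> }), the stdlib-only Java below renders that document as JSON text for the corrected range from this ticket (max location "LO" rather than "LM"). The class and helper names (ZoneRangeCommand, key, quote, toJson) are hypothetical; the code does not talk to a server.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class ZoneRangeCommand {

    // Minimal JSON string quoting: escapes backslashes and double quotes only,
    // which is enough for the simple namespace and key values used here.
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Renders a flat map of string fields as a JSON object, preserving
    // insertion order (field order matters for compound shard-key bounds).
    static String toJson(Map<String, String> fields) {
        StringJoiner joiner = new StringJoiner(", ", "{ ", " }");
        for (Map.Entry<String, String> e : fields.entrySet()) {
            joiner.add(quote(e.getKey()) + " : " + quote(e.getValue()));
        }
        return joiner.toString();
    }

    // Builds the command document; the command name must be the first field.
    static String updateZoneKeyRange(String ns, Map<String, String> min,
                                     Map<String, String> max, String zone) {
        return "{ \"updateZoneKeyRange\" : " + quote(ns)
                + ", \"min\" : " + toJson(min)
                + ", \"max\" : " + toJson(max)
                + ", \"zone\" : " + quote(zone) + " }";
    }

    // Hypothetical helper for the { location, shard } shard key in this ticket.
    static Map<String, String> key(String location, String shard) {
        Map<String, String> k = new LinkedHashMap<>();
        k.put("location", location);
        k.put("shard", shard);
        return k;
    }

    public static void main(String[] args) {
        String cmd = updateZoneKeyRange("ddp.datasources",
                key("LN", "LN1"), key("LO", "LN1"), "LN1");
        System.out.println(cmd);
    }
}
```

With the Java driver on the classpath, the printed text could be parsed with org.bson.Document.parse and passed to MongoDatabase.runCommand on mongoClient.getDatabase("admin"), connected through mongos; building the document with the driver's Document.append calls would avoid the hand-rolled quoting entirely.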
| Comment by Clive Hill [ 31/May/17 ] |

I found the problem... Andy Schwerin, thanks for your comments about using the updateZoneKeyRange command in 3.4. It returned a helpful error message stating that min must be less than max, which helped me notice that I was passing LM as the max and LN as the min:

sh.updateZoneKeyRange("ddp.datasources", { "location" : "LN", "shard" : "LN1" }, { "location" : "LM", "shard" : "LN1" }, "LN1")

This was causing the error! I changed the max to LO and it worked fine.

Would it be possible to return a better error message when the tags collection is written directly from Java, e.g. by doing:

db.tags.insertOne({ "_id" : { "ns" : "ddp.datasources", "min" : { "location" : "LN", "shard" : "LN1" }}, "ns" : "ddp.datasources", "min" : { "location" : "LN", "shard" : "LN1" }, "max" : { "location" : "LM", "shard" : "LN1" }, "tag" : "LN1" })

rather than crashing with the stack trace I sent? (My understanding is that it is not possible to call commands such as sh.updateZoneKeyRange from Java, so instead I have been checking what the function does and implementing it directly.)
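The root cause was a range whose max ({ location: "LM", ... }) sorts before its min ({ location: "LN", ... }). A client-side pre-check like the hypothetical one below could reject such a range before anything is written to config.tags. It is a simplified sketch, not MongoDB's comparison logic: it compares compound bounds field by field in shard-key order using String.compareTo, which agrees with MongoDB's default ordering only for plain ASCII string values (no MinKey/MaxKey, mixed types, or collations). The names ZoneRangeCheck, compareBounds, and requireValidRange, and the error text, are all illustrative.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

public class ZoneRangeCheck {

    // Compares two compound shard-key bounds field by field, in shard-key order.
    // Simplification: both bounds must list the same fields in the same order,
    // and every value is assumed to be an ASCII string.
    static int compareBounds(Map<String, String> a, Map<String, String> b) {
        if (a.size() != b.size()) {
            throw new IllegalArgumentException("bounds must have the same fields");
        }
        Iterator<Map.Entry<String, String>> ia = a.entrySet().iterator();
        Iterator<Map.Entry<String, String>> ib = b.entrySet().iterator();
        while (ia.hasNext()) {
            Map.Entry<String, String> ea = ia.next();
            Map.Entry<String, String> eb = ib.next();
            if (!ea.getKey().equals(eb.getKey())) {
                throw new IllegalArgumentException(
                        "field order differs: " + ea.getKey() + " vs " + eb.getKey());
            }
            int c = ea.getValue().compareTo(eb.getValue());
            if (c != 0) {
                return c; // the first differing field decides the order
            }
        }
        return 0;
    }

    // A zone range is only usable if min sorts strictly before max.
    static void requireValidRange(Map<String, String> min, Map<String, String> max) {
        if (compareBounds(min, max) >= 0) {
            throw new IllegalArgumentException(
                    "zone range min " + min + " must be less than max " + max);
        }
    }

    // Hypothetical helper for the { location, shard } shard key in this ticket.
    static Map<String, String> key(String location, String shard) {
        Map<String, String> k = new LinkedHashMap<>();
        k.put("location", location);
        k.put("shard", shard);
        return k;
    }

    public static void main(String[] args) {
        // The range from this ticket: "LN" sorts after "LM", so it is rejected.
        try {
            requireValidRange(key("LN", "LN1"), key("LM", "LN1"));
            System.out.println("accepted");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        // The corrected range: "LN" sorts before "LO", so it passes.
        requireValidRange(key("LN", "LN1"), key("LO", "LN1"));
        System.out.println("corrected range accepted");
    }
}
```

Because the location field differs, the shard field is never consulted here; it would only break ties between bounds with equal locations.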
| Comment by Clive Hill [ 31/May/17 ] |

FYI, something that may be of interest: I upgraded from 3.2 to 3.4 by copying the data across. Everything appears to work fine if the tags already exist and new tags are then added.
| Comment by Clive Hill [ 31/May/17 ] |

1) Yep, if you run the Java program, it happens every time. It also happens every time when done manually. Hopefully the Java program will let you reproduce it against 3.4.4 as well. Please add a comment if you need any further clarification and I'll get back to you. I am running this on Windows 7 Enterprise.
| Comment by Esha Maharishi (Inactive) [ 30/May/17 ] |

Whoops, I hadn't refreshed the page; I just saw that you added a Java repro. Thanks, I'll work with that first.
| Comment by Esha Maharishi (Inactive) [ 30/May/17 ] |

Hey EvilChill, two quick questions for you: 1) Is this crash consistently reproducible via those steps, or were those the steps that caused the crash just one particular time? I wasn't able to reproduce this immediately on 3.4.4 in our regression suite, so I wanted to check whether it could be a timing-related issue with the balancer. But if those steps do reproduce it consistently, maybe some detail isn't reflected in the JavaScript repro script.
| Comment by Clive Hill [ 30/May/17 ] |

I've attached a zip (UpgradeTester.zip) with a simple Java program. It assumes that MongoDB 3.4 is installed and that the zip is extracted to this folder: C:\LocalFolder\temp\MongoDBUpgrade\3.4

Put a breakpoint on line 424 and step over it. Wait, then look at the output of LN1-config.bat. After a while it will show the bug I raised. Please let me know if you cannot reproduce it. I will look into managing tag zones...
| Comment by Clive Hill [ 30/May/17 ] |

This is a zip with a simple Java program that shows the issue.
| Comment by Andy Schwerin [ 30/May/17 ] |

We made changes to tag-aware sharding (now called zoned sharding) and the balancer during the 3.4 release. I wouldn't expect them to lead to an invariant failure, so there's probably a bug somewhere, but the prescribed way to manage zones is to use the updateZoneKeyRange command. We'll work on a repro for our regression suite, but if you already have a test program, please do share it.
| Comment by Clive Hill [ 30/May/17 ] |

In the description above, where it says "Unknown macro: { "ns" }", it should read:

db.tags.insertOne({ "_id" : { "ns" : "ddp.datasources", "min" : { "location" : "LN", "shard" : "LN1" }}, "ns" : "ddp.datasources", "min" : { "location" : "LN", "shard" : "LN1" }, "max" : { "location" : "LM", "shard" : "LN1" }, "tag" : "LN1" })

The version confirmed to work was 3.2.4 (not 3.2.9).