[SERVER-37475] mongod backtrace if shard tag ranges are added before collection is sharded
Created: 04/Oct/18  Updated: 04/Nov/18  Resolved: 12/Oct/18
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | 4.0.3 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Timothy Olsen (Inactive) | Assignee: | Janna Golden |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Attachments: | config-dump.tar.gz, shard0-0.log |
| Issue Links: |
| Operating System: | ALL |
| Steps To Reproduce: | 1. Deploy a 2-shard sharded cluster; each shard is a PSA (primary-secondary-arbiter) replica set. 2. Add shard tag ranges for a collection that has not yet been sharded. 3. Shard the collection. |
| Participants: | Timothy Olsen, Janna Golden, Kaloian Manassiev |
| Description |
On a 2-shard, PSA sharded cluster, if shard tag ranges are added before a collection is sharded, some shard members crash with a backtrace. This appears to be a regression in 4.0.3-rc0; we have not seen it happen in 4.0.2. Logs attached.
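A minimal sketch of the reported sequence, run in the mongo shell against a mongos (the namespace test.coll, shard key x, and zone name tag1 are hypothetical; as the comments below note, the tag range was written directly to config.tags rather than through a shell helper):

    // Write a tag range for a not-yet-sharded collection directly into
    // config.tags; note that "tag1" is never assigned to any shard.
    var configDB = db.getSiblingDB("config");
    configDB.tags.update(
        { _id: { ns: "test.coll", min: { x: MinKey } } },
        { ns: "test.coll", min: { x: MinKey }, max: { x: 0 }, tag: "tag1" },
        { upsert: true }
    );

    // Sharding the collection afterwards is what triggers the backtrace.
    sh.enableSharding("test");
    sh.shardCollection("test.coll", { x: 1 });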
| Comments |
| Comment by Kaloian Manassiev [ 12/Oct/18 ] |
janna.golden, yes, we should definitely have this check instead of crashing. There must already be one for the old initial split logic, which executes on the config server, right?
| Comment by Janna Golden [ 11/Oct/18 ] |
I'm going to close this ticket and am marking it as related to
| Comment by Janna Golden [ 11/Oct/18 ] |
It looks like the tag "tag1" is not associated with any shard in your cluster as-is. When I restored the cluster and initially ran shardCollection, it also seg faulted. I then ran

kaloian.manassiev, we should probably add a check that any zone defined for the collection we are attempting to shard is associated with a particular shard, and return an error to the user rather than letting this seg fault. cc tim.olsen
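For reference, a shell-level sketch of such a check (the namespace test.coll is hypothetical): a zone is only usable if some document in config.shards lists it in its tags array, so every tag referenced in config.tags for the namespace can be verified before sharding:

    // Hypothetical pre-flight check: flag any zone in config.tags for the
    // namespace that no shard document claims in its "tags" array.
    var configDB = db.getSiblingDB("config");
    configDB.tags.find({ ns: "test.coll" }).forEach(function(t) {
        if (configDB.shards.findOne({ tags: t.tag }) === null) {
            print("Zone '" + t.tag + "' is not assigned to any shard");
        }
    });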
| Comment by Timothy Olsen (Inactive) [ 10/Oct/18 ] |
config-dump.tar.gz attached
| Comment by Timothy Olsen (Inactive) [ 10/Oct/18 ] |
It is reproducible. I am attaching a dump of the config database taken after we add the tags but before we shard the collection. As for generating the tag documents: we upsert a document with the keys _id, ns, min, max, and tag into the config.tags collection.
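For comparison, a sketch of the documented helper path (the shard and zone names are hypothetical): sh.addShardTag() associates the tag with a shard before sh.addTagRange() defines the range, so the zone is known to the cluster before any range in config.tags references it:

    // Assign the tag to an existing shard first, then define the range;
    // the range ends up as the same document shape in config.tags.
    sh.addShardTag("shard01", "tag1");
    sh.addTagRange("test.coll", { x: MinKey }, { x: 0 }, "tag1");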
| Comment by Kaloian Manassiev [ 10/Oct/18 ] |
The stack traces are in shard0-0.log. The crash occurred because of a nullptr access with this call stack, which is this line, but I can't see how chunks can be nullptr:

The only unusual thing I am seeing is this line of code, which assigns a const reference to a stack variable, but I think that is valid C++. tim.olsen - how do you generate the tag documents? I have a slight suspicion that they might be missing some value which we made required. Is this reproducible, and would it be possible to dump the config databases after you write the tags but before running shardCollection, so we can restore it and give it a try? Assigning to janna.golden to have a look.