[SERVER-27258] A v3.4 config server can crash with a core dump if it gets an unsupported shard key from mongo S. Created: 01/Dec/16  Updated: 05/Apr/17  Resolved: 28/Dec/16

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: 3.4.2

Type: Improvement
Priority: Major - P3
Reporter: Ricardo Amendoeira
Assignee: Nathan Myers
Resolution: Done
Votes: 0
Labels: crash
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File config-backup.tar.gz    
Backwards Compatibility: Fully Compatible
Sprint: Sharding 2017-01-02

 Description   

Hello,

First of all, I'm not looking for support on this. I'm not even sure whether it's considered a bug, but it seems a bit dangerous.

I'm working on adding a new feature to MongoDB and have been modifying the code for mongos (not mongod/the config server, which is the one I'm able to crash remotely).

Essentially, I modified mongo/s/shard_key_pattern.cpp to also accept "2dsphere" as a valid shard key type, and then called shardCollection from the mongo shell with a 2dsphere key (index already configured). The config server promptly crashed with a core dump and the following backtrace:

2016-12-01T19:16:15.150+0000 I SHARDING [Balancer] ChunkManager loading chunks for wells_US.points sequenceNumber: 2 based on: (empty)
2016-12-01T19:16:15.151+0000 I SHARDING [Balancer] ChunkManager load took 0 ms and found version 1|0||584074beec62237f36f52c7e
2016-12-01T19:16:15.151+0000 I -        [Balancer] Invariant failure !ci.key().isEmpty() src/mongo/s/config.cpp 296
2016-12-01T19:16:15.151+0000 I -        [Balancer] 
 
***aborting after invariant() failure
 
 
2016-12-01T19:16:15.165+0000 F -        [Balancer] Got signal: 6 (Aborted).
 
 0x56062610df71 0x56062610d069 0x56062610d54d 0x7fc771ae93e0 0x7fc771744428 0x7fc77174602a 0x5606253db006 0x560625fbde17 0x560625ff44de 0x560625c55903 0x560625c57310 0x560625c47045 0x560625c4dbe9 0x7fc7722ccc80 0x7fc771adf70a 0x7fc77181582d
----- BEGIN BACKTRACE -----
{"backtrace":[{"b":"560624C42000","o":"14CBF71","s":"_ZN5mongo15printStackTraceERSo"},{"b":"560624C42000","o":"14CB069"},{"b":"560624C42000","o":"14CB54D"},{"b":"7FC771AD8000","o":"113E0"},{"b":"7FC77170F000","o":"35428","s":"gsignal"},{"b":"7FC77170F000","o":"3702A","s":"abort"},{"b":"560624C42000","o":"799006","s":"_ZN5mongo17invariantOKFailedEPKcRKNS_6StatusES1_j"},{"b":"560624C42000","o":"137BE17","s":"_ZN5mongo8DBConfig15getChunkManagerEPNS_16OperationContextERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEbb"},{"b":"560624C42000","o":"13B24DE","s":"_ZN5mongo18ScopedChunkManager11getExistingEPNS_16OperationContextERKNS_15NamespaceStringE"},{"b":"560624C42000","o":"1013903","s":"_ZN5mongo32BalancerChunkSelectionPolicyImpl32_getSplitCandidatesForCollectionEPNS_16OperationContextERKNS_15NamespaceStringERKSt6vectorINS_17ClusterStatistics15ShardStatisticsESaIS8_EE"},{"b":"560624C42000","o":"1015310","s":"_ZN5mongo32BalancerChunkSelectionPolicyImpl19selectChunksToSplitEPNS_16OperationContextE"},{"b":"560624C42000","o":"1005045","s":"_ZN5mongo8Balancer17_enforceTagRangesEPNS_16OperationContextE"},{"b":"560624C42000","o":"100BBE9","s":"_ZN5mongo8Balancer11_mainThreadEv"},{"b":"7FC772214000","o":"B8C80"},{"b":"7FC771AD8000","o":"770A"},{"b":"7FC77170F000","o":"10682D","s":"clone"}],"processInfo":{ "mongodbVersion" : "3.4.0-rc5", "gitVersion" : "7df8fe1099135d137516f1670d2a0091ace63ca0", "compiledModules" : [], "uname" : { "sysname" : "Linux", "release" : "4.4.0-47-generic", "version" : "#68-Ubuntu SMP Wed Oct 26 19:39:52 UTC 2016", "machine" : "x86_64" }, "somap" : [ { "b" : "560624C42000", "elfType" : 3, "buildId" : "150938CE464DB3BAEAD640507516D58AEAE68656" }, { "b" : "7FFE89DFD000", "elfType" : 3, "buildId" : "263FD15D8149B82F2DB14A1E6C3351C2D9F4852C" }, { "b" : "7FC77279A000", "path" : "/lib/x86_64-linux-gnu/librt.so.1", "elfType" : 3, "buildId" : "4ABAAF20AC90D3E282D770F6A0C58A80BF16145A" }, { "b" : "7FC772596000", "path" : 
"/lib/x86_64-linux-gnu/libdl.so.2", "elfType" : 3, "buildId" : "09758D5225753BBF1FE3F66AADCCB6EC6F6E244B" }, { "b" : "7FC772214000", "path" : "/usr/lib/x86_64-linux-gnu/libstdc++.so.6", "elfType" : 3, "buildId" : "144E588F94CAFAFDBD0BD1499C74190F678DAD88" }, { "b" : "7FC771F0B000", "path" : "/lib/x86_64-linux-gnu/libm.so.6", "elfType" : 3, "buildId" : "9247F19167971267B6FADF4BA633290188A5483B" }, { "b" : "7FC771CF5000", "path" : "/lib/x86_64-linux-gnu/libgcc_s.so.1", "elfType" : 3, "buildId" : "68220AE2C65D65C1B6AAA12FA6765A6EC2F5F434" }, { "b" : "7FC771AD8000", "path" : "/lib/x86_64-linux-gnu/libpthread.so.0", "elfType" : 3, "buildId" : "3B58B373BD25A13045DCD3FCE540203DE330AC8E" }, { "b" : "7FC77170F000", "path" : "/lib/x86_64-linux-gnu/libc.so.6", "elfType" : 3, "buildId" : "A594A9C73A6067AB00A0F8DB78D665BE147ACDC1" }, { "b" : "7FC7729A2000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "F6DCBEE8DCAAE97C8BF7E73B56514E67118E6118" } ] }}
 mongod(_ZN5mongo15printStackTraceERSo+0x41) [0x56062610df71]
 mongod(+0x14CB069) [0x56062610d069]
 mongod(+0x14CB54D) [0x56062610d54d]
 libpthread.so.0(+0x113E0) [0x7fc771ae93e0]
 libc.so.6(gsignal+0x38) [0x7fc771744428]
 libc.so.6(abort+0x16A) [0x7fc77174602a]
 mongod(_ZN5mongo17invariantOKFailedEPKcRKNS_6StatusES1_j+0x0) [0x5606253db006]
 mongod(_ZN5mongo8DBConfig15getChunkManagerEPNS_16OperationContextERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEbb+0x1267) [0x560625fbde17]
 mongod(_ZN5mongo18ScopedChunkManager11getExistingEPNS_16OperationContextERKNS_15NamespaceStringE+0xAE) [0x560625ff44de]
 mongod(_ZN5mongo32BalancerChunkSelectionPolicyImpl32_getSplitCandidatesForCollectionEPNS_16OperationContextERKNS_15NamespaceStringERKSt6vectorINS_17ClusterStatistics15ShardStatisticsESaIS8_EE+0x53) [0x560625c55903]
 mongod(_ZN5mongo32BalancerChunkSelectionPolicyImpl19selectChunksToSplitEPNS_16OperationContextE+0x270) [0x560625c57310]
 mongod(_ZN5mongo8Balancer17_enforceTagRangesEPNS_16OperationContextE+0x55) [0x560625c47045]
 mongod(_ZN5mongo8Balancer11_mainThreadEv+0x11C9) [0x560625c4dbe9]
 libstdc++.so.6(+0xB8C80) [0x7fc7722ccc80]
 libpthread.so.0(+0x770A) [0x7fc771adf70a]
 libc.so.6(clone+0x6D) [0x7fc77181582d]
-----  END BACKTRACE  -----
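
For readers decoding the trace: each frame in the JSON appears to record a load base ("b") and an offset ("o"). A minimal Python sketch, assuming the absolute address is simply their sum, reproduces the raw addresses printed before the BEGIN BACKTRACE line. The helper name frame_address is hypothetical and only for illustration.

```python
def frame_address(base_hex: str, offset_hex: str) -> int:
    """Return the absolute address of one frame, assumed to be base + offset."""
    return int(base_hex, 16) + int(offset_hex, 16)


# Two frames copied from the backtrace JSON in this report.
frames = [
    {"b": "560624C42000", "o": "14CBF71", "s": "_ZN5mongo15printStackTraceERSo"},
    {"b": "7FC77170F000", "o": "35428", "s": "gsignal"},
]

for frame in frames:
    # Print the symbol next to the computed address in lowercase hex.
    print(frame["s"], hex(frame_address(frame["b"], frame["o"])))
```

The two computed values, 0x56062610df71 and 0x7fc771744428, match the addresses shown for printStackTrace and gsignal in the symbolized frames.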

Like I said, I'm not sure if this is considered a problem or not.
I can give more details if this looks relevant to the stability/security of the config server.



 Comments   
Comment by Githook User [ 28/Dec/16 ]

Author: Nathan Myers (ncm@asperasoft.com)

Message: SERVER-27258 Config server not crash on bad shard key
Branch: v3.4
https://github.com/mongodb/mongo/commit/906e79d74ac9c594e9ad402ac2fd0c8520874bd7

Comment by Ricardo Amendoeira [ 13/Dec/16 ]

Hi Kaloian Manassiev,

If I understood correctly, you wanted a mongodump --oplog of the config server, right?
I attached it here: config-backup.tar.gz
I hope it helps.

Thanks,
Ricardo Amendoeira

Comment by Kaloian Manassiev [ 12/Dec/16 ]

Thanks ric2b. We will use this ticket to add guardrails to the config server so it does not crash.

Do you mind stopping the balancer by issuing sh.stopBalancer(), then sharding the collection with your custom build using the command above, and then taking a dump of the config server so that we have the data to reproduce the problem?

Thanks in advance.

-Kal.

Comment by Ricardo Amendoeira [ 09/Dec/16 ]

Ok, sorry for the delay. I just confirmed the issue: I compiled mongod clean from the 3.4 tag and was still able to crash the config server by issuing

sh.shardCollection("wells_US.points", {"geometry.coordinates": "2dsphere"})

to the mongos server.
The mongos is modified to accept a 2dsphere shard key, although it's probably buggy, since I'm still learning the project code and still can't properly test what I'm doing.

Comment by Ricardo Amendoeira [ 02/Dec/16 ]

I've only been building mongo (shell) and mongos, yes.
I've built mongod myself as well, but that was a few weeks ago, before the changes, unless I accidentally ran scons all in the meantime and didn't notice.

I'll make sure I rebuild mongod from the 3.4 tag and get back to you, but I can't do it right now.

Comment by Kaloian Manassiev [ 02/Dec/16 ]

The code shown in the call stack runs on both mongos and mongod. Are you saying that you only built mongos, but not mongod (i.e., you're using stock 3.4.0 for mongod), and it still crashes, or did you build all three?

If stock 3.4 crashes, then can you please dump the config database and attach it?

Comment by Ricardo Amendoeira [ 02/Dec/16 ]

Ok, thanks for the response. I just find it a bit weird that a server is able to crash another, unmodified, server. But I understand if the config server is supposed to be in a protected network and will (hopefully) only receive messages from trusted servers.

Comment by Kaloian Manassiev [ 01/Dec/16 ]

What seems to be happening here is that the contents of the shard key BSON are empty. This would normally have been caught at an earlier stage, while parsing the collection description in CollectionType::fromBSON, which is eventually called by DBConfig::_loadIfNeeded.

Is it possible that any of your changes caused the checks in there to be bypassed and the in-memory representation of the collection to end up with an invalid shard key?

In either case, this is not a supported scenario and is more appropriate for the mongodb-users group or Stack Overflow with the mongodb tag. Please post it there for more discussion.

Best regards,
-Kal.
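
The guardrail idea discussed in this thread can be illustrated with a small sketch. It is not the fix committed for this ticket and does not reproduce CollectionType::fromBSON. It is a simplified Python model, assuming a valid pattern is non-empty and that every field is either ascending (1) or a single "hashed" field. The function name validate_shard_key and its messages are hypothetical.

```python
def validate_shard_key(pattern):
    """Return (ok, reason) so a caller can reject a bad key instead of aborting."""
    # An empty or missing pattern is the condition the invariant tripped on.
    if not isinstance(pattern, dict) or not pattern:
        return False, "shard key pattern is empty"

    for field, kind in pattern.items():
        # Assumption: only ascending (1) and "hashed" are accepted key types.
        is_number = isinstance(kind, (int, float)) and not isinstance(kind, bool)
        is_ascending = is_number and kind == 1
        is_hashed = kind == "hashed"

        if is_hashed and len(pattern) > 1:
            return False, "a hashed shard key must have exactly one field"
        if not (is_ascending or is_hashed):
            return False, f"unsupported shard key type for {field!r}: {kind!r}"

    return True, "ok"


# The pattern from this report is rejected with an error rather than a crash.
print(validate_shard_key({"geometry.coordinates": "2dsphere"}))
```

Returning a status lets the receiving side refuse the request and report an error, rather than relying on the sender to have validated the key.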
