[SERVER-19367] Segfault establishing shard connection in Chunk::splitMulti
Created: 13/Jul/15  Updated: 19/Sep/15  Resolved: 03/Aug/15

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: 3.1.7

Type: Bug
Priority: Major - P3
Reporter: Kevin Pulo
Assignee: Kaloian Manassiev
Resolution: Done
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Related
related to SERVER-19929 Audit sharding code for potential use... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: Sharding 6 07/17/15, Sharding 7 08/10/15
Participants:
Linked BF Score: 0

 Description   

Observed incidentally in an Evergreen patch run of geo_shardedgeonear.js. The failure is unrelated to the patch, since the test passed on resubmission. The crash occurs during the test's setup phase, before any geo-related operations run.

 m30002| 2015-07-10T08:17:57.981+0000 I COMMAND  [conn4] command admin.$cmd command: _recvChunkCommit { _recvChunkCommit: 1 } ntoreturn:1 ntoskip:0 keyUpdates:0 writeConflicts:0 numYields:0 reslen:258 locks:{} protocol:op_command 347ms
 m30001| 2015-07-10T08:17:57.981+0000 I SHARDING [conn3] moveChunk migrate commit accepted by TO-shard: { active: false, ns: "test.points", from: "ip-10-187-48-99:30001", min: { rand: 0.2 }, max: { rand: MaxKey }, shardKeyPattern: { rand: 1.0 }, state: "done", counts: { cloned: 0, clonedBytes: 0, catchup: 0, steady: 0 }, ok: 1.0 }
 m30001| 2015-07-10T08:17:57.981+0000 I SHARDING [conn3] moveChunk updating self version to: 2|1||559f7fb1e2417ab851d9c307 through { rand: MinKey } -> { rand: 0.1 } for collection 'test.points'
 m30001| 2015-07-10T08:17:58.316+0000 I SHARDING [conn3] about to log metadata event: { _id: "ip-10-187-48-99-2015-07-10T08:17:58.316+0000-559f7fb64b31329a4e39d3ab", server: "ip-10-187-48-99", clientAddr: "10.187.48.99:37799", time: new Date(1436516278316), what: "moveChunk.commit", ns: "test.points", details: { min: { rand: 0.2 }, max: { rand: MaxKey }, from: "shard0001", to: "shard0002", cloned: 0, clonedBytes: 0, catchup: 0, steady: 0 } }
 m30001| 2015-07-10T08:17:58.372+0000 I SHARDING [conn3] MigrateFromStatus::done About to acquire global lock to exit critical section
 m30001| 2015-07-10T08:17:58.372+0000 I SHARDING [conn3] forking for cleanup of chunk data
 m30001| 2015-07-10T08:17:58.372+0000 I SHARDING [conn3] MigrateFromStatus::done About to acquire global lock to exit critical section
 m30001| 2015-07-10T08:17:58.373+0000 I SHARDING [RangeDeleter] Deleter starting delete for: test.points from { rand: 0.2 } -> { rand: MaxKey }, with opId: 40
 m30001| 2015-07-10T08:17:58.373+0000 I SHARDING [RangeDeleter] rangeDeleter deleted 0 documents for test.points from { rand: 0.2 } -> { rand: MaxKey }
 m30001| 2015-07-10T08:17:58.708+0000 I SHARDING [conn3] distributed lock 'test.points/ip-10-187-48-99:30001:1436516273:1905792156' unlocked.
 m30001| 2015-07-10T08:17:58.708+0000 I SHARDING [conn3] about to log metadata event: { _id: "ip-10-187-48-99-2015-07-10T08:17:58.708+0000-559f7fb64b31329a4e39d3ac", server: "ip-10-187-48-99", clientAddr: "10.187.48.99:37799", time: new Date(1436516278708), what: "moveChunk.from", ns: "test.points", details: { min: { rand: 0.2 }, max: { rand: MaxKey }, step 1 of 6: 0, step 2 of 6: 731, step 3 of 6: 5, step 4 of 6: 13, step 5 of 6: 740, step 6 of 6: 0, to: "shard0002", from: "shard0001", note: "success" } }
 m30001| 2015-07-10T08:17:58.765+0000 I COMMAND  [conn3] command test.points command: moveChunk { moveChunk: "test.points", from: "ip-10-187-48-99:30001", to: "ip-10-187-48-99:30002", fromShard: "shard0001", toShard: "shard0002", min: { rand: 0.2 }, max: { rand: MaxKey }, maxChunkSizeBytes: 52428800, configdb: "ip-10-187-48-99:29000,ip-10-187-48-99:29001,ip-10-187-48-99:29002", secondaryThrottle: true, waitForDelete: false, maxTimeMS: 0, epoch: ObjectId('559f7fb1e2417ab851d9c307') } ntoreturn:1 ntoskip:0 keyUpdates:0 writeConflicts:0 numYields:0 reslen:22 locks:{ Global: { acquireCount: { r: 9, w: 2, R: 3 } }, Database: { acquireCount: { r: 2, w: 2 } }, Collection: { acquireCount: { r: 2, W: 2 } } } protocol:op_command 1883ms
 m30999| 2015-07-10T08:17:58.767+0000 I SHARDING [conn1] ChunkManager: time to load chunks for test.points: 0ms sequenceNumber: 6 version: 2|1||559f7fb1e2417ab851d9c307 based on: 1|4||559f7fb1e2417ab851d9c307
 m30999| 2015-07-10T08:17:58.767+0000 I COMMAND  [conn1] splitting chunk [{ rand: 0.2 },{ rand: MaxKey }) in collection test.points on shard shard0002
 m30999| 2015-07-10T08:17:58.768+0000 F -        [conn1] Invalid access at address: 0xffffffffffffffe8
 m30999| 2015-07-10T08:17:58.773+0000 F -        [conn1] Got signal: 11 (Segmentation fault).
 m30999|
 m30999| ----- BEGIN BACKTRACE -----
 m30999|  mongos(mongo::printStackTrace(std::ostream&)+0x32) [0xb149b2]
 m30999|  mongos(+0x713999) [0xb13999]
 m30999|  mongos(+0x713EC8) [0xb13ec8]
 m30999|  libpthread.so.0(+0xECA0) [0x2af3b53e0ca0]
 m30999|  mongos(mongo::DBClientConnection::connectSocketOnly(mongo::HostAndPort const&)+0x1F9) [0x641b99]
 m30999|  mongos(mongo::DBClientConnection::connect(mongo::HostAndPort const&)+0x26) [0x642186]
 m30999|  mongos(mongo::DBClientConnection::connect(mongo::HostAndPort const&, std::string&)+0x20) [0x642710]
 m30999|  mongos(mongo::ConnectionString::connect(std::string&, double) const+0x3B9) [0x6322e9]
 m30999|  mongos(mongo::DBConnectionPool::get(mongo::ConnectionString const&, double)+0x76) [0x634106]
 m30999|  mongos(mongo::ScopedDbConnection::ScopedDbConnection(mongo::ConnectionString const&, double)+0x65) [0x634345]
 m30999|  mongos(mongo::Chunk::multiSplit(std::vector<mongo::BSONObj, std::allocator<mongo::BSONObj> > const&, mongo::BSONObj*) const+0xC8) [0x9bf3d8]
 m30999|  mongos(+0x62E5F3) [0xa2e5f3]
 m30999|  mongos(mongo::Command::execCommandClientBasic(mongo::OperationContext*, mongo::Command*, mongo::ClientBasic&, int, char const*, mongo::BSONObj&, mongo::BSONObjBuilder&)+0x701) [0xa5bc91]
 m30999|  mongos(mongo::Command::runAgainstRegistered(char const*, mongo::BSONObj&, mongo::BSONObjBuilder&, int)+0x2E0) [0xa5c960]
 m30999|  mongos(mongo::Strategy::clientCommandOp(mongo::Request&)+0x1C9) [0xa65859]
 m30999|  mongos(mongo::Request::process(int)+0x615) [0xa5b045]
 m30999|  mongos(mongo::ShardedMessageHandler::process(mongo::Message&, mongo::AbstractMessagingPort*)+0x40) [0x5df6c0]
 m30999|  mongos(mongo::PortMessageServer::handleIncomingMsg(void*)+0x265) [0xacf185]
 m30999|  libpthread.so.0(+0x683D) [0x2af3b53d883d]
 m30999|  libc.so.6(clone+0x6D) [0x2af3b56c3fcd]
 m30999| -----  END BACKTRACE  -----
...
2015-07-10T08:17:58.786+0000 E QUERY    [main] Error: error doing query: failed
    at DB.runCommand (src/mongo/shell/db.js:124:20)
    at DB.adminCommand (src/mongo/shell/db.js:138:41)
    at test (jstests/sharding/geo_shardedgeonear.js:17:23)
    at jstests/sharding/geo_shardedgeonear.js:47:1 at src/mongo/shell/db.js:124
failed to load: jstests/sharding/geo_shardedgeonear.js
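
Note on the faulting address: the "Invalid access at address: 0xffffffffffffffe8" line in the log above is easier to read as a signed value. The sketch below only does that arithmetic; the helper name signedOffset is made up for illustration and is not part of the server code. An address just below zero is often what a member access through a null or stale pointer at a small negative offset looks like, but that reading is an assumption and is not confirmed anywhere in this ticket.

```cpp
#include <cstdint>

// Hypothetical helper (not from the MongoDB code base): reinterpret a
// 64-bit faulting address as a signed offset relative to address zero.
constexpr std::int64_t signedOffset(std::uint64_t address) {
    // Addresses above INT64_MAX wrap around to negative offsets.
    // Computed via the bitwise complement to avoid relying on
    // implementation-defined narrowing conversions.
    return address > static_cast<std::uint64_t>(INT64_MAX)
               ? -static_cast<std::int64_t>(~address) - 1
               : static_cast<std::int64_t>(address);
}
```

Applied to the address in the log, signedOffset(0xffffffffffffffe8ULL) evaluates to -24, i.e. 24 bytes below address zero.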



 Comments   
Comment by Githook User [ 03/Aug/15 ]

Author: Kaloian Manassiev (kaloianm) <kaloian.manassiev@mongodb.com>

Message: SERVER-19367 Return ConnectionString by value, not by reference
Branch: master
https://github.com/mongodb/mongo/commit/e69d00d7949e5373d0b58115e1b3583b245e06b4
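
The commit message describes the fix as returning ConnectionString by value instead of by reference. The following is a minimal sketch of why that kind of change can matter. It does not reproduce the actual commit: ConnString and ShardRegistrySketch are hypothetical stand-ins invented for this illustration, and the assumption that the original reference could outlive its referent under concurrent updates is ours, not something stated in the ticket.

```cpp
#include <map>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical stand-in for mongo::ConnectionString; not the real class.
struct ConnString {
    std::string hosts;
};

// Hypothetical registry mapping shard names to connection strings.
class ShardRegistrySketch {
public:
    void set(const std::string& shard, ConnString cs) {
        std::lock_guard<std::mutex> lk(_mutex);
        _entries[shard] = std::move(cs);
    }

    void remove(const std::string& shard) {
        std::lock_guard<std::mutex> lk(_mutex);
        _entries.erase(shard);
    }

    // Risky shape (declaration only, deliberately not implemented here):
    // the returned reference outlives the lock, so a concurrent set() can
    // modify the referent, or remove() can destroy it, while the caller
    // is still reading through the reference.
    // const ConnString& getByRef(const std::string& shard) const;

    // Safer shape: copy while holding the lock; the caller owns the copy.
    // Returns an empty ConnString if the shard is unknown.
    ConnString get(const std::string& shard) const {
        std::lock_guard<std::mutex> lk(_mutex);
        auto it = _entries.find(shard);
        return it == _entries.end() ? ConnString{} : it->second;
    }

private:
    mutable std::mutex _mutex;
    std::map<std::string, ConnString> _entries;
};
```

With the by-value getter, a caller that fetches the connection string for "shard0002" keeps a valid copy even if the entry is removed or replaced immediately afterwards. The trade-off is one extra string copy per lookup.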

Comment by Andy Schwerin [ 14/Jul/15 ]

alabid, please do some diagnostic work on this.
