[SERVER-26622] _recvChunkCommit should set errmsg when it encounters an error
Created: 13/Oct/16  Updated: 05/Apr/17  Resolved: 29/Jan/17

Status: Closed
Project: Core Server
Component/s: Diagnostics
Affects Version/s: 3.4.0-rc0
Fix Version/s: 3.5.3

Type: Bug
Priority: Major - P3
Reporter: Randolph Tan
Assignee: Nathan Myers
Resolution: Done
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: Sharding 2017-01-02, Sharding 2017-02-13
Participants:

 Description   

Example of bad log message:

2016-10-11T21:48:24.272+0000 W SHARDING [conn5] Chunk move failed :: caused by :: UnknownError: commit clone failed due to UnknownError:



 Comments   
Comment by Githook User [ 28/Jan/17 ]

Author: Nathan Myers (nathan-myers-mongo) <nathan.myers@10gen.com>

Message: SERVER-26622 _recvChunkCommit to set errmsg reliably
Branch: master
https://github.com/mongodb/mongo/commit/caacf2f709ae9a01384d1de658628185259dfed2

Comment by Dianna Hohensee (Inactive) [ 12/Jan/17 ]

The UnknownError in Dan's comment looks like it was incidentally addressed by this commit.

Comment by Randolph Tan [ 07/Nov/16 ]

kaloian.manassiev, after reviewing the ticket again, I realize that I misread the code in getStatusFromCommandResult and conclude that there is nothing wrong with it. The problem is that the _recvChunkCommit command returns ok: false without populating the errmsg field. This appears to be caused by the combination of MigrationDestinationManager::startCommit not setting _errmsg when it returns false and MigrationDestinationManager::report only selectively setting the errmsg field on the response.
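
The interaction described above can be illustrated with a small, self-contained sketch. This is not the server code: ToyReply, ToyMigrationDestination, runCommit, the two startCommit variants, and the error text are all hypothetical names made up for illustration, and the condition under which report copies the message is an assumption. It only shows how a false return that records no reason, combined with selective reporting, can yield ok: false with an empty errmsg, which would be consistent with the empty text after "UnknownError:" in the log line in the description.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a command reply; not MongoDB's real response type.
struct ToyReply {
    bool ok = true;
    std::string errmsg;  // empty models "errmsg field absent"
};

// Toy model of the destination side of a chunk migration (illustrative only).
class ToyMigrationDestination {
public:
    explicit ToyMigrationDestination(bool steady) : _steady(steady) {}

    // Problem shape: returns false without recording any reason.
    bool startCommitSilent() {
        return _steady;
    }

    // Reliable shape: every false return also sets _errmsg.
    bool startCommitReporting() {
        if (!_steady) {
            _errmsg = "commit requested while migration is not in steady state";
            return false;
        }
        return true;
    }

    // Selective reporting (assumed rule): errmsg is copied only if one was recorded.
    void report(ToyReply& reply) const {
        if (!_errmsg.empty()) {
            reply.errmsg = _errmsg;
        }
    }

private:
    bool _steady;
    std::string _errmsg;
};

// Toy command handler: sets ok from startCommit, then lets report fill in details.
inline ToyReply runCommit(ToyMigrationDestination& dest, bool reliable) {
    ToyReply reply;
    reply.ok = reliable ? dest.startCommitReporting() : dest.startCommitSilent();
    dest.report(reply);
    return reply;
}

// Usage: both variants fail, but only the reliable one explains why.
inline void checkBothShapes() {
    ToyMigrationDestination a(false);
    ToyReply silent = runCommit(a, false);
    assert(!silent.ok && silent.errmsg.empty());

    ToyMigrationDestination b(false);
    ToyReply reliable = runCommit(b, true);
    assert(!reliable.ok && !reliable.errmsg.empty());
}
```

In this sketch the remedy is to make every failing path record a message before returning false, so the caller never has to guess; whether the actual fix took that form is not stated in this ticket.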

Comment by Kaloian Manassiev [ 28/Oct/16 ]

renctan, what kind of context could getStatusFromCommandResult add, and where would it take that context from? Can you elaborate in the ticket description?

Does it make sense to change this ticket to be about auditing all sharding commands to ensure that we don't have code paths which do not return an error?

Comment by Daniel Pasette (Inactive) [ 25/Oct/16 ]

Here's another one:

2016-10-24T17:44:58.996+0000 I SHARDING [Balancer] Balancer move buildlogs.logs-build_id_"2621c7d6d411859a1025e682b69be593"test_id_ObjectId('57b4e719be07c40df40105ec')seq_1 failed :: caused by :: UnknownError: can't accept new chunks because  there are still 1 deletes from previous migration

Comment by Mira Carey [ 24/Oct/16 ]

If there's some reason this should be on platforms, let us know. Otherwise I'll leave this ticket on the same backlog as the BF, as it is a diagnostic change.
