[SERVER-27189] _getLastOpTime / _getDurableOpTime do not handle missing nodes gracefully Created: 28/Nov/16  Updated: 05/Apr/17  Resolved: 30/Dec/16

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: 3.4.3, 3.5.2

Type: Bug Priority: Major - P3
Reporter: Eric Milkie Assignee: Jack Mulrow
Resolution: Done Votes: 0
Labels: neweng
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Related
is related to SERVER-27035 rs.compareOpTimes() throws exception ... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v3.4
Sprint: Repl 2016-12-12, Repl 2017-01-23
Participants:

 Description   

Due to the code logic in replsettest.js, we end up with misleading messages in jstest logs:

[js_test:remove2] 2016-11-13T18:53:53.782+0000 ReplSetTest awaitReplication: starting: optime for primary, ubuntu1604-z-3.maristisv.build.10gen.cc:20262, is { "ts" : Timestamp(1479063233, 2), "t" : NumberLong(1) }
[js_test:remove2] 2016-11-13T18:53:53.782+0000 ReplSetTest awaitReplication: checking secondaries against latest primary optime { "ts" : Timestamp(1479063233, 2), "t" : NumberLong(1) }
[js_test:remove2] 2016-11-13T18:53:53.783+0000 ReplSetTest awaitReplication: checking secondary #1: ubuntu1604-z-3.maristisv.build.10gen.cc:20263
[js_test:remove2] 2016-11-13T18:53:53.783+0000 ReplSetTest awaitReplication: caught exception Error: cannot compare optimes between different protocol versions;
[js_test:remove2] 2016-11-13T18:53:53.783+0000 rs.compareOpTimes@src/mongo/shell/utils.js:1340:1
[js_test:remove2] 2016-11-13T18:53:53.783+0000 ReplSetTest/this.awaitReplication/<@src/mongo/shell/replsettest.js:788:25
[js_test:remove2] 2016-11-13T18:53:53.783+0000 _convertExceptionToReturnStatus/safeFunc@src/mongo/shell/assert.js:161:20
[js_test:remove2] 2016-11-13T18:53:53.783+0000 assert.soon@src/mongo/shell/assert.js:196:17
[js_test:remove2] 2016-11-13T18:53:53.783+0000 assert.soonNoExcept@src/mongo/shell/assert.js:215:5
[js_test:remove2] 2016-11-13T18:53:53.783+0000 ReplSetTest/this.awaitReplication@src/mongo/shell/replsettest.js:739:1
[js_test:remove2] 2016-11-13T18:53:53.783+0000 @jstests/sharding/remove2.js:114:1

(from https://logkeeper.mongodb.org/build/5ade67af34d8a92ee80eec581556fa52/test/5828b62abe07c47d2a04181b )
This is due to mistakenly assuming the response to the replSetGetStatus command will always have optimes for all configured nodes in the _getLastOpTime() and _getDurableOpTime() functions.



 Comments   
Comment by Githook User [ 06/Feb/17 ]

Author:

{u'username': u'jsmulrow', u'name': u'Jack Mulrow', u'email': u'jack.mulrow@mongodb.com'}

Message: SERVER-27189 _getLastOpTime / _getDurableOpTime do not handle missing nodes gracefully

(cherry picked from commit e5699aab7fa3b04b8ce01ee35c038dafe45f13d9)
Branch: v3.4
https://github.com/mongodb/mongo/commit/9c400024b77cf259a22c24cdc4c147ca2e53f3fc

Comment by Githook User [ 30/Dec/16 ]

Author:

{u'username': u'jsmulrow', u'name': u'Jack Mulrow', u'email': u'jack.mulrow@mongodb.com'}

Message: SERVER-27189 _getLastOpTime / _getDurableOpTime do not handle missing nodes gracefully

optime as OpTime

check for PV0
Branch: master
https://github.com/mongodb/mongo/commit/e5699aab7fa3b04b8ce01ee35c038dafe45f13d9

Comment by Judah Schvimer [ 02/Dec/16 ]

For this ticket we want to check if the OpTimes to be returned by _getLastOpTime and _getDurableOpTime are empty. If they are, we want those functions to throw exceptions rather than return empty OpTimes. This will provide clearer error messages in awaitReplication, however awaitReplication will still print stack traces, which may confuse developers reading the logs. Additionally, to reduce this confusion, we should stop printing the error stack trace here, which is pretty useless. We still want to print the exception, just not the stack trace.

Generated at Thu Feb 08 04:14:25 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.