[SERVER-20109] sync_passive.js Created: 25/Aug/15  Updated: 19/Sep/15  Resolved: 25/Aug/15

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 3.1.8

Type: Bug Priority: Major - P3
Reporter: Benety Goh Assignee: Matt Dannenberg
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Backwards Compatibility: Fully Compatible
Operating System: ALL
Participants:

 Description   

Member #2 seems to be failing to sync from #1 after #1 becomes primary (after bringing #0 down).

07728dc1e6 Linux 64-bit replicasets

https://evergreen.mongodb.com/task/mongodb_mongo_master_linux_64_replicasets_07728dc1e6721a3fe7c410fbb9ed28c316afe09b_15_08_25_05_25_11

https://logkeeper.mongodb.org/build/55dc08b8be07c47abf8d93e0/test/55dc18f4be07c47abf8dd717

[js_test:sync_passive] 2015-08-25T07:28:21.359+0000  m31001| 2015-08-25T07:28:21.359+0000 I REPL     [ReplicationExecutor] Standing for election
[js_test:sync_passive] 2015-08-25T07:28:21.359+0000  m31001| 2015-08-25T07:28:21.359+0000 I REPL     [ReplicationExecutor] running for election
[js_test:sync_passive] 2015-08-25T07:28:21.360+0000  m31002| 2015-08-25T07:28:21.359+0000 I REPL     [ReplicationExecutor] replSetElect voting yea for ip-10-170-157-105:31001 (1)
[js_test:sync_passive] 2015-08-25T07:28:21.360+0000  m31001| 2015-08-25T07:28:21.359+0000 I REPL     [ReplicationExecutor] received vote: 1 votes from ip-10-170-157-105:31002
[js_test:sync_passive] 2015-08-25T07:28:21.360+0000  m31001| 2015-08-25T07:28:21.360+0000 I REPL     [ReplicationExecutor] election succeeded, assuming primary role
[js_test:sync_passive] 2015-08-25T07:28:21.360+0000  m31001| 2015-08-25T07:28:21.360+0000 I REPL     [ReplicationExecutor] transition to PRIMARY
...
[js_test:sync_passive] 2015-08-25T07:28:54.230+0000 ReplSetTest awaitReplication: checking secondaries against timestamp Timestamp(1440487703, 1)
[js_test:sync_passive] 2015-08-25T07:28:54.231+0000 ReplSetTest awaitReplication: checking secondary #1: ip-10-170-157-105:31002
[js_test:sync_passive] 2015-08-25T07:28:54.231+0000 ReplSetTest awaitReplication: timestamp for secondary #1, ip-10-170-157-105:31002, is Timestamp(1440487676, 1001) but latest is Timestamp(1440487703, 1)
[js_test:sync_passive] 2015-08-25T07:28:54.231+0000 ReplSetTest awaitReplication: last oplog entry (of 1104) for secondary #1, ip-10-170-157-105:31002, is {  "ts" : Timestamp(1440487676, 1001),  "h" : NumberLong("5407822403662064979"),  "v" : 2,  "op" : "i",  "ns" : "test.bar",  "o" : {  "_id" : ObjectId("55dc18fc36578cb7b6f8ed43"),  "x" : 999 } }
[js_test:sync_passive] 2015-08-25T07:28:54.232+0000 ReplSetTest awaitReplication: secondary #1, ip-10-170-157-105:31002, is NOT synced
[js_test:sync_passive] 2015-08-25T07:28:54.435+0000 ReplSetTest awaitReplication: checking secondaries against timestamp Timestamp(1440487703, 1)
[js_test:sync_passive] 2015-08-25T07:28:54.435+0000 ReplSetTest awaitReplication: checking secondary #1: ip-10-170-157-105:31002
[js_test:sync_passive] 2015-08-25T07:28:54.435+0000 ReplSetTest awaitReplication: timestamp for secondary #1, ip-10-170-157-105:31002, is Timestamp(1440487676, 1001) but latest is Timestamp(1440487703, 1)
[js_test:sync_passive] 2015-08-25T07:28:54.436+0000 ReplSetTest awaitReplication: last oplog entry (of 1104) for secondary #1, ip-10-170-157-105:31002, is {  "ts" : Timestamp(1440487676, 1001),  "h" : NumberLong("5407822403662064979"),  "v" : 2,  "op" : "i",  "ns" : "test.bar",  "o" : {  "_id" : ObjectId("55dc18fc36578cb7b6f8ed43"),  "x" : 999 } }
[js_test:sync_passive] 2015-08-25T07:28:54.436+0000 ReplSetTest awaitReplication: secondary #1, ip-10-170-157-105:31002, is NOT synced
[js_test:sync_passive] 2015-08-25T07:28:54.436+0000 assert.soon failed, msg:awaiting replication
[js_test:sync_passive] 2015-08-25T07:28:54.436+0000 doassert@src/mongo/shell/assert.js:11:14
[js_test:sync_passive] 2015-08-25T07:28:54.436+0000 assert.soon@src/mongo/shell/assert.js:189:13
[js_test:sync_passive] 2015-08-25T07:28:54.436+0000 ReplSetTest.prototype.awaitReplication@src/mongo/shell/replsettest.js:524:1
[js_test:sync_passive] 2015-08-25T07:28:54.436+0000 @jstests/replsets/sync_passive.js:65:1
[js_test:sync_passive] 2015-08-25T07:28:54.437+0000 



 Comments   
Comment by Githook User [ 25/Aug/15 ]

Author:

{u'username': u'dannenberg', u'name': u'matt dannenberg', u'email': u'matt.dannenberg@10gen.com'}

Message: SERVER-20109 make sync_passive.js resilient to a possible initial sync blacklisting
Branch: master
https://github.com/mongodb/mongo/commit/1e9137cc88afcbb45db5402f8f093a79e45d73bf

Comment by Matt Dannenberg [ 25/Aug/15 ]

I believe the fix is to either change the initial sync blacklist timeout, or extend the awaitReplication() call timeout in this test to compensate for this possibility.

Comment by Matt Dannenberg [ 25/Aug/15 ]

The trouble appears to be that node 2 blacklists node 1 towards the start of the test, during initial sync:

[js_test:sync_passive] 2015-08-25T07:27:54.427+0000  m31002| 2015-08-25T07:27:54.426+0000 W REPL     [rsBackgroundSync] we are too stale to use ip-10-170-157-105:31001 as a sync source

The blacklist timeout for this sort of failure is 1 minute, the test fails 62 seconds later.

Generated at Thu Feb 08 03:53:11 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.