[SERVER-22120] No data found after force sync in no_chaining.js Created: 13/Oct/15  Updated: 23/Nov/16  Resolved: 11/Jan/16

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: 3.2.3, 3.3.0

Type: Bug Priority: Minor - P4
Reporter: Max Hirschhorn Assignee: Matt Dannenberg
Resolution: Done Votes: 0
Labels: test-only
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Completed:
Sprint: Repl B (10/30/15), Repl C (11/20/15), Repl E (01/08/16)
Participants:
Linked BF Score: 0

 Description   

Task
Logs

[js_test:no_chaining] 2015-10-13T14:45:11.165+0000 check that forcing sync target still works
[js_test:no_chaining] 2015-10-13T14:45:11.166+0000 { "syncFromRequested" : "ip-10-97-195-239.ec2.internal:20268", "ok" : 1 }
[js_test:no_chaining] 2015-10-13T14:45:11.892+0000 d20262| 2015-10-13T14:45:11.891+0000 I REPL     [ReplicationExecutor] syncing from: ip-10-97-195-239.ec2.internal:20268 by request
...
[js_test:no_chaining] 2015-10-13T14:45:19.858+0000 2015-10-13T14:45:19.856+0000 E QUERY    [thread1] Error: assert.soon failed, msg:Check for data after force sync :
[js_test:no_chaining] 2015-10-13T14:45:19.858+0000 doassert@src/mongo/shell/assert.js:15:14
[js_test:no_chaining] 2015-10-13T14:45:19.858+0000 assert.soon@src/mongo/shell/assert.js:194:13
[js_test:no_chaining] 2015-10-13T14:45:19.858+0000 forceSync/<@jstests/replsets/no_chaining.js:55:1
[js_test:no_chaining] 2015-10-13T14:45:19.858+0000 assert.soon@src/mongo/shell/assert.js:188:17
[js_test:no_chaining] 2015-10-13T14:45:19.858+0000 forceSync@jstests/replsets/no_chaining.js:50:1
[js_test:no_chaining] 2015-10-13T14:45:19.858+0000 @jstests/replsets/no_chaining.js:77:5
[js_test:no_chaining] 2015-10-13T14:45:19.858+0000
[js_test:no_chaining] 2015-10-13T14:45:19.859+0000 failed to load: jstests/replsets/no_chaining.js



 Comments   
Comment by Githook User [ 14/Jan/16 ]

Author:

{u'username': u'dannenberg', u'name': u'matt dannenberg', u'email': u'matt.dannenberg@10gen.com'}

Message: SERVER-22120 rework bad nested assert.soon()s in no_chaining.js

(cherry picked from commit aacd231be0626a204cb40908afdf62c4b67bb0ad)
Branch: v3.2
https://github.com/mongodb/mongo/commit/cc4fd27db3f93e9d518e11b94a8cbd2e0a4c0569

Comment by Githook User [ 11/Jan/16 ]

Author:

{u'username': u'dannenberg', u'name': u'matt dannenberg', u'email': u'matt.dannenberg@10gen.com'}

Message: SERVER-22120 rework bad nested assert.soon()s in no_chaining.js
Branch: master
https://github.com/mongodb/mongo/commit/aacd231be0626a204cb40908afdf62c4b67bb0ad

Comment by Ian Whalen (Inactive) [ 30/Dec/15 ]

Appears to have happened again here? https://evergreen.mongodb.com/task/mongodb_mongo_master_rhel62_replicasets_dbd58e06914b23ed50605820515f3849baf2cbce_15_12_29_20_52_44

Comment by Charlie Swanson [ 03/Dec/15 ]

I think I just saw this in a patch build

Comment by Matt Dannenberg [ 17/Nov/15 ]

Haven't seen this in nearly a month

Comment by Matt Dannenberg [ 02/Nov/15 ]

Most recent failure October 20th: https://evergreen.mongodb.com/task/mongodb_mongo_master_linux_64_duroff_replicasets_dbbc9a2e3d8c4d7fe1748fa980ba7d01b9489dbe_15_10_20_22_27_07

Comment by Matt Dannenberg [ 15/Oct/15 ]

From the logs it is not clear what has gone wrong. We request node A (port 20262) to sync from node B(port 20268) and it appears to:

[js_test:no_chaining] 2015-10-13T14:45:11.166+0000 { "syncFromRequested" : "ip-10-97-195-239.ec2.internal:20268", "ok" : 1 }
[js_test:no_chaining] 2015-10-13T14:45:11.892+0000 d20262| 2015-10-13T14:45:11.891+0000 I REPL     [ReplicationExecutor] syncing from: ip-10-97-195-239.ec2.internal:20268 by request
[js_test:no_chaining] 2015-10-13T14:45:11.892+0000 b20268| 2015-10-13T14:45:11.892+0000 I NETWORK  [main] connection accepted from 10.97.195.239:64047 #5 (1 connection now open)
[js_test:no_chaining] 2015-10-13T14:45:11.892+0000 d20261| 2015-10-13T14:45:11.892+0000 I NETWORK  [initandlisten] connection accepted from 10.97.195.239:48731 #14 (13 connections now open)
[js_test:no_chaining] 2015-10-13T14:45:11.893+0000 d20262| 2015-10-13T14:45:11.893+0000 I REPL     [SyncSourceFeedback] setting syncSourceFeedback to ip-10-97-195-239.ec2.internal:20268 

Node B seems to immediately close the connection for no discernible reason.

[js_test:no_chaining] 2015-10-13T14:45:11.893+0000 b20268| end connection 10.97.195.239:64047

But node A seems to believe it is still syncing from node B until the end of the test, where node B closes a connection, while shutting down, causing node A's UpdatePosition command to fail:

[js_test:no_chaining] 2015-10-13T14:45:19.866+0000 b20268| end connection ip-10-97-195-239.ec2.internal:20261 (10.97.195.239)
[js_test:no_chaining] 2015-10-13T14:45:19.867+0000 d20262| 2015-10-13T14:45:19.866+0000 I REPL     [SyncSourceFeedback] SyncSourceFeedback error sending update: network error while attempting to run command 'replSetUpdatePosition' on host 'ip-10-97-195-239.ec2.internal:20268'
[js_test:no_chaining] 2015-10-13T14:45:19.867+0000 d20262| 2015-10-13T14:45:19.866+0000 I REPL     [SyncSourceFeedback] updateUpstream failed: HostUnreachable network error while attempting to run command 'replSetUpdatePosition' on host 'ip-10-97-195-239.ec2.internal:20268' , will retry

Generated at Thu Feb 08 03:59:26 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.