[SERVER-20817] Timeout during conf_server_write_concern.js with legacy config servers Created: 08/Oct/15  Updated: 16/Oct/15  Resolved: 08/Oct/15

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: 3.1.9
Fix Version/s: 3.2.0-rc0

Type: Bug Priority: Major - P3
Reporter: Max Hirschhorn Assignee: Matt Dannenberg
Resolution: Done Votes: 0
Labels: disabled-test
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-20814 'majority' write stuck waiting for re... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: Repl B (10/30/15)
Participants:

 Description   


We enable sharding on the "test" database and shard the "test.user" collection. The secondaries then continually report that they are too stale to catch up, and they keep reporting this for two hours until the task times out. The optimes in the log excerpt explain why: each secondary's last applied optime (19:39:39) predates the oldest entry still available in the primary's oplog (19:39:52), so the oplog rolled over before the secondaries could choose a sync source.
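
A quick way to confirm this kind of staleness from the mongo shell of that era (a diagnostic sketch, not part of the test itself; both helpers existed in 3.x):

// On the primary: print the oplog's allocated size and the time window it covers.
rs.printReplicationInfo();

// Also on the primary: print how far each secondary's last applied optime lags
// behind that window. A secondary whose optime predates the oldest oplog entry
// is too stale to catch up by tailing the oplog and must be resynced.
rs.printSlaveReplicationInfo();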

[js_test:conf_server_write_concern] 2015-10-07T19:39:52.553+0000 s20022| 2015-10-07T19:39:52.552+0000 I COMMAND  [conn1] CMD: shardcollection: { shardCollection: "test.user", key: { x: 1.0 } }
[js_test:conf_server_write_concern] 2015-10-07T19:39:52.576+0000 s20022| 2015-10-07T19:39:52.575+0000 I SHARDING [conn1] distributed lock 'test.user/ip-10-41-164-249:20022:1444246792:1804289383' acquired, ts : 56157508f1d47bdcdb443715
[js_test:conf_server_write_concern] 2015-10-07T19:39:52.576+0000 s20022| 2015-10-07T19:39:52.576+0000 I SHARDING [conn1] enable sharding on: test.user with shard key: { x: 1.0 }
[js_test:conf_server_write_concern] 2015-10-07T19:39:52.578+0000 s20022| 2015-10-07T19:39:52.576+0000 I SHARDING [conn1] about to log metadata event: { _id: "ip-10-41-164-249-2015-10-07T19:39:52.576+0000-56157508f1d47bdcdb443716", server: "ip-10-41-164-249", clientAddr: "127.0.0.1:49835", time: new Date(1444246792576), what: "shardCollection.start", ns: "test.user", details: { shardKey: { x: 1.0 }, collection: "test.user", primary: "test-rs0:test-rs0/ip-10-41-164-249:20016,ip-10-41-164-249:20017,ip-10-41-164-249:20018", initShards: [], numChunks: 1 } }
[js_test:conf_server_write_concern] 2015-10-07T19:39:52.578+0000 s20022| 2015-10-07T19:39:52.577+0000 I SHARDING [conn1] going to create 1 chunk(s) for: test.user using new epoch 56157508f1d47bdcdb443717
[js_test:conf_server_write_concern] 2015-10-07T19:39:52.584+0000 s20022| 2015-10-07T19:39:52.583+0000 I SHARDING [conn1] ChunkManager: time to load chunks for test.user: 0ms sequenceNumber: 2 version: 1|0||56157508f1d47bdcdb443717 based on: (empty)
[js_test:conf_server_write_concern] 2015-10-07T19:39:52.614+0000 s20022| 2015-10-07T19:39:52.614+0000 I SHARDING [conn1] ChunkManager: time to load chunks for test.user: 0ms sequenceNumber: 3 version: 1|0||56157508f1d47bdcdb443717 based on: (empty)
[js_test:conf_server_write_concern] 2015-10-07T19:39:52.615+0000 d20016| 2015-10-07T19:39:52.615+0000 I NETWORK  [initandlisten] connection accepted from 10.41.164.249:43964 #14 (10 connections now open)
[js_test:conf_server_write_concern] 2015-10-07T19:39:52.616+0000 d20016| 2015-10-07T19:39:52.616+0000 I SHARDING [conn14] remotely refreshing metadata for test.user with requested shard version 1|0||56157508f1d47bdcdb443717, current shard version is 0|0||000000000000000000000000, current metadata version is 0|0||000000000000000000000000
[js_test:conf_server_write_concern] 2015-10-07T19:39:52.617+0000 d20016| 2015-10-07T19:39:52.617+0000 I SHARDING [conn14] collection test.user was previously unsharded, new metadata loaded with shard version 1|0||56157508f1d47bdcdb443717
[js_test:conf_server_write_concern] 2015-10-07T19:39:52.617+0000 d20016| 2015-10-07T19:39:52.617+0000 I SHARDING [conn14] collection version was loaded at version 1|0||56157508f1d47bdcdb443717, took 0ms
[js_test:conf_server_write_concern] 2015-10-07T19:39:52.618+0000 s20022| 2015-10-07T19:39:52.617+0000 I SHARDING [conn1] about to log metadata event: { _id: "ip-10-41-164-249-2015-10-07T19:39:52.617+0000-56157508f1d47bdcdb443718", server: "ip-10-41-164-249", clientAddr: "127.0.0.1:49835", time: new Date(1444246792617), what: "shardCollection", ns: "test.user", details: { version: "1|0||56157508f1d47bdcdb443717" } }
...
[js_test:conf_server_write_concern] 2015-10-07T19:39:53.195+0000 d20018| 2015-10-07T19:39:53.193+0000 W REPL     [rsBackgroundSync] we are too stale to use ip-10-41-164-249:20016 as a sync source
[js_test:conf_server_write_concern] 2015-10-07T19:39:53.195+0000 d20018| 2015-10-07T19:39:53.193+0000 I REPL     [ReplicationExecutor] could not find member to sync from
[js_test:conf_server_write_concern] 2015-10-07T19:39:53.195+0000 d20018| 2015-10-07T19:39:53.194+0000 E REPL     [rsBackgroundSync] too stale to catch up
[js_test:conf_server_write_concern] 2015-10-07T19:39:53.195+0000 d20018| 2015-10-07T19:39:53.194+0000 I REPL     [rsBackgroundSync] our last optime : (term: 0, timestamp: Oct  7 19:39:39:1)
[js_test:conf_server_write_concern] 2015-10-07T19:39:53.195+0000 d20018| 2015-10-07T19:39:53.194+0000 I REPL     [rsBackgroundSync] oldest available is (term: 1, timestamp: Oct  7 19:39:52:13)
[js_test:conf_server_write_concern] 2015-10-07T19:39:53.196+0000 d20018| 2015-10-07T19:39:53.194+0000 I REPL     [rsBackgroundSync] See http://dochub.mongodb.org/core/resyncingaverystalereplicasetmember
[js_test:conf_server_write_concern] 2015-10-07T19:39:53.196+0000 d20018| 2015-10-07T19:39:53.194+0000 I REPL     [ReplicationExecutor] transition to RECOVERING
...
[js_test:conf_server_write_concern] 2015-10-07T21:39:24.015+0000 d20017| 2015-10-07T21:39:24.014+0000 W REPL     [rsBackgroundSync] we are too stale to use ip-10-41-164-249:20016 as a sync source
[js_test:conf_server_write_concern] 2015-10-07T21:39:24.015+0000 d20017| 2015-10-07T21:39:24.014+0000 I REPL     [ReplicationExecutor] could not find member to sync from
[js_test:conf_server_write_concern] 2015-10-07T21:39:24.015+0000 d20017| 2015-10-07T21:39:24.014+0000 E REPL     [rsBackgroundSync] too stale to catch up
[js_test:conf_server_write_concern] 2015-10-07T21:39:24.016+0000 d20017| 2015-10-07T21:39:24.014+0000 I REPL     [rsBackgroundSync] our last optime : (term: 0, timestamp: Oct  7 19:39:39:1)
[js_test:conf_server_write_concern] 2015-10-07T21:39:24.016+0000 d20017| 2015-10-07T21:39:24.014+0000 I REPL     [rsBackgroundSync] oldest available is (term: 1, timestamp: Oct  7 19:39:52:13)
[js_test:conf_server_write_concern] 2015-10-07T21:39:24.016+0000 d20017| 2015-10-07T21:39:24.014+0000 I REPL     [rsBackgroundSync] See http://dochub.mongodb.org/core/resyncingaverystalereplicasetmember



 Comments   
Comment by Githook User [ 08/Oct/15 ]

Author: matt dannenberg (dannenberg) <matt.dannenberg@10gen.com>

Message: SERVER-20817 alter conf_server_write_concern.js to have a sufficiently large oplog
Branch: master
https://github.com/mongodb/mongo/commit/a74e0b83737a569f85df2e98a2b10590c3aa2d4f
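
The fix gives the shard's replica set a large enough oplog that the initial burst of writes cannot roll it over before the secondaries pick a sync source. A minimal sketch of the idea in jstest form (the option names used here, such as oplogSize in the rs options, are assumptions about the harness; the exact change is in the commit above):

// Start each shard node with a 40 MB oplog instead of the tiny test default,
// so the writes issued while sharding test.user stay within the oplog window.
var st = new ShardingTest({
    shards: 1,
    rs: {nodes: 3, oplogSize: 40},  // size in MB; illustrative value
    config: 3
});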

Comment by Eric Milkie [ 08/Oct/15 ]

The first thing we should do is update this test (and others) to use wtimeout. This test was cleaned up just two days ago, yet it still does not use wtimeout.
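
With a wtimeout, an unsatisfiable write concern surfaces as an error within seconds instead of blocking until the two-hour task timeout. A hedged sketch of what that would look like in a jstest (the collection name and values are illustrative, not the test's actual code):

// Fail fast if replication cannot satisfy w:3 within 30 seconds.
var res = db.user.insert({x: 1}, {writeConcern: {w: 3, wtimeout: 30 * 1000}});
// Depending on the test's intent, either assert success or assert on the
// write concern error that wtimeout produces.
assert.writeOK(res);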

Comment by Githook User [ 08/Oct/15 ]

Author: Max Hirschhorn (visemet) <max.hirschhorn@mongodb.com>

Message: SERVER-20817 Temporarily disable conf_server_write_concern.js.
Branch: master
https://github.com/mongodb/mongo/commit/f8308fdf0cbe98c8cd5f817ae3d98c25f4191689

Comment by Max Hirschhorn [ 08/Oct/15 ]

Assigning to matt.dannenberg because it looks like this regression was introduced by commit 85fd8a5.
