- Type: Bug
- Resolution: Done
- Priority: Major - P3
- Affects Version/s: 3.1.9
- Component/s: Replication
- Fully Compatible
- ALL
- Repl B (10/30/15)
We enable sharding on the "test" database and shard the "test.user" collection. The secondaries then continually report that they are too stale and cannot catch up. This continues for 2 hours until the task times out.
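In shell terms, the setup described above amounts to the following (a minimal sketch; the mongos is s20022 in the excerpt below, and the shard key matches the one in the logs):

// Run against the mongos (s20022 in the logs below).
sh.enableSharding("test");
sh.shardCollection("test.user", { x: 1 });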
[js_test:conf_server_write_concern] 2015-10-07T19:39:52.553+0000 s20022| 2015-10-07T19:39:52.552+0000 I COMMAND [conn1] CMD: shardcollection: { shardCollection: "test.user", key: { x: 1.0 } }
[js_test:conf_server_write_concern] 2015-10-07T19:39:52.576+0000 s20022| 2015-10-07T19:39:52.575+0000 I SHARDING [conn1] distributed lock 'test.user/ip-10-41-164-249:20022:1444246792:1804289383' acquired, ts : 56157508f1d47bdcdb443715
[js_test:conf_server_write_concern] 2015-10-07T19:39:52.576+0000 s20022| 2015-10-07T19:39:52.576+0000 I SHARDING [conn1] enable sharding on: test.user with shard key: { x: 1.0 }
[js_test:conf_server_write_concern] 2015-10-07T19:39:52.578+0000 s20022| 2015-10-07T19:39:52.576+0000 I SHARDING [conn1] about to log metadata event: { _id: "ip-10-41-164-249-2015-10-07T19:39:52.576+0000-56157508f1d47bdcdb443716", server: "ip-10-41-164-249", clientAddr: "127.0.0.1:49835", time: new Date(1444246792576), what: "shardCollection.start", ns: "test.user", details: { shardKey: { x: 1.0 }, collection: "test.user", primary: "test-rs0:test-rs0/ip-10-41-164-249:20016,ip-10-41-164-249:20017,ip-10-41-164-249:20018", initShards: [], numChunks: 1 } }
[js_test:conf_server_write_concern] 2015-10-07T19:39:52.578+0000 s20022| 2015-10-07T19:39:52.577+0000 I SHARDING [conn1] going to create 1 chunk(s) for: test.user using new epoch 56157508f1d47bdcdb443717
[js_test:conf_server_write_concern] 2015-10-07T19:39:52.584+0000 s20022| 2015-10-07T19:39:52.583+0000 I SHARDING [conn1] ChunkManager: time to load chunks for test.user: 0ms sequenceNumber: 2 version: 1|0||56157508f1d47bdcdb443717 based on: (empty)
[js_test:conf_server_write_concern] 2015-10-07T19:39:52.614+0000 s20022| 2015-10-07T19:39:52.614+0000 I SHARDING [conn1] ChunkManager: time to load chunks for test.user: 0ms sequenceNumber: 3 version: 1|0||56157508f1d47bdcdb443717 based on: (empty)
[js_test:conf_server_write_concern] 2015-10-07T19:39:52.615+0000 d20016| 2015-10-07T19:39:52.615+0000 I NETWORK [initandlisten] connection accepted from 10.41.164.249:43964 #14 (10 connections now open)
[js_test:conf_server_write_concern] 2015-10-07T19:39:52.616+0000 d20016| 2015-10-07T19:39:52.616+0000 I SHARDING [conn14] remotely refreshing metadata for test.user with requested shard version 1|0||56157508f1d47bdcdb443717, current shard version is 0|0||000000000000000000000000, current metadata version is 0|0||000000000000000000000000
[js_test:conf_server_write_concern] 2015-10-07T19:39:52.617+0000 d20016| 2015-10-07T19:39:52.617+0000 I SHARDING [conn14] collection test.user was previously unsharded, new metadata loaded with shard version 1|0||56157508f1d47bdcdb443717
[js_test:conf_server_write_concern] 2015-10-07T19:39:52.617+0000 d20016| 2015-10-07T19:39:52.617+0000 I SHARDING [conn14] collection version was loaded at version 1|0||56157508f1d47bdcdb443717, took 0ms
[js_test:conf_server_write_concern] 2015-10-07T19:39:52.618+0000 s20022| 2015-10-07T19:39:52.617+0000 I SHARDING [conn1] about to log metadata event: { _id: "ip-10-41-164-249-2015-10-07T19:39:52.617+0000-56157508f1d47bdcdb443718", server: "ip-10-41-164-249", clientAddr: "127.0.0.1:49835", time: new Date(1444246792617), what: "shardCollection", ns: "test.user", details: { version: "1|0||56157508f1d47bdcdb443717" } }
...
[js_test:conf_server_write_concern] 2015-10-07T19:39:53.195+0000 d20018| 2015-10-07T19:39:53.193+0000 W REPL [rsBackgroundSync] we are too stale to use ip-10-41-164-249:20016 as a sync source
[js_test:conf_server_write_concern] 2015-10-07T19:39:53.195+0000 d20018| 2015-10-07T19:39:53.193+0000 I REPL [ReplicationExecutor] could not find member to sync from
[js_test:conf_server_write_concern] 2015-10-07T19:39:53.195+0000 d20018| 2015-10-07T19:39:53.194+0000 E REPL [rsBackgroundSync] too stale to catch up
[js_test:conf_server_write_concern] 2015-10-07T19:39:53.195+0000 d20018| 2015-10-07T19:39:53.194+0000 I REPL [rsBackgroundSync] our last optime : (term: 0, timestamp: Oct 7 19:39:39:1)
[js_test:conf_server_write_concern] 2015-10-07T19:39:53.195+0000 d20018| 2015-10-07T19:39:53.194+0000 I REPL [rsBackgroundSync] oldest available is (term: 1, timestamp: Oct 7 19:39:52:13)
[js_test:conf_server_write_concern] 2015-10-07T19:39:53.196+0000 d20018| 2015-10-07T19:39:53.194+0000 I REPL [rsBackgroundSync] See http://dochub.mongodb.org/core/resyncingaverystalereplicasetmember
[js_test:conf_server_write_concern] 2015-10-07T19:39:53.196+0000 d20018| 2015-10-07T19:39:53.194+0000 I REPL [ReplicationExecutor] transition to RECOVERING
...
[js_test:conf_server_write_concern] 2015-10-07T21:39:24.015+0000 d20017| 2015-10-07T21:39:24.014+0000 W REPL [rsBackgroundSync] we are too stale to use ip-10-41-164-249:20016 as a sync source
[js_test:conf_server_write_concern] 2015-10-07T21:39:24.015+0000 d20017| 2015-10-07T21:39:24.014+0000 I REPL [ReplicationExecutor] could not find member to sync from
[js_test:conf_server_write_concern] 2015-10-07T21:39:24.015+0000 d20017| 2015-10-07T21:39:24.014+0000 E REPL [rsBackgroundSync] too stale to catch up
[js_test:conf_server_write_concern] 2015-10-07T21:39:24.016+0000 d20017| 2015-10-07T21:39:24.014+0000 I REPL [rsBackgroundSync] our last optime : (term: 0, timestamp: Oct 7 19:39:39:1)
[js_test:conf_server_write_concern] 2015-10-07T21:39:24.016+0000 d20017| 2015-10-07T21:39:24.014+0000 I REPL [rsBackgroundSync] oldest available is (term: 1, timestamp: Oct 7 19:39:52:13)
[js_test:conf_server_write_concern] 2015-10-07T21:39:24.016+0000 d20017| 2015-10-07T21:39:24.014+0000 I REPL [rsBackgroundSync] See http://dochub.mongodb.org/core/resyncingaverystalereplicasetmember
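The "too stale" messages mean the secondary's last applied optime (19:39:39) is older than the oldest entry still available in any sync source's oplog (19:39:52), so it transitions to RECOVERING and cannot catch up without a resync. A minimal way to inspect this from the shell (standard replica-set helpers, run against the primary and the affected secondary) would be:

// On the primary: print the oplog window (time span between first and last oplog entries).
rs.printReplicationInfo();

// On the primary: print how far behind each secondary is.
rs.printSlaveReplicationInfo();

// On the affected secondary (d20017/d20018 here): stateStr remains "RECOVERING"
// and its optime never advances past the point where it fell off the oplog.
rs.status();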
- related to SERVER-20814: 'majority' write stuck waiting for replication after both secondaries fell off the end of the oplog (Closed)