Loading...

XML

Word

Printable

JSON

Type: Bug
Resolution: Done
Priority: Major - P3
Fix Version/s: 3.2.0-rc0
Affects Version/s: 3.1.9
Component/s: Replication
Labels:
- disabled-test

Backwards Compatibility:
Fully Compatible
Operating System:
ALL
Sprint:
Repl B (10/30/15)
CAR Domain/s:
None

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name(s):
None
Goal Link:
None

We enable sharding on the "test" database and shard the "test.user" collection. The secondaries then continually report that they are too stale and cannot catch up. This continues for 2 hours until the task times out.

[js_test:conf_server_write_concern] 2015-10-07T19:39:52.553+0000 s20022| 2015-10-07T19:39:52.552+0000 I COMMAND  [conn1] CMD: shardcollection: { shardCollection: "test.user", key: { x: 1.0 } }
[js_test:conf_server_write_concern] 2015-10-07T19:39:52.576+0000 s20022| 2015-10-07T19:39:52.575+0000 I SHARDING [conn1] distributed lock 'test.user/ip-10-41-164-249:20022:1444246792:1804289383' acquired, ts : 56157508f1d47bdcdb443715
[js_test:conf_server_write_concern] 2015-10-07T19:39:52.576+0000 s20022| 2015-10-07T19:39:52.576+0000 I SHARDING [conn1] enable sharding on: test.user with shard key: { x: 1.0 }
[js_test:conf_server_write_concern] 2015-10-07T19:39:52.578+0000 s20022| 2015-10-07T19:39:52.576+0000 I SHARDING [conn1] about to log metadata event: { _id: "ip-10-41-164-249-2015-10-07T19:39:52.576+0000-56157508f1d47bdcdb443716", server: "ip-10-41-164-249", clientAddr: "127.0.0.1:49835", time: new Date(1444246792576), what: "shardCollection.start", ns: "test.user", details: { shardKey: { x: 1.0 }, collection: "test.user", primary: "test-rs0:test-rs0/ip-10-41-164-249:20016,ip-10-41-164-249:20017,ip-10-41-164-249:20018", initShards: [], numChunks: 1 } }
[js_test:conf_server_write_concern] 2015-10-07T19:39:52.578+0000 s20022| 2015-10-07T19:39:52.577+0000 I SHARDING [conn1] going to create 1 chunk(s) for: test.user using new epoch 56157508f1d47bdcdb443717
[js_test:conf_server_write_concern] 2015-10-07T19:39:52.584+0000 s20022| 2015-10-07T19:39:52.583+0000 I SHARDING [conn1] ChunkManager: time to load chunks for test.user: 0ms sequenceNumber: 2 version: 1|0||56157508f1d47bdcdb443717 based on: (empty)
[js_test:conf_server_write_concern] 2015-10-07T19:39:52.614+0000 s20022| 2015-10-07T19:39:52.614+0000 I SHARDING [conn1] ChunkManager: time to load chunks for test.user: 0ms sequenceNumber: 3 version: 1|0||56157508f1d47bdcdb443717 based on: (empty)
[js_test:conf_server_write_concern] 2015-10-07T19:39:52.615+0000 d20016| 2015-10-07T19:39:52.615+0000 I NETWORK  [initandlisten] connection accepted from 10.41.164.249:43964 #14 (10 connections now open)
[js_test:conf_server_write_concern] 2015-10-07T19:39:52.616+0000 d20016| 2015-10-07T19:39:52.616+0000 I SHARDING [conn14] remotely refreshing metadata for test.user with requested shard version 1|0||56157508f1d47bdcdb443717, current shard version is 0|0||000000000000000000000000, current metadata version is 0|0||000000000000000000000000
[js_test:conf_server_write_concern] 2015-10-07T19:39:52.617+0000 d20016| 2015-10-07T19:39:52.617+0000 I SHARDING [conn14] collection test.user was previously unsharded, new metadata loaded with shard version 1|0||56157508f1d47bdcdb443717
[js_test:conf_server_write_concern] 2015-10-07T19:39:52.617+0000 d20016| 2015-10-07T19:39:52.617+0000 I SHARDING [conn14] collection version was loaded at version 1|0||56157508f1d47bdcdb443717, took 0ms
[js_test:conf_server_write_concern] 2015-10-07T19:39:52.618+0000 s20022| 2015-10-07T19:39:52.617+0000 I SHARDING [conn1] about to log metadata event: { _id: "ip-10-41-164-249-2015-10-07T19:39:52.617+0000-56157508f1d47bdcdb443718", server: "ip-10-41-164-249", clientAddr: "127.0.0.1:49835", time: new Date(1444246792617), what: "shardCollection", ns: "test.user", details: { version: "1|0||56157508f1d47bdcdb443717" } }
...
[js_test:conf_server_write_concern] 2015-10-07T19:39:53.195+0000 d20018| 2015-10-07T19:39:53.193+0000 W REPL     [rsBackgroundSync] we are too stale to use ip-10-41-164-249:20016 as a sync source
[js_test:conf_server_write_concern] 2015-10-07T19:39:53.195+0000 d20018| 2015-10-07T19:39:53.193+0000 I REPL     [ReplicationExecutor] could not find member to sync from
[js_test:conf_server_write_concern] 2015-10-07T19:39:53.195+0000 d20018| 2015-10-07T19:39:53.194+0000 E REPL     [rsBackgroundSync] too stale to catch up
[js_test:conf_server_write_concern] 2015-10-07T19:39:53.195+0000 d20018| 2015-10-07T19:39:53.194+0000 I REPL     [rsBackgroundSync] our last optime : (term: 0, timestamp: Oct  7 19:39:39:1)
[js_test:conf_server_write_concern] 2015-10-07T19:39:53.195+0000 d20018| 2015-10-07T19:39:53.194+0000 I REPL     [rsBackgroundSync] oldest available is (term: 1, timestamp: Oct  7 19:39:52:13)
[js_test:conf_server_write_concern] 2015-10-07T19:39:53.196+0000 d20018| 2015-10-07T19:39:53.194+0000 I REPL     [rsBackgroundSync] See http://dochub.mongodb.org/core/resyncingaverystalereplicasetmember
[js_test:conf_server_write_concern] 2015-10-07T19:39:53.196+0000 d20018| 2015-10-07T19:39:53.194+0000 I REPL     [ReplicationExecutor] transition to RECOVERING
...
[js_test:conf_server_write_concern] 2015-10-07T21:39:24.015+0000 d20017| 2015-10-07T21:39:24.014+0000 W REPL     [rsBackgroundSync] we are too stale to use ip-10-41-164-249:20016 as a sync source
[js_test:conf_server_write_concern] 2015-10-07T21:39:24.015+0000 d20017| 2015-10-07T21:39:24.014+0000 I REPL     [ReplicationExecutor] could not find member to sync from
[js_test:conf_server_write_concern] 2015-10-07T21:39:24.015+0000 d20017| 2015-10-07T21:39:24.014+0000 E REPL     [rsBackgroundSync] too stale to catch up
[js_test:conf_server_write_concern] 2015-10-07T21:39:24.016+0000 d20017| 2015-10-07T21:39:24.014+0000 I REPL     [rsBackgroundSync] our last optime : (term: 0, timestamp: Oct  7 19:39:39:1)
[js_test:conf_server_write_concern] 2015-10-07T21:39:24.016+0000 d20017| 2015-10-07T21:39:24.014+0000 I REPL     [rsBackgroundSync] oldest available is (term: 1, timestamp: Oct  7 19:39:52:13)
[js_test:conf_server_write_concern] 2015-10-07T21:39:24.016+0000 d20017| 2015-10-07T21:39:24.014+0000 I REPL     [rsBackgroundSync] See http://dochub.mongodb.org/core/resyncingaverystalereplicasetmember

related to

SERVER-20814 'majority' write stuck waiting for replication after both secondaries fell off the end of the oplog

Closed

Assignee:: Matt Dannenberg (Inactive)
Reporter:: Max Hirschhorn
Participants:: Eric Milkie, Githook User, Matt Dannenberg, Max Hirschhorn
Votes:: 0 Vote for this issue
Watchers:: 4 Start watching this issue

Created:: Oct 08 2015 03:43:03 AM UTC
Updated:: Oct 16 2015 06:13:32 PM UTC
Resolved:: Oct 08 2015 06:53:01 PM UTC

Details

Description

Attachments

Issue Links

Activity

People

Dates