[SERVER-3125] replset synchronisation performance problem (storing files) Created: 20/May/11  Updated: 29/Feb/12  Resolved: 16/Nov/11

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: 1.8.1
Fix Version/s: None

Type: Bug Priority: Critical - P2
Reporter: MartinS Assignee: Kristina Chodorow (Inactive)
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

linux 2.6.38 kernel, 4GB ram, 2GB swap for each node, 3x nodex, 4th computer with application in java. connection between computers - 100mbps LAN, default oplog size (no oplogSize directive in configuration file)


Operating System: Linux
Participants:

 Description   

My java app is storing messages and files in mongo.

Secondary nodes couldn't receive data enough fast to replicate changes.

Set name: testreplset
Majority up: yes
Member id Up cctime Last heartbeat Votes Priority State Messages optime skew
172.17.0.251:27017 0 1 74 hrs 1 sec ago 1 1 RECOVERING error RS102 too stale to catch up 4dd6539b:81
172.17.0.252:27017 (me) 1 1 74 hrs 1 1 RECOVERING error RS102 too stale to catch up 4dd6546d:7c
172.17.0.253:27017 2 1 74 hrs 1 sec ago 1 1 PRIMARY 4dd65853:8

12:28:59 [startReplSets] replSet STARTUP2
12:28:59 [rs Manager] replSet can't see a majority, will not try to elect self
12:28:59 [replica set sync] replSet SECONDARY
12:28:59 [ReplSetHealthPollTask] replSet info 172.17.0.253:27017 is up
12:28:59 [ReplSetHealthPollTask] replSet member 172.17.0.253:27017 SECONDARY
12:29:00 [rs Manager] replSet info electSelf 1
12:29:00 [rs Manager] replSet couldn't elect self, only received -9999 votes
12:29:01 [conn2] replSet info voting yea for 2
12:29:01 [ReplSetHealthPollTask] replSet member 172.17.0.253:27017 PRIMARY
12:29:01 [ReplSetHealthPollTask] replSet info 172.17.0.251:27017 is up
12:29:01 [ReplSetHealthPollTask] replSet member 172.17.0.251:27017 SECONDARY
12:29:10 [replica set sync] replSet syncThread: 10278 dbclient error communicating with server: 172.17.0.253:27017
12:29:11 [ReplSetHealthPollTask] replSet info 172.17.0.253:27017 is down (or slow to respond): DBClientBase::findOne: transport error: 172.17.0.253:27017 query: { replSetHeartbeat: "testreplset", v: 1, pv: 1, checkEmpty: false, from: " 12:29:12 [rs Manager] replSet not trying to elect self as responded yea to someone else recently
12:29:18 [conn6] replSet info voting yea for 2
12:29:18 [rs Manager] replSet not trying to elect self as responded yea to someone else recently
12:29:19 [ReplSetHealthPollTask] replSet info 172.17.0.253:27017 is up
Fri May 20 13:45:44 [ReplSetHealthPollTask] replSet member 172.17.0.251:27017 RECOVERING
13:48:55 [replica set sync] replSet error RS102 too stale to catch up, at least from 172.17.0.253:27017
13:48:55 [replica set sync] replSet our last optime : May 20 13:45:49 4dd6546d:7c
13:48:55 [replica set sync] replSet oldest at 172.17.0.253:27017 : May 20 13:45:52 4dd65470:19
13:48:55 [replica set sync] replSet See http://www.mongodb.org/display/DOCS/Resyncing+a+Very+Stale+Replica+Set+Member
13:48:55 [replica set sync] replSet error RS102 too stale to catch up
13:48:55 [replica set sync] replSet RECOVERING
13:49:46 [ReplSetHealthPollTask] replSet member 172.17.0.251:27017 SECONDARY
13:49:48 [ReplSetHealthPollTask] replSet member 172.17.0.251:27017 RECOVERING
13:50:55 [replica set sync] replSet error RS102 too stale to catch up, at least from 172.17.0.253:27017
13:50:55 [replica set sync] replSet our last optime : May 20 13:45:49 4dd6546d:7c
13:50:55 [replica set sync] replSet oldest at 172.17.0.253:27017 : May 20 13:47:42 4dd654de:32
13:50:55 [replica set sync] replSet See http://www.mongodb.org/display/DOCS/Resyncing+a+Very+Stale+Replica+Set+Member
13:50:55 [replica set sync] replSet error RS102 too stale to catch up
13:52:55 [replica set sync] replSet SECONDARY
13:52:56 [replica set sync] replSet error RS102 too stale to catch up, at least from 172.17.0.253:27017
13:52:56 [replica set sync] replSet our last optime : May 20 13:45:49 4dd6546d:7c
13:52:56 [replica set sync] replSet oldest at 172.17.0.253:27017 : May 20 13:49:51 4dd6555f:162
13:52:56 [replica set sync] replSet See http://www.mongodb.org/display/DOCS/Resyncing+a+Very+Stale+Replica+Set+Member
13:52:56 [replica set sync] replSet error RS102 too stale to catch up
13:52:56 [replica set sync] replSet RECOVERING
13:54:56 [replica set sync] replSet error RS102 too stale to catch up, at least from 172.17.0.253:27017
13:54:56 [replica set sync] replSet our last optime : May 20 13:45:49 4dd6546d:7c
13:54:56 [replica set sync] replSet oldest at 172.17.0.253:27017 : May 20 13:51:43 4dd655cf:47
13:54:56 [replica set sync] replSet See http://www.mongodb.org/display/DOCS/Resyncing+a+Very+Stale+Replica+Set+Member
13:54:56 [replica set sync] replSet error RS102 too stale to catch up



 Comments   
Comment by Kristina Chodorow (Inactive) [ 16/Nov/11 ]

Okay, glad that's working for you.

Comment by MartinS [ 29/Oct/11 ]

better hardware isn't solution.

with 1Gbps LAN (I've recently

The best way to solve this is to set 2 limit:
oplogSize=2GB - limit oplog in normal state
and
maxOplogSize=30GB - this is used when writing data is faster than replication.

Comment by Kristina Chodorow (Inactive) [ 28/Oct/11 ]

(Yay, you're still around!)

In the other ticket (rollback, SERVER-3126), it says your primary was 998 seconds ahead (@ 4MB/sec), meaning it was ~4GB ahead. It sounds like you're just writing faster than MongoDB can replicate. You could:

  1. throttle writes using w="majority"
  2. shard
  3. get better hardware
  4. if your traffic is bursty, you could make your oplog large enough to hold a burst of traffic and then finish replicating to secondaries during down time

Those are the options I can think of off the top of my head.

Comment by MartinS [ 28/Oct/11 ]

it's about 4MBps

Comment by Kristina Chodorow (Inactive) [ 28/Oct/11 ]

How much data/sec were you inserting? (E.g., 1MB/sec.)

Comment by MartinS [ 23/May/11 ]

node1:
Fri May 20 12:21:51 [dur] lsn set 254744714
...
Fri May 20 12:24:11 [dur] old journal file will be removed: /var/lib/mongodb/journal/j._2
...
Fri May 20 12:31:33 [conn62] end connection 172.17.0.250:42994
...
Fri May 20 13:32:39 [initandlisten] connection accepted from 172.17.0.250:43523 #65
...
Fri May 20 13:34:32 [FileAllocator] allocating new datafile /var/lib/mongodb/testdb1testdb1/testdb1.7, filling with zeroes...
Fri May 20 13:34:32 [FileAllocator] done allocating datafile /var/lib/mongodb/testdb1/testdb1.7, size: 2047MB, took 0.02 secs
Fri May 20 13:35:03 [dur] old journal file will be removed: /var/lib/mongodb/journal/j._3
Fri May 20 13:35:04 [dur] DR101 latency warning on journal file open 773ms
...
Fri May 20 13:36:16 [FileAllocator] allocating new datafile /var/lib/mongodb/testdb2/testdb2.4, filling with zeroes...
Fri May 20 13:36:16 [FileAllocator] done allocating datafile /var/lib/mongodb/testdb2/testdb2.4, size: 1024MB, took 0.008 secs
Fri May 20 13:36:54 [dur] lsn set 259174265
Fri May 20 13:37:03 [dur] old journal file will be removed: /var/lib/mongodb/journal/j._4
Fri May 20 13:37:03 [dur] DR101 latency warning on journal file open 297ms
...
Fri May 20 13:40:04 [dur] old journal file will be removed: /var/lib/mongodb/journal/j._5
Fri May 20 13:40:04 [dur] DR101 latency warning on journal file open 752ms
...
Fri May 20 13:41:59 [dur] old journal file will be removed: /var/lib/mongodb/journal/j._6
Fri May 20 13:41:59 [dur] old journal file will be removed: /var/lib/mongodb/journal/j._7
Fri May 20 13:42:04 [dur] DR101 latency warning on journal file open 4568ms
...
Fri May 20 13:44:16 [FileAllocator] allocating new datafile /var/lib/mongodb/testdb2/testdb2.5, filling with zeroes...
Fri May 20 13:44:16 [FileAllocator] done allocating datafile /var/lib/mongodb/testdb2/testdb2.5, size: 2047MB, took 0.025 secs
Fri May 20 13:44:34 [FileAllocator] allocating new datafile /var/lib/mongodb/testdb3/testdb3.6, filling with zeroes...
Fri May 20 13:44:34 [FileAllocator] done allocating datafile /var/lib/mongodb/testdb3/testdb3.6, size: 2047MB, took 0.02 secs
...
Fri May 20 13:45:40 [replica set sync] repl: old cursor isDead, will initiate a new one
Fri May 20 13:45:44 [replica set sync] replSet error RS102 too stale to catch up, at least from 172.17.0.253:27017
Fri May 20 13:45:44 [replica set sync] replSet our last optime : May 20 13:42:19 4dd6539b:81
Fri May 20 13:45:44 [replica set sync] replSet oldest at 172.17.0.253:27017 : May 20 13:42:21 4dd6539d:5b
Fri May 20 13:45:44 [replica set sync] replSet See http://www.mongodb.org/display/DOCS/Resyncing+a+Very+Stale+Replica+Set+Member
Fri May 20 13:45:44 [replica set sync] replSet error RS102 too stale to catch up
Fri May 20 13:45:44 [replica set sync] replSet RECOVERING
...
Fri May 20 13:47:45 [replica set sync] replSet error RS102 too stale to catch up, at least from 172.17.0.253:27017
Fri May 20 13:47:45 [replica set sync] replSet our last optime : May 20 13:42:19 4dd6539b:81
Fri May 20 13:47:45 [replica set sync] replSet oldest at 172.17.0.253:27017 : May 20 13:44:46 4dd6542e:8d
Fri May 20 13:47:45 [replica set sync] replSet See http://www.mongodb.org/display/DOCS/Resyncing+a+Very+Stale+Replica+Set+Member
Fri May 20 13:47:45 [replica set sync] replSet error RS102 too stale to catch up
...
Fri May 20 13:48:57 [ReplSetHealthPollTask] replSet member 172.17.0.252:27017 RECOVERING
Fri May 20 13:49:45 [replica set sync] replSet SECONDARY
Fri May 20 13:49:46 [replica set sync] replSet error RS102 too stale to catch up, at least from 172.17.0.253:27017
Fri May 20 13:49:46 [replica set sync] replSet our last optime : May 20 13:42:19 4dd6539b:81
Fri May 20 13:49:46 [replica set sync] replSet oldest at 172.17.0.253:27017 : May 20 13:46:38 4dd6549e:1a
Fri May 20 13:49:46 [replica set sync] replSet See http://www.mongodb.org/display/DOCS/Resyncing+a+Very+Stale+Replica+Set+Member
Fri May 20 13:49:46 [replica set sync] replSet error RS102 too stale to catch up
Fri May 20 13:49:46 [replica set sync] replSet RECOVERING
Fri May 20 13:49:53 [dur] lsn set 259944405
Fri May 20 13:50:51 [dur] lsn set 260003625
Fri May 20 13:51:46 [replica set sync] replSet error RS102 too stale to catch up, at least from 172.17.0.253:27017
Fri May 20 13:51:46 [replica set sync] replSet our last optime : May 20 13:42:19 4dd6539b:81
Fri May 20 13:51:46 [replica set sync] replSet oldest at 172.17.0.253:27017 : May 20 13:48:32 4dd65510:a4
Fri May 20 13:51:46 [replica set sync] replSet See http://www.mongodb.org/display/DOCS/Resyncing+a+Very+Stale+Replica+Set+Member
Fri May 20 13:51:46 [replica set sync] replSet error RS102 too stale to catch up
...
Fri May 20 13:53:46 [replica set sync] replSet SECONDARY
Fri May 20 13:53:47 [replica set sync] replSet error RS102 too stale to catch up, at least from 172.17.0.253:27017
Fri May 20 13:53:47 [replica set sync] replSet our last optime : May 20 13:42:19 4dd6539b:81
Fri May 20 13:53:47 [replica set sync] replSet oldest at 172.17.0.253:27017 : May 20 13:50:38 4dd6558e:74
Fri May 20 13:53:47 [replica set sync] replSet See http://www.mongodb.org/display/DOCS/Resyncing+a+Very+Stale+Replica+Set+Member

node2:
Fri May 20 12:22:59 [dur] lsn set 254769358
Fri May 20 12:23:43 [dur] old journal file will be removed: /var/lib/mongodb/journal/j._2
Fri May 20 12:23:59 [dur] lsn set 254828508
...
Fri May 20 12:30:58 [dur] lsn set 255242208
Fri May 20 12:31:33 [conn56] end connection 172.17.0.250:38757
Fri May 20 12:31:58 [dur] lsn set 255301278
...
Fri May 20 13:30:58 [dur] lsn set 258785558
Fri May 20 13:32:39 [initandlisten] connection accepted from 172.17.0.250:59698 #59
Fri May 20 13:32:59 [dur] lsn set 258903679
Fri May 20 13:34:07 [dur] lsn set 258962809
Fri May 20 13:34:32 [FileAllocator] allocating new datafile /var/lib/mongodb/testdb/testdb.7, filling with zeroes...
Fri May 20 13:34:32 [FileAllocator] done allocating datafile /var/lib/mongodb/testdb/testdb.7, size: 2047MB, took 0.147 secs
Fri May 20 13:35:02 [dur] old journal file will be removed: /var/lib/mongodb/journal/j._3
Fri May 20 13:35:03 [dur] DR101 latency warning on journal file open 325ms
Fri May 20 13:35:06 [dur] lsn set 259022019
Fri May 20 13:36:01 [dur] lsn set 259081219
Fri May 20 13:36:16 [FileAllocator] allocating new datafile /var/lib/mongodb/testdb2/testdb2.4, filling with zeroes...
Fri May 20 13:36:16 [FileAllocator] done allocating datafile /var/lib/mongodb/testdb2/testdb2.4, size: 1024MB, took 0.029 secs
Fri May 20 13:36:44 [dur] old journal file will be removed: /var/lib/mongodb/journal/j._4
Fri May 20 13:37:01 [dur] lsn set 259140399
...
Fri May 20 13:39:21 [dur] old journal file will be removed: /var/lib/mongodb/journal/j._5
Fri May 20 13:39:22 [dur] DR101 latency warning on journal file open 1179ms
Fri May 20 13:40:04 [dur] lsn set 259318019
Fri May 20 13:40:28 [dur] old journal file will be removed: /var/lib/mongodb/journal/j._6
Fri May 20 13:40:30 [dur] DR101 latency warning on journal file open 2139ms
Fri May 20 13:41:04 [dur] lsn set 259377219
Fri May 20 13:42:05 [dur] lsn set 259436419
Fri May 20 13:42:34 [dur] old journal file will be removed: /var/lib/mongodb/journal/j._7
Fri May 20 13:42:40 [FileAllocator] allocating new datafile /var/lib/mongodb/testdb2/testdb2.5, filling with zeroes...
Fri May 20 13:42:40 [FileAllocator] done allocating datafile /var/lib/mongodb/testdb2/testdb2.5, size: 2047MB, took 0.024 secs
Fri May 20 13:43:07 [dur] lsn set 259495679
Fri May 20 13:43:22 [FileAllocator] allocating new datafile /var/lib/mongodb/testdb3/testdb3.6, filling with zeroes...
Fri May 20 13:43:24 [FileAllocator] done allocating datafile /var/lib/mongodb/testdb3/testdb3.6, size: 2047MB, took 2.12 secs
Fri May 20 13:44:10 [dur] lsn set 259554919
Fri May 20 13:44:52 [dur] old journal file will be removed: /var/lib/mongodb/journal/j._8
Fri May 20 13:44:53 [dur] DR101 latency warning on journal file open 1677ms
Fri May 20 13:45:04 [FileAllocator] allocating new datafile /var/lib/mongodb/testdb/testdb.8, filling with zeroes...
Fri May 20 13:45:04 [FileAllocator] done allocating datafile /var/lib/mongodb/testdb/testdb.8, size: 2047MB, took 0.104 secs
Fri May 20 13:45:09 [dur] lsn set 259614139
Fri May 20 13:45:44 [ReplSetHealthPollTask] replSet member 172.17.0.251:27017 RECOVERING
Fri May 20 13:46:09 [dur] lsn set 259673369
Fri May 20 13:46:24 [dur] old journal file will be removed: /var/lib/mongodb/journal/j._9
Fri May 20 13:47:06 [dur] lsn set 259732619
Fri May 20 13:47:26 [dur] old journal file will be removed: /var/lib/mongodb/journal/j._10
Fri May 20 13:47:28 [dur] DR101 latency warning on journal file open 2143ms
Fri May 20 13:48:11 [dur] lsn set 259791849
Fri May 20 13:48:54 [replica set sync] repl: old cursor isDead, will initiate a new one
Fri May 20 13:48:55 [replica set sync] replSet error RS102 too stale to catch up, at least from 172.17.0.253:27017
Fri May 20 13:48:55 [replica set sync] replSet our last optime : May 20 13:45:49 4dd6546d:7c
Fri May 20 13:48:55 [replica set sync] replSet oldest at 172.17.0.253:27017 : May 20 13:45:52 4dd65470:19
Fri May 20 13:48:55 [replica set sync] replSet See http://www.mongodb.org/display/DOCS/Resyncing+a+Very+Stale+Replica+Set+Member
Fri May 20 13:48:55 [replica set sync] replSet error RS102 too stale to catch up
Fri May 20 13:48:55 [replica set sync] replSet RECOVERING
Fri May 20 13:49:01 [dur] lsn set 259851079
Fri May 20 13:49:46 [ReplSetHealthPollTask] replSet member 172.17.0.251:27017 SECONDARY
Fri May 20 13:49:48 [ReplSetHealthPollTask] replSet member 172.17.0.251:27017 RECOVERING
Fri May 20 13:49:58 [dur] lsn set 259910349
Fri May 20 13:50:55 [replica set sync] replSet error RS102 too stale to catch up, at least from 172.17.0.253:27017
Fri May 20 13:50:55 [replica set sync] replSet our last optime : May 20 13:45:49 4dd6546d:7c
Fri May 20 13:50:55 [replica set sync] replSet oldest at 172.17.0.253:27017 : May 20 13:47:42 4dd654de:32
Fri May 20 13:50:55 [replica set sync] replSet See http://www.mongodb.org/display/DOCS/Resyncing+a+Very+Stale+Replica+Set+Member
Fri May 20 13:50:55 [replica set sync] replSet error RS102 too stale to catch up
Fri May 20 13:51:02 [dur] lsn set 259969589
Fri May 20 13:51:58 [dur] lsn set 260028829
Fri May 20 13:52:55 [replica set sync] replSet SECONDARY
Fri May 20 13:52:56 [replica set sync] replSet error RS102 too stale to catch up, at least from 172.17.0.253:27017

node3:
Fri May 20 13:32:16 [dur] lsn set 258886326
Fri May 20 13:32:39 [initandlisten] connection accepted from 172.17.0.250:39301 #4667
Fri May 20 13:32:52 [conn4719] query testdb3.$cmd ntoreturn:1 command:

{ authenticate: 1, user: "dbuser", nonce: "9156beebc3680af6", key: "f05d79e45ac536695b38965c18f4fbe6" }

reslen:53 162ms
Fri May 20 13:32:52 [conn4718] query testdb3.$cmd ntoreturn:1 command: { count: "fs.chunks", query: {} } reslen:64 136ms
Fri May 20 13:32:52 [conn4720] query testdb3.$cmd ntoreturn:1 command:

{ authenticate: 1, user: "dbuser", nonce: "79c80f1f192289d8", key: "b22354b3db7f53f0eef75f6176241600" }

reslen:53 158ms
Fri May 20 13:32:52 [conn4722] query testdb3.$cmd ntoreturn:1 command:

{ authenticate: 1, user: "dbuser", nonce: "80960eade10ce835", key: "7afc048666bb34e8b34495e7ac683931" }

reslen:53 136ms
Fri May 20 13:32:52 [conn4721] query testdb3.$cmd ntoreturn:1 command:

{ authenticate: 1, user: "dbuser", nonce: "5dcfc9d85982e507", key: "811d6fdc4bdfe1b8c693f80af88b6c06" }

reslen:53 164ms
Fri May 20 13:32:52 [initandlisten] connection accepted from 172.17.0.250:39492 #4723
Fri May 20 13:32:53 [initandlisten] connection accepted from 172.17.0.250:39556 #4752
Fri May 20 13:32:53 [conn4740] query testdb3.$cmd ntoreturn:1 command: { count: "fs.files", query: {} } reslen:64 140ms
Fri May 20 13:32:53 [conn4742] query testdb3.$cmd ntoreturn:1 command:

{ authenticate: 1, user: "dbuser", nonce: "5680e8e7b9b0ad83", key: "d970f7182b3e832db62e93d1f6521fba" }

reslen:53 136ms
Fri May 20 13:32:53 [conn4744] query testdb3.$cmd ntoreturn:1 command:

{ authenticate: 1, user: "dbuser", nonce: "e61df48abcf6cc87", key: "75894067cad8b79a3a2196c822cde8cc" }

reslen:53 122ms
Fri May 20 13:32:53 [conn4746] query testdb3.$cmd ntoreturn:1 command:

{ authenticate: 1, user: "dbuser", nonce: "5d0befc966275ff1", key: "b38c359cea6346e25712717f42f32ded" }

reslen:53 109ms
Fri May 20 13:32:53 [conn4748] query testdb3.$cmd ntoreturn:1 command:

{ authenticate: 1, user: "dbuser", nonce: "74ed85488bacbd59", key: "2a8b9f617aa816686bda73fa7c6e313e" }

reslen:53 105ms
Fri May 20 13:32:53 [conn4741] query testdb3.$cmd ntoreturn:1 command:

{ authenticate: 1, user: "dbuser", nonce: "44d9ce89507a11c4", key: "0355d84782c3457e32c900134bb7ff60" }

reslen:53 143ms
Fri May 20 13:32:53 [conn4743] query testdb3.$cmd ntoreturn:1 command:

{ authenticate: 1, user: "dbuser", nonce: "fe9947d1ec119e23", key: "dac1949db953d1dfe5de8e36d5d62d0d" }

reslen:53 132ms
Fri May 20 13:32:53 [conn4747] query testdb3.$cmd ntoreturn:1 command:

{ authenticate: 1, user: "dbuser", nonce: "febb6271115d2dc5", key: "9c28f6059eed554588257d7b5ba00f6e" }

reslen:53 111ms
Fri May 20 13:32:53 [conn4745] query testdb3.$cmd ntoreturn:1 command:

{ authenticate: 1, user: "dbuser", nonce: "62bfb1f5aed487cb", key: "5adbf71eabdc637cbd1ae82e7b23bdf6" }

reslen:53 119ms
Fri May 20 13:32:53 [initandlisten] connection accepted from 172.17.0.250:39558 #4753
...
Fri May 20 13:32:53 [initandlisten] connection accepted from 172.17.0.250:39591 #4768
Fri May 20 13:32:53 [initandlisten] connection accepted from 172.17.0.250:39594 #4769
Fri May 20 13:32:53 [conn4757] query testdb3.$cmd ntoreturn:1 command:

{ authenticate: 1, user: "dbuser", nonce: "fa6e3dc6ab9f3e78", key: "63ec868cd12362e820ca910fe6f5a501" }

reslen:53 108ms
Fri May 20 13:32:53 [initandlisten] connection accepted from 172.17.0.250:39595 #4770
Fri May 20 13:32:53 [conn4758] query testdb3.$cmd ntoreturn:1 command:

{ authenticate: 1, user: "dbuser", nonce: "dfb155cabd9ff2b9", key: "750a64cb09635ab453afb0ca4611c683" }

reslen:53 110ms
Fri May 20 13:32:53 [initandlisten] connection accepted from 172.17.0.250:39596 #4771
...
Fri May 20 13:32:53 [initandlisten] connection accepted from 172.17.0.250:39639 #4799
Fri May 20 13:32:53 [conn4779] query testdb3.$cmd ntoreturn:1 command: { count: "fs.files", query: {} } reslen:64 119ms
Fri May 20 13:32:53 [conn4778] query testdb3.$cmd ntoreturn:1 command: { count: "fs.chunks", query: {} } reslen:64 110ms
Fri May 20 13:32:53 [conn4780] query testdb3.$cmd ntoreturn:1 command:

{ authenticate: 1, user: "dbuser", nonce: "948db4468239f928", key: "312ac0375eecd3b468894ef13bba4919" }

reslen:53 111ms
Fri May 20 13:32:53 [conn4781] query testdb3.$cmd ntoreturn:1 command:

{ authenticate: 1, user: "dbuser", nonce: "dc0dd04510d0d1b6", key: "8f59b0936285f160e92e3e010f873647" }

reslen:53 117ms
Fri May 20 13:32:53 [initandlisten] connection accepted from 172.17.0.250:39641 #4800
...
Fri May 20 13:32:53 [initandlisten] connection accepted from 172.17.0.250:39688 #4827
Fri May 20 13:32:53 [conn4804] query testdb3.$cmd ntoreturn:1 command: { count: "fs.files", query: {} } reslen:64 167ms
Fri May 20 13:32:53 [conn4802] query testdb3.$cmd ntoreturn:1 command: { count: "fs.chunks", query: {} } reslen:64 161ms
Fri May 20 13:32:53 [conn4803] query testdb3.$cmd ntoreturn:1 command: { count: "fs.chunks", query: {} } reslen:64 161ms
Fri May 20 13:32:53 [conn4807] query testdb3.$cmd ntoreturn:1 command:

{ authenticate: 1, user: "dbuser", nonce: "7f66727043a7cc71", key: "ee726ba51048c766ffa105d0db4140f5" }

reslen:53 165ms
Fri May 20 13:32:53 [conn4808] query testdb3.$cmd ntoreturn:1 command:

{ authenticate: 1, user: "dbuser", nonce: "9b5fd3a766079d69", key: "548ff19cf5b130784b1e2e0f527e0bc9" }

reslen:53 165ms
Fri May 20 13:32:53 [conn4810] query testdb3.$cmd ntoreturn:1 command:

{ authenticate: 1, user: "dbuser", nonce: "22f3422890583eaa", key: "0ababdd06adafd92b28cfad7018c9c2d" }

reslen:53 156ms
Fri May 20 13:32:53 [conn4811] query testdb3.$cmd ntoreturn:1 command:

{ authenticate: 1, user: "dbuser", nonce: "473c89a800b2f51a", key: "c64b7a39d2b6bc3fd9f2a65eac087903" }

reslen:53 145ms
Fri May 20 13:32:53 [conn4812] query testdb3.$cmd ntoreturn:1 command:

{ authenticate: 1, user: "dbuser", nonce: "2216508136eecce3", key: "804ccdab9c57f0148c6a2ad31f6e111b" }

reslen:53 133ms
Fri May 20 13:32:53 [initandlisten] connection accepted from 172.17.0.250:39690 #4828
Fri May 20 13:32:53 [conn4805] query testdb3.$cmd ntoreturn:1 command:

{ authenticate: 1, user: "dbuser", nonce: "fdbb98cb6e3556ed", key: "6cc9b96d827904de4c8efa26f68bff59" }

reslen:53 182ms
Fri May 20 13:32:53 [conn4806] query testdb3.$cmd ntoreturn:1 command:

{ authenticate: 1, user: "dbuser", nonce: "c6a6eba17b913561", key: "2a3fc8b7ae04f08eda5d002864ea14bd" }

reslen:53 175ms
Fri May 20 13:32:53 [conn4809] query testdb3.$cmd ntoreturn:1 command:

{ authenticate: 1, user: "dbuser", nonce: "9d7dd9c2118bb953", key: "9c58d4cf775f354aacc2a3b2fc96c080" }

reslen:53 172ms
Fri May 20 13:32:53 [conn4814] query testdb3.$cmd ntoreturn:1 command:

{ authenticate: 1, user: "dbuser", nonce: "fa25bb9465811b7b", key: "ce968c393449aee910bc0cc86a06e134" }

reslen:53 102ms
Fri May 20 13:32:53 [conn4813] query testdb3.$cmd ntoreturn:1 command:

{ authenticate: 1, user: "dbuser", nonce: "9fb768702b4f6f2f", key: "ebd53d41c6e162ae2246359ce31a1238" }

reslen:53 119ms
Fri May 20 13:32:53 [initandlisten] connection accepted from 172.17.0.250:39695 #4829
...
Fri May 20 13:32:54 [conn4830] query testdb3.$cmd ntoreturn:1 command: { count: "fs.files", query: {} } reslen:64 247ms
Fri May 20 13:32:54 [conn4831] query testdb3.$cmd ntoreturn:1 command:

{ authenticate: 1, user: "dbuser", nonce: "26639b88a2190f7e", key: "ca4e3e851eaa7e44f44593dad84e800e" }

reslen:53 247ms
Fri May 20 13:32:54 [conn4829] query testdb3.$cmd ntoreturn:1 command: { count: "fs.files", query: {} } reslen:64 225ms
Fri May 20 13:32:54 [initandlisten] connection accepted from 172.17.0.250:39772 #4872
Fri May 20 13:32:54 [conn4833] query testdb3.$cmd ntoreturn:1 command:

{ authenticate: 1, user: "dbuser", nonce: "672476a875513e40", key: "5774ee26dfb09f880008fb9174a88891" }

reslen:53 210ms
Fri May 20 13:32:54 [conn4832] query testdb3.$cmd ntoreturn:1 command:

{ authenticate: 1, user: "dbuser", nonce: "b044c1f31fc53aad", key: "2c0315855b5914b770e3df43ac346e3a" }

reslen:53 209ms
Fri May 20 13:32:54 [conn4834] query testdb3.$cmd ntoreturn:1 command:

{ authenticate: 1, user: "dbuser", nonce: "1f0429f41f116c90", key: "2b0336f95bc520415cab633e73e3cdfe" }

reslen:53 199ms
Fri May 20 13:32:54 [conn4840] query testdb3.$cmd ntoreturn:1 command:

{ authenticate: 1, user: "dbuser", nonce: "39d8f9ae977dbd3f", key: "d2862b7fa09c50a9d1e932b4797e89bc" }

reslen:53 162ms
Fri May 20 13:32:54 [conn4841] query testdb3.$cmd ntoreturn:1 command:

{ authenticate: 1, user: "dbuser", nonce: "a098b4bf5ba5d6ca", key: "7fb88272aec1f4c3d3fa7a3fdc7c8bb3" }

reslen:53 161ms
Fri May 20 13:32:54 [conn4842] query testdb3.$cmd ntoreturn:1 command:

{ authenticate: 1, user: "dbuser", nonce: "e88638ea3d354cab", key: "d17758376a49f95e7e557dc1ddea2cbc" }

reslen:53 160ms
Fri May 20 13:32:54 [conn4843] query testdb3.$cmd ntoreturn:1 command:

{ authenticate: 1, user: "dbuser", nonce: "b7546e517ed4575a", key: "cbbfbfc3a5a1651a183d5561797b767f" }

reslen:53 152ms
Fri May 20 13:32:54 [conn4844] query testdb3.$cmd ntoreturn:1 command:

{ authenticate: 1, user: "dbuser", nonce: "c9d7620dcb5dce28", key: "5fe8086f11f8bf64d296df67f5ab745c" }

reslen:53 130ms
Fri May 20 13:32:54 [conn4845] query testdb3.$cmd ntoreturn:1 command:

{ authenticate: 1, user: "dbuser", nonce: "d2eeeba2f288ffaa", key: "caec52ce6c8d998005564842e9be1c5f" }

reslen:53 127ms
Fri May 20 13:32:54 [conn4846] query testdb3.$cmd ntoreturn:1 command:

{ authenticate: 1, user: "dbuser", nonce: "335e1bd82d3e738d", key: "cd6a84fd387706d2cfde62835ada3a77" }

reslen:53 126ms
Fri May 20 13:32:54 [conn4847] query testdb3.$cmd ntoreturn:1 command:

{ authenticate: 1, user: "dbuser", nonce: "d1d975169d4a14f6", key: "d2381421cd39abaa0459de75155acbda" }

reslen:53 118ms
Fri May 20 13:32:54 [conn4848] query testdb3.$cmd ntoreturn:1 command:

{ authenticate: 1, user: "dbuser", nonce: "9fa65c5d286c734c", key: "3c48651785028f24b5ebcf0b0937d371" }

reslen:53 112ms
Fri May 20 13:32:54 [conn4849] query testdb3.$cmd ntoreturn:1 command:

{ authenticate: 1, user: "dbuser", nonce: "799616d557934134", key: "3477bb11006d243c172ea1c58093292e" }

reslen:53 109ms
Fri May 20 13:32:54 [conn4852] query testdb3.$cmd ntoreturn:1 command:

{ authenticate: 1, user: "dbuser", nonce: "f17ea0185206375e", key: "81ff72cc7e2c03923d45f4914014d875" }

reslen:53 109ms
Fri May 20 13:32:54 [conn4850] query testdb3.$cmd ntoreturn:1 command:

{ authenticate: 1, user: "dbuser", nonce: "67bd221b12fd48b9", key: "06bc602e02fa4a0e8b74eca613e66170" }

reslen:53 108ms
Fri May 20 13:32:54 [conn4837] query testdb3.$cmd ntoreturn:1 command:

{ authenticate: 1, user: "dbuser", nonce: "ac2e0bd33be611ca", key: "a474826ea23bde1efd7a561724b172dc" }

reslen:53 178ms
Fri May 20 13:32:54 [conn4838] query testdb3.$cmd ntoreturn:1 command:

{ authenticate: 1, user: "dbuser", nonce: "f074bd917f994b5e", key: "ca52ec6269d1b1d924a8b2f15adb8b86" }

reslen:53 177ms
Fri May 20 13:32:54 [conn4839] query testdb3.$cmd ntoreturn:1 command:

{ authenticate: 1, user: "dbuser", nonce: "8d86715d158af479", key: "ff422af4a5737d192743b25e0348be72" }

reslen:53 172ms
Fri May 20 13:32:54 [conn4851] query testdb3.$cmd ntoreturn:1 command:

{ authenticate: 1, user: "dbuser", nonce: "a6b8380f5739f316", key: "788e64005086a0044a6e829ff9174f37" }

reslen:53 107ms
Fri May 20 13:32:54 [conn4835] query testdb3.$cmd ntoreturn:1 command:

{ authenticate: 1, user: "dbuser", nonce: "23b30b26a2591a6c", key: "2028f27e6ffffa6a80096178ae0fc226" }

reslen:53 206ms
Fri May 20 13:32:54 [conn4836] query testdb3.$cmd ntoreturn:1 command:

{ authenticate: 1, user: "dbuser", nonce: "8fd7cf9fb8ad02b6", key: "7fba9f1c35cf52c3dad2e341b91946ce" }

reslen:53 194ms
Fri May 20 13:32:54 [initandlisten] connection accepted from 172.17.0.250:39776 #4873
...
Fri May 20 13:32:54 [conn4873] query testdb3.$cmd ntoreturn:1 command: { count: "fs.chunks", query: {} } reslen:64 180ms
Fri May 20 13:32:54 [conn4878] query testdb3.$cmd ntoreturn:1 command:

{ authenticate: 1, user: "dbuser", nonce: "a85c997f84cee009", key: "e1c3a62454210b018586ecab5880f88d" }

reslen:53 115ms
Fri May 20 13:32:54 [conn4877] query testdb3.$cmd ntoreturn:1 command:

{ authenticate: 1, user: "dbuser", nonce: "f636ab4873b7e4a4", key: "95b6f31157a3846f403b69ec08b9e2e4" }

reslen:53 114ms
Fri May 20 13:32:54 [conn4880] query testdb3.$cmd ntoreturn:1 command:

{ authenticate: 1, user: "dbuser", nonce: "77967526ae648b", key: "2cfa685ddcb6099a10286a3cbe24039c" }

reslen:53 102ms
Fri May 20 13:32:54 [initandlisten] connection accepted from 172.17.0.250:39819 #4900
Fri May 20 13:32:54 [initandlisten] connection accepted from 172.17.0.250:39820 #4901
Fri May 20 13:32:54 [conn4875] query testdb3.$cmd ntoreturn:1 command:

{ authenticate: 1, user: "dbuser", nonce: "4033e651e24f7ddc", key: "2d27281d08d0841c4ca4863f44e0c084" }

reslen:53 158ms
Fri May 20 13:32:54 [conn4876] query testdb3.$cmd ntoreturn:1 command:

{ authenticate: 1, user: "dbuser", nonce: "e89e553e4f985dcf", key: "18d142089eefcf907d1ea64603b2f06a" }

reslen:53 144ms
Fri May 20 13:32:54 [conn4879] query testdb3.$cmd ntoreturn:1 command:

{ authenticate: 1, user: "dbuser", nonce: "1d929df10ec4ce4b", key: "6d7f6db0ef275f1c8cee30858d8f72b2" }

reslen:53 112ms
Fri May 20 13:32:54 [initandlisten] connection accepted from 172.17.0.250:39821 #4902
Fri May 20 13:32:54 [conn4902] query testdb3.$cmd ntoreturn:1 command: { count: "fs.chunks", query: {} } reslen:64 207ms
Fri May 20 13:32:54 [conn4904] query testdb3.$cmd ntoreturn:1 command:

{ authenticate: 1, user: "dbuser", nonce: "c383687d31b8981c", key: "29a522b7237d4dc358acbae80f89931d" }

reslen:53 188ms
Fri May 20 13:32:54 [conn4906] query testdb3.$cmd ntoreturn:1 command:

{ authenticate: 1, user: "dbuser", nonce: "e247d27c483d9179", key: "e59237acf3f9ed6a70bcb99ede316dcb" }

reslen:53 161ms
Fri May 20 13:32:54 [conn4907] query testdb3.$cmd ntoreturn:1 command:

{ authenticate: 1, user: "dbuser", nonce: "ea60de69e87d573c", key: "846237912c3e8f4e5dfd1cab2e8ca542" }

reslen:53 156ms
Fri May 20 13:32:54 [conn4908] query testdb3.$cmd ntoreturn:1 command:

{ authenticate: 1, user: "dbuser", nonce: "1fd5272d84880044", key: "82f40143eee6098a993b034910d73d3e" }

reslen:53 155ms
Fri May 20 13:32:54 [conn4909] query testdb3.$cmd ntoreturn:1 command:

{ authenticate: 1, user: "dbuser", nonce: "ee6e6273b8430dc8", key: "ace8880e602725e4113da574d8e8b541" }

reslen:53 143ms
Fri May 20 13:32:54 [conn4910] query testdb3.$cmd ntoreturn:1 command:

{ authenticate: 1, user: "dbuser", nonce: "5c84c296377034c", key: "be1ce506dcbe837b0c2cc0c51dcdefca" }

reslen:53 138ms
Fri May 20 13:32:54 [conn4911] query testdb3.$cmd ntoreturn:1 command:

{ authenticate: 1, user: "dbuser", nonce: "e59a410b3dfb2bfb", key: "aa9b11f70312c8fbf5d1c3ebd85bb818" }

reslen:53 130ms
Fri May 20 13:32:54 [conn4913] query testdb3.$cmd ntoreturn:1 command:

{ authenticate: 1, user: "dbuser", nonce: "cbb3e74d174bf4b9", key: "b3f38f8a245a9e83c1a5cbf742fe88aa" }

reslen:53 123ms
Fri May 20 13:32:54 [conn4912] query testdb3.$cmd ntoreturn:1 command:

{ authenticate: 1, user: "dbuser", nonce: "e6a4cb36a82b03e0", key: "eeeb065efddf719266e120b21f0f50ef" }

reslen:53 122ms
Fri May 20 13:32:54 [conn4914] query testdb3.$cmd ntoreturn:1 command:

{ authenticate: 1, user: "dbuser", nonce: "6924109203235521", key: "162d31fb56c91d1d0d4a3548b3367a7a" }

reslen:53 121ms
Fri May 20 13:32:54 [conn4915] query testdb3.$cmd ntoreturn:1 command:

{ authenticate: 1, user: "dbuser", nonce: "1ccf6687ed228b40", key: "1aeb9a6272970c4b5f9361a0e8cf7101" }

reslen:53 116ms
Fri May 20 13:32:54 [conn4916] query testdb3.$cmd ntoreturn:1 command:

{ authenticate: 1, user: "dbuser", nonce: "a55f49d5c7432a", key: "b08f7a6c5c9466d5897bab27d957b13b" }

reslen:53 106ms
Fri May 20 13:35:02 [conn4732] query testdb3.$cmd ntoreturn:1 command: { count: "fs.chunks", query: {} } reslen:64 2415ms

Comment by MartinS [ 23/May/11 ]

node1:RECOVERING> db.oplog.rs.stats()
{
"ns" : "local.oplog.rs",
"count" : 12234,
"size" : 1562337904,
"avgObjSize" : 127704.5859081249,
"storageSize" : 1562543616,
"numExtents" : 1,
"nindexes" : 0,
"lastExtentSize" : 1562543616,
"paddingFactor" : 1,
"flags" : 0,
"totalIndexSize" : 0,
"indexSizes" : {

},
"capped" : 1,
"max" : 2147483647,
"ok" : 1
}

node2:PRIMARY> db.oplog.rs.stats()
{
"ns" : "local.oplog.rs",
"count" : 138827,
"size" : 1410229740,
"avgObjSize" : 10158.180613281278,
"storageSize" : 1412488192,
"numExtents" : 1,
"nindexes" : 0,
"lastExtentSize" : 1412488192,
"paddingFactor" : 1,
"flags" : 0,
"totalIndexSize" : 0,
"indexSizes" : {

},
"capped" : 1,
"max" : 2147483647,
"ok" : 1
}

node3:ROLLBACK> db.oplog.rs.stats();
{
"ns" : "local.oplog.rs",
"count" : 60511,
"size" : 1572346276,
"avgObjSize" : 25984.470195501643,
"storageSize" : 1573554432,
"numExtents" : 1,
"nindexes" : 0,
"lastExtentSize" : 1573554432,
"paddingFactor" : 1,
"flags" : 0,
"totalIndexSize" : 0,
"indexSizes" : {

},
"capped" : 1,
"max" : 2147483647,
"ok" : 1
}

Comment by Kristina Chodorow (Inactive) [ 20/May/11 ]

Do you have the actual db logs (not the log from the web ui) from between 12:29 and 13:48?

12:29:18 [rs Manager] replSet not trying to elect self as responded yea to someone else recently
12:29:19 [ReplSetHealthPollTask] replSet info 172.17.0.253:27017 is up
Fri May 20 13:45:44 [ReplSetHealthPollTask] replSet member 172.17.0.251:27017 RECOVERING
13:48:55 [replica set sync] replSet error RS102 too stale to catch up, at least from 172.17.0.253:27017
13:48:55 [replica set sync] replSet our last optime : May 20 13:45:49 4dd6546d:7c

Also, can you run:

> use local
> db.oplog.rs.stats()

on all three nodes and send the output?

Generated at Thu Feb 08 03:02:09 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.