[SERVER-21524] Secondary Replication Failure Created: 18/Nov/15  Updated: 04/Jan/16  Resolved: 04/Jan/16

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: 3.0.6
Fix Version/s: None

Type: Bug Priority: Critical - P2
Reporter: Marc Fletcher Assignee: Kelsey Schubert
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File RS0-Primary-Metrics.png     PNG File RS0-Primary-MongoDiskMetrics.png     PNG File RS0-Primary-OplogLength.png     Microsoft Excel RS0-Primary-OplogLength.xlsx     PNG File RS0-Secondary-Metrics.png     PNG File RS0-Secondary-MongoDiskMetrics.png     PNG File Secondary-ss-log.png     PNG File after.png     PNG File before.png     Zip Archive mongod_rs0_primary.log.zip     Zip Archive mongod_rs0_secondary.log.zip     Zip Archive server_logs_15_11_25.zip     PNG File transition.png
Operating System: ALL
Steps To Reproduce:

NA (unfortunately)

Participants:

 Description   

We have a sharded MongoDB cluster with 3 shards. Each shard contains two data-bearing members and an arbiter. Most collections are sharded across this cluster.

For the past 2 weeks the secondary in shard 0 has repeatedly gone stale. Each time this has happened we have attempted to determine the cause of the replication failure, without any luck.

Using the article https://docs.mongodb.org/manual/tutorial/troubleshoot-replica-sets/ we have ruled out the following causes of the replication failure (evidence included in this bug report).

All evidence covers the timespan of the last secondary failure which occurred at approx. 2015/11/17 20:23 GMT.

All times are UTC.
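
For reference, the lag itself can be spot-checked from the mongo shell with the standard helpers; a minimal sketch, assuming the 3.0 shell:

// Run on the primary: prints how far each secondary is behind
rs.printSlaveReplicationInfo()

// Or compute the lag explicitly from rs.status()
var s = rs.status();
var primary = s.members.filter(function (m) { return m.stateStr == "PRIMARY"; })[0];
s.members.forEach(function (m) {
    if (m.stateStr == "SECONDARY")
        print(m.name + " lag (s): " + (primary.optimeDate - m.optimeDate) / 1000);
});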

Network Latency

See attached RS0-Primary-Metrics.png & RS0-Secondary-Metrics.png

Network throughput between the instances was lower for the duration of this period than it is during initial sync.
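
(As a side check, heartbeat round-trip times between members can be read straight out of rs.status(); a minimal sketch, assuming the pingMs field reported by the 3.0 replSetGetStatus output:)

// Run on any member: per-member heartbeat latency in milliseconds
rs.status().members.forEach(function (m) {
    if (m.pingMs !== undefined) print(m.name + " pingMs: " + m.pingMs);
});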

Disk Throughput

See attached RS0-Primary-MongoDiskMetrics.png & RS0-Secondary-MongoDiskMetrics.png

Disk throughput is limited to approx. 750 total ops/s. Neither the primary's nor the secondary's disk is reaching this limit.
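
(The per-device numbers behind these graphs can be reproduced on the hosts themselves; a minimal sketch using iostat, with device names as on our instances:)

# Extended per-device statistics (w/s, wkB/s, %util) sampled once per second for the mongod volumes
iostat -x -k 1 xvdf xvdg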

Concurrency

See attached mongod_rs0_primary.log

There does not appear to be a higher than normal occurrence of slow queries or long-running operations over the period where the secondary becomes stale.
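
(Slow or long-running operations were checked via the mongod log; they can also be inspected live, e.g. a minimal sketch using currentOp:)

// Operations active for more than 5 seconds on this node
db.currentOp({ active: true, secs_running: { $gt: 5 } }).inprog.forEach(function (op) {
    print(op.opid + "  " + op.op + "  " + op.ns + "  " + op.secs_running + "s");
});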

Appropriate Write Concern

Our data ingestion rate is relatively consistent and varies only slowly over any 24-hour period.

The write concern used is always WriteConcern.ACKNOWLEDGED.
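
(In shell terms WriteConcern.ACKNOWLEDGED corresponds to w:1, i.e. writes are acknowledged by the primary only and never wait on replication to the secondary; a minimal sketch, with a hypothetical collection name:)

// Shell equivalent of WriteConcern.ACKNOWLEDGED
db.someCollection.insert({ example: true }, { writeConcern: { w: 1, wtimeout: 0 } })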

OpLog Size

See attached RS0-Primary-OplogLength.png & RS0-Primary-OplogLength.xlsx

The oplog window is never less than 30 minutes (with 30 GB of allocated oplog space).
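
(The window in the attached spreadsheet can be cross-checked on the primary with the built-in helper; a minimal sketch:)

// Prints the configured oplog size and the current window ("log length start to end")
rs.printReplicationInfo()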

Configuration Setup

All instances in the cluster are hosted on AWS using the m3.2xlarge instance size.

IP Addresses for the different cluster members are as follows:

Shard 0 (Contains unsharded collections)

Primary - 10.0.1.179
Secondary - 10.0.2.145
Arbiter - 10.0.1.109

Shard 1

Primary - 10.0.1.180
Secondary - 10.0.2.146
Arbiter - 10.0.1.108

Shard 2

Primary - 10.0.1.212
Secondary - 10.0.2.199
Arbiter - 10.0.1.246

The secondaries use the out-of-the-box configuration, i.e. no reads are allowed from secondaries.

To date we have been unable to locate the cause of this replication issue. As such, we are reporting the above as a bug in the hope that you can either explain the issue or resolve the problem in an upcoming release.

For the time being, whenever replication fails we simply perform an initial sync on the failed secondary.
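
(The resync is just the standard wipe-and-restart initial sync; roughly the following, with the service name and dbpath as assumptions to be adjusted to the actual config:)

# Stop the stale secondary, empty its dbpath and restart it;
# the member re-enters STARTUP2 and performs a full initial sync from the primary
sudo service mongod stop
sudo rm -rf /var/mongodata/*
sudo service mongod start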



 Comments   
Comment by Kelsey Schubert [ 04/Jan/16 ]

Hi MarcF,

Thank you for the update. I am closing this ticket since this issue has been resolved.

Kind regards,
Thomas

Comment by Marc Fletcher [ 04/Jan/16 ]

This issue now appears to be resolved. It was due to a faulty EBS volume. Many kind thanks for all of the above support!

Comment by Marc Fletcher [ 15/Dec/15 ]

To update, I am still debugging this issue with AWS. They are not quite sure what the issue is either, possibly some type of EBS degradation. We are moving our cluster over to EBS-optimised instances in the hope that this takes us a step closer to a solution (a solution other than just massively over-provisioning everything).

Comment by Marc Fletcher [ 09/Dec/15 ]

I have raised a support ticket with AWS. This might be due to some faulty EBS volumes. I will update here after I have replaced the volumes.

Comment by Bruce Lucas (Inactive) [ 07/Dec/15 ]

Please clarify which logs are required?

I had in mind the mongod logs corresponding to the ss.log and iostat.log already attached to this ticket for 11-24 and 11-25.

None of the disks appear to reach their IOPS limit.

As I mentioned above, iostat shows xvdg reaching 100% utilization, with high queue depths, high response times, and about 6 MB/s of throughput. I'm not certain how 6 MB/s relates to the provisioned 125 IOPS, but according to iostat it sure looks like xvdg is maxed out.

/var/mongodata/journal/ - 25GB EBS volume (Provisioned IOPS Base 125)

25/11/15 11:09:42
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    0.00   12.52    0.00   87.48
 
Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
xvda              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
xvdb              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
xvdg              0.00     0.00    0.00   58.00     0.00  6232.00   214.90   143.40 2875.86  17.24 100.00
xvdf              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
 
25/11/15 11:09:43
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    0.00   18.82    0.00   81.18
 
Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
xvda              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
xvdb              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
xvdg              0.00     1.00    0.00   51.00     0.00  6528.00   256.00   134.68 2821.80  19.61 100.00
xvdf              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
 
25/11/15 11:09:44
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    0.00   12.52    0.00   87.48
 
Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
xvda              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
xvdb              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
xvdg              0.00     0.00    0.00   52.00     0.00  6656.00   256.00    85.00 2874.92  19.23 100.00
xvdf              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
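
(As a quick cross-check, the average write rate, throughput and request size for xvdg over the whole capture can be pulled out of the attached iostat.log; a minimal sketch, assuming the column layout shown above:)

# Columns for the iostat -x output above: $5 = w/s, $7 = wkB/s
awk '$1 == "xvdg" && $5 > 0 { w += $5; kb += $7; n++ }
     END { if (n) printf "samples=%d  avg w/s=%.1f  avg wkB/s=%.1f  avg write size KB=%.1f\n",
                         n, w/n, kb/n, kb/w }' iostat.log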

Comment by Marc Fletcher [ 07/Dec/15 ]

Do you have more recent mongod logs? The ones attached to the ticket only go through 11-18, whereas the incident captured in the serverStatus and iostat data is 11-24 to 11-25.

I would be happy to collate logs after each failure and update this ticket. Please clarify which logs are required, i.e. do you also want the iostat.log and ss.log files?

Also, can you clarify which device is data and which is journal?

[ec2-user@ip-10-0-2-145 ~]$ df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/xvda1      7.8G  2.4G  5.3G  32% /
devtmpfs         15G   68K   15G   1% /dev
tmpfs            15G     0   15G   0% /dev/shm
/dev/xvdf       246G   36G  198G  16% /var/mongodata
/dev/xvdg        50G  752M   46G   2% /var/mongojournal

Did anything change at B related to the application or the replica set state? The mongod logs may tell us that.

We have so far been unable to find any correlation between our application load (which varies slowly and is highly correlated with CPU load over a 24-hour period) and the issues with our Mongo cluster. As above, please let me know which logs I should include in updates for each instance of a secondary becoming stale.

If not, maybe your disk throughput was throttled back to the provisioned IOPS, or something like that?

None of the disks appear to reach their IOPS limit. There are also no spikes or changes in load around the time of the secondaries becoming stale.

Anything in syslog?

The syslogs are very quiet. There are no events at all recorded in dmesg, kern or messages in the 24 hours surrounding the issues with the secondaries.
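
(For anyone repeating the check, something along these lines comes back empty here; exact log paths vary by distro:)

# Look for disk/kernel errors around the failure window
dmesg | tail -n 500
sudo grep -iE 'xvd|i/o error|blocked|timeout' /var/log/messages /var/log/kern* 2>/dev/null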

Comment by Bruce Lucas (Inactive) [ 07/Dec/15 ]

The iostat metrics support Sue's conclusion:

On the secondary at B

  • replication begins to lag (repl buffer is constantly full)
  • xvdg utilization pegs at 100%, with long queues and long i/o wait times

Zooming in before B to get some average numbers (see attached before.png):

and after B (see attached after.png):

We see that at B:

  • average replication rate actually decreases a bit, from 89 updates/s to 81 updates/s, and becomes much more bursty. (The graph for the update workload in the first set of graphs above is deceptive because it only shows the peaks of a workload that has become bursty.)
  • disk throughput decreases from about 9 MB/s to about 6 MB/s and becomes much more constant. Both numbers, I think, are at or near the maximum capacity of the provisioned 125 IOPS (at 4 KB each).

Did anything change at B related to the application or the replica set state? The mongod logs may tell us that. If not, maybe your disk throughput was throttled back to the provisioned IOPS, or something like that? Anything in syslog?

Comment by Bruce Lucas (Inactive) [ 07/Dec/15 ]

Do you have more recent mongod logs? The ones attached to the ticket only go through 11-18, whereas the incident captured in the serverStatus and iostat data is 11-24 to 11-25.

Also, can you clarify which device is data and which is journal?

Comment by Marc Fletcher [ 07/Dec/15 ]

Thanks for the update.

A quick note from my end: it looks like the secondary in our third shard (RS2) also became stale because of a similar issue within the last 6 hours. This is the first time I've observed this behaviour outside of RS0.

Comment by Kelsey Schubert [ 07/Dec/15 ]

Hi marc@echoboxapp.com,

We have observed interesting behavior on the secondary. I've attached an image to show some of what we are seeing from the logs as we continue to investigate the issue.

Regards,
Thomas

Comment by Marc Fletcher [ 01/Dec/15 ]

I can confirm that the same secondary has just become stale while running 3.0.7. As such this issue appears to affect the most recent release as well.

Comment by Marc Fletcher [ 30/Nov/15 ]

As of today we have migrated our cluster to 3.0.7. I will update within 48 hours if the issue has recurred.

Comment by Marc Fletcher [ 25/Nov/15 ]

The requested server logs have been added to this ticket.

Comment by Marc Fletcher [ 25/Nov/15 ]

Requested logs from both the primary and secondary, covering the period from initial sync through to becoming stale.

Comment by Marc Fletcher [ 24/Nov/15 ]

I have started the logging as requested on both the primary and secondary (immediately after starting the initial sync).

rs.conf output on primary:

rs0:PRIMARY> rs.conf()
{
	"_id" : "rs0",
	"version" : 109491,
	"members" : [
		{
			"_id" : 2,
			"host" : "10.0.1.179:27017",
			"arbiterOnly" : false,
			"buildIndexes" : true,
			"hidden" : false,
			"priority" : 2,
			"tags" : {
				
			},
			"slaveDelay" : 0,
			"votes" : 1
		},
		{
			"_id" : 3,
			"host" : "10.0.1.109:27017",
			"arbiterOnly" : true,
			"buildIndexes" : true,
			"hidden" : false,
			"priority" : 1,
			"tags" : {
				
			},
			"slaveDelay" : 0,
			"votes" : 1
		},
		{
			"_id" : 4,
			"host" : "10.0.2.145:27017",
			"arbiterOnly" : false,
			"buildIndexes" : true,
			"hidden" : false,
			"priority" : 1,
			"tags" : {
				
			},
			"slaveDelay" : 0,
			"votes" : 1
		}
	],
	"settings" : {
		"chainingAllowed" : true,
		"heartbeatTimeoutSecs" : 10,
		"getLastErrorModes" : {
			
		},
		"getLastErrorDefaults" : {
			"w" : 1,
			"wtimeout" : 0
		}
	}
}

use local
db.oplog.rs.stats() output on primary:

rs0:PRIMARY> db.oplog.rs.stats()
{
	"ns" : "local.oplog.rs",
	"count" : 191011,
	"size" : 32221169413,
	"avgObjSize" : 168687,
	"storageSize" : 11484241920,
	"capped" : true,
	"max" : -1,
	"maxSize" : NumberLong("32212254720"),
	"wiredTiger" : {
		"metadata" : {
			"formatVersion" : 1,
			"oplogKeyExtractionVersion" : 1
		},
		"creationString" : "allocation_size=4KB,app_metadata=(formatVersion=1,oplogKeyExtractionVersion=1),block_allocation=best,block_compressor=snappy,cache_resident=0,checkpoint=(WiredTigerCheckpoint.64078=(addr=\"01e30825c381e4d60387f9e311d6858de4d27044f0e311d6928ce43f580f6d808080e502ac837fc0e50298d4ffc0\",order=64078,time=1448392985,size=11154038784,write_gen=285060765)),checkpoint_lsn=(277717,27313024),checksum=on,collator=,columns=,dictionary=0,format=btree,huffman_key=,huffman_value=,id=9,internal_item_max=0,internal_key_max=0,internal_key_truncate=,internal_page_max=4KB,key_format=q,key_gap=10,leaf_item_max=0,leaf_key_max=0,leaf_page_max=32KB,leaf_value_max=64MB,memory_page_max=10m,os_cache_dirty_max=0,os_cache_max=0,prefix_compression=0,prefix_compression_min=4,split_deepen_min_child=0,split_deepen_per_child=0,split_pct=90,value_format=u,version=(major=1,minor=1)",
		"type" : "file",
		"uri" : "statistics:table:collection-6-8821970786560075304",
		"LSM" : {
			"bloom filters in the LSM tree" : 0,
			"bloom filter false positives" : 0,
			"bloom filter hits" : 0,
			"bloom filter misses" : 0,
			"bloom filter pages evicted from cache" : 0,
			"bloom filter pages read into cache" : 0,
			"total size of bloom filters" : 0,
			"sleep for LSM checkpoint throttle" : 0,
			"chunks in the LSM tree" : 0,
			"highest merge generation in the LSM tree" : 0,
			"queries that could have benefited from a Bloom filter that did not exist" : 0,
			"sleep for LSM merge throttle" : 0
		},
		"block-manager" : {
			"file allocation unit size" : 4096,
			"blocks allocated" : 285190064,
			"checkpoint size" : 11154038784,
			"allocations requiring file extension" : 228646,
			"blocks freed" : 284814903,
			"file magic number" : 120897,
			"file major version number" : 1,
			"minor version number" : 0,
			"file bytes available for reuse" : 547069952,
			"file size in bytes" : 11484241920
		},
		"btree" : {
			"btree checkpoint generation" : 64097,
			"column-store variable-size deleted values" : 0,
			"column-store fixed-size leaf pages" : 0,
			"column-store internal pages" : 0,
			"column-store variable-size leaf pages" : 0,
			"pages rewritten by compaction" : 0,
			"number of key/value pairs" : 0,
			"fixed-record size" : 0,
			"maximum tree depth" : 5,
			"maximum internal page key size" : 368,
			"maximum internal page size" : 4096,
			"maximum leaf page key size" : 3276,
			"maximum leaf page size" : 32768,
			"maximum leaf page value size" : 67108864,
			"overflow pages" : 0,
			"row-store internal pages" : 0,
			"row-store leaf pages" : 0
		},
		"cache" : {
			"bytes read into cache" : NumberLong("52676231690627"),
			"bytes written from cache" : NumberLong("48021569438720"),
			"checkpoint blocked page eviction" : 951094,
			"unmodified pages evicted" : 250546888,
			"page split during eviction deepened the tree" : 1670,
			"modified pages evicted" : 133060699,
			"data source pages selected for eviction unable to be evicted" : 18091126,
			"hazard pointer blocked page eviction" : 13066365,
			"internal pages evicted" : 5094768,
			"pages split during eviction" : 6146468,
			"in-memory page splits" : 26331,
			"overflow values cached in memory" : 0,
			"pages read into cache" : 278410648,
			"overflow pages read into cache" : 0,
			"pages written from cache" : 285061909
		},
		"compression" : {
			"raw compression call failed, no additional data available" : 0,
			"raw compression call failed, additional data available" : 0,
			"raw compression call succeeded" : 0,
			"compressed pages read" : 275287187,
			"compressed pages written" : 254778360,
			"page written failed to compress" : 27,
			"page written was too small to compress" : 30282820
		},
		"cursor" : {
			"create calls" : 6236990,
			"insert calls" : 340733823,
			"bulk-loaded cursor-insert calls" : 0,
			"cursor-insert key and value bytes inserted" : NumberLong("46598898226304"),
			"next calls" : 1052359777,
			"prev calls" : 184599,
			"remove calls" : 6234220,
			"cursor-remove key bytes removed" : 56107980,
			"reset calls" : 836289474,
			"search calls" : 454486962,
			"search near calls" : 8196095,
			"update calls" : 0,
			"cursor-update value bytes updated" : 0
		},
		"reconciliation" : {
			"dictionary matches" : 0,
			"internal page multi-block writes" : 115807,
			"leaf page multi-block writes" : 6176024,
			"maximum blocks required for a page" : 301,
			"internal-page overflow keys" : 0,
			"leaf-page overflow keys" : 0,
			"overflow values written" : 0,
			"pages deleted" : 24945282,
			"page checksum matches" : 2842761,
			"page reconciliation calls" : 134996009,
			"page reconciliation calls for eviction" : 132445970,
			"leaf page key bytes discarded using prefix compression" : 0,
			"internal page key bytes discarded using suffix compression" : 264587122
		},
		"session" : {
			"object compaction" : 0,
			"open cursor count" : 6236990
		},
		"transaction" : {
			"update conflicts" : 0
		}
	},
	"nindexes" : 0,
	"totalIndexSize" : 0,
	"indexSizes" : {
		
	},
	"ok" : 1,
	"$gleStats" : {
		"lastOpTime" : Timestamp(0, 0),
		"electionId" : ObjectId("5614f9ebaffca08a80d27834")
	}
}

rs.conf() output on secondary:

rs0:STARTUP2> rs.conf()
{
	"_id" : "rs0",
	"version" : 109491,
	"members" : [
		{
			"_id" : 2,
			"host" : "10.0.1.179:27017",
			"arbiterOnly" : false,
			"buildIndexes" : true,
			"hidden" : false,
			"priority" : 2,
			"tags" : {
				
			},
			"slaveDelay" : 0,
			"votes" : 1
		},
		{
			"_id" : 3,
			"host" : "10.0.1.109:27017",
			"arbiterOnly" : true,
			"buildIndexes" : true,
			"hidden" : false,
			"priority" : 1,
			"tags" : {
				
			},
			"slaveDelay" : 0,
			"votes" : 1
		},
		{
			"_id" : 4,
			"host" : "10.0.2.145:27017",
			"arbiterOnly" : false,
			"buildIndexes" : true,
			"hidden" : false,
			"priority" : 1,
			"tags" : {
				
			},
			"slaveDelay" : 0,
			"votes" : 1
		}
	],
	"settings" : {
		"chainingAllowed" : true,
		"heartbeatTimeoutSecs" : 10,
		"getLastErrorModes" : {
			
		},
		"getLastErrorDefaults" : {
			"w" : 1,
			"wtimeout" : 0
		}
	}
}

use local
db.oplog.rs.stats() output on secondary:

rs0:STARTUP2> db.oplog.rs.stats()
{
	"ns" : "local.oplog.rs",
	"count" : 0,
	"size" : 0,
	"storageSize" : 4096,
	"capped" : true,
	"max" : -1,
	"maxSize" : NumberLong("32212254720"),
	"wiredTiger" : {
		"metadata" : {
			"formatVersion" : 1,
			"oplogKeyExtractionVersion" : 1
		},
		"creationString" : "allocation_size=4KB,app_metadata=(formatVersion=1,oplogKeyExtractionVersion=1),block_allocation=best,block_compressor=snappy,cache_resident=0,checkpoint=(WiredTigerCheckpoint.1=(addr=\"\",order=1,time=1448392769,size=0,write_gen=0)),checkpoint_lsn=(1,0),checksum=on,collator=,columns=,dictionary=0,format=btree,huffman_key=,huffman_value=,id=9,internal_item_max=0,internal_key_max=0,internal_key_truncate=,internal_page_max=4KB,key_format=q,key_gap=10,leaf_item_max=0,leaf_key_max=0,leaf_page_max=32KB,leaf_value_max=64MB,memory_page_max=10m,os_cache_dirty_max=0,os_cache_max=0,prefix_compression=0,prefix_compression_min=4,split_deepen_min_child=0,split_deepen_per_child=0,split_pct=90,value_format=u,version=(major=1,minor=1)",
		"type" : "file",
		"uri" : "statistics:table:collection-6--7901692086452264761",
		"LSM" : {
			"bloom filters in the LSM tree" : 0,
			"bloom filter false positives" : 0,
			"bloom filter hits" : 0,
			"bloom filter misses" : 0,
			"bloom filter pages evicted from cache" : 0,
			"bloom filter pages read into cache" : 0,
			"total size of bloom filters" : 0,
			"sleep for LSM checkpoint throttle" : 0,
			"chunks in the LSM tree" : 0,
			"highest merge generation in the LSM tree" : 0,
			"queries that could have benefited from a Bloom filter that did not exist" : 0,
			"sleep for LSM merge throttle" : 0
		},
		"block-manager" : {
			"file allocation unit size" : 4096,
			"blocks allocated" : 0,
			"checkpoint size" : 0,
			"allocations requiring file extension" : 0,
			"blocks freed" : 0,
			"file magic number" : 120897,
			"file major version number" : 1,
			"minor version number" : 0,
			"file bytes available for reuse" : 0,
			"file size in bytes" : 4096
		},
		"btree" : {
			"btree checkpoint generation" : 5,
			"column-store variable-size deleted values" : 0,
			"column-store fixed-size leaf pages" : 0,
			"column-store internal pages" : 0,
			"column-store variable-size leaf pages" : 0,
			"pages rewritten by compaction" : 0,
			"number of key/value pairs" : 0,
			"fixed-record size" : 0,
			"maximum tree depth" : 0,
			"maximum internal page key size" : 368,
			"maximum internal page size" : 4096,
			"maximum leaf page key size" : 3276,
			"maximum leaf page size" : 32768,
			"maximum leaf page value size" : 67108864,
			"overflow pages" : 0,
			"row-store internal pages" : 0,
			"row-store leaf pages" : 0
		},
		"cache" : {
			"bytes read into cache" : 0,
			"bytes written from cache" : 0,
			"checkpoint blocked page eviction" : 0,
			"unmodified pages evicted" : 0,
			"page split during eviction deepened the tree" : 0,
			"modified pages evicted" : 0,
			"data source pages selected for eviction unable to be evicted" : 0,
			"hazard pointer blocked page eviction" : 0,
			"internal pages evicted" : 0,
			"pages split during eviction" : 0,
			"in-memory page splits" : 0,
			"overflow values cached in memory" : 0,
			"pages read into cache" : 0,
			"overflow pages read into cache" : 0,
			"pages written from cache" : 0
		},
		"compression" : {
			"raw compression call failed, no additional data available" : 0,
			"raw compression call failed, additional data available" : 0,
			"raw compression call succeeded" : 0,
			"compressed pages read" : 0,
			"compressed pages written" : 0,
			"page written failed to compress" : 0,
			"page written was too small to compress" : 0
		},
		"cursor" : {
			"create calls" : 3,
			"insert calls" : 0,
			"bulk-loaded cursor-insert calls" : 0,
			"cursor-insert key and value bytes inserted" : 0,
			"next calls" : 1,
			"prev calls" : 1,
			"remove calls" : 0,
			"cursor-remove key bytes removed" : 0,
			"reset calls" : 285,
			"search calls" : 0,
			"search near calls" : 0,
			"update calls" : 0,
			"cursor-update value bytes updated" : 0
		},
		"reconciliation" : {
			"dictionary matches" : 0,
			"internal page multi-block writes" : 0,
			"leaf page multi-block writes" : 0,
			"maximum blocks required for a page" : 0,
			"internal-page overflow keys" : 0,
			"leaf-page overflow keys" : 0,
			"overflow values written" : 0,
			"pages deleted" : 0,
			"page checksum matches" : 0,
			"page reconciliation calls" : 0,
			"page reconciliation calls for eviction" : 0,
			"leaf page key bytes discarded using prefix compression" : 0,
			"internal page key bytes discarded using suffix compression" : 0
		},
		"session" : {
			"object compaction" : 0,
			"open cursor count" : 3
		},
		"transaction" : {
			"update conflicts" : 0
		}
	},
	"nindexes" : 0,
	"totalIndexSize" : 0,
	"indexSizes" : {
		
	},
	"ok" : 1
}

I will update again, with the requested log files, after the secondary has failed.

Comment by Ramon Fernandez Marina [ 24/Nov/15 ]

Sorry this keeps happening on your setup, Marc. You may be affected by SERVER-18908, which is being actively investigated. Independently of SERVER-18908, if your primary in shard 0 is under heavy load from write operations and chunk migrations, that may be the cause of this behavior, and you'll need a bigger oplog on the affected secondary to be able to cope with your load requirements. You may also need to change your sharding strategy if that is what's causing the high load on shard 0.

The first thing I'd recommend is moving to MongoDB 3.0.7. Then, can you please increase the size of the oplog (documented here: https://docs.mongodb.org/manual/tutorial/change-oplog-size/#change-the-size-of-the-oplog) or restart this secondary with a much bigger oplog to see if the issue reproduces? This should help us narrow the issue down towards determining whether this is a server bug or a setup issue.
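
(If you do go the wipe-and-resync route, the oplog size can simply be set at restart; a minimal sketch, with the 60 GB figure and config path only as placeholders:)

# On the stopped, emptied secondary: restart with a larger oplog (size is in MB)
mongod --config /etc/mongod.conf --oplogSize 61440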

In terms of debugging information to collect, you can run the following from a bash shell both on the primary and secondary of the affected shard:

# Delay in seconds
delay=1
# Sample serverStatus (including tcmalloc stats) every $delay seconds
mongo --eval "while(true) {print(JSON.stringify(db.serverStatus({tcmalloc:1}))); sleep($delay*1000)}" >ss.log &
# Sample extended per-device disk statistics at the same interval
iostat -k -t -x $delay >iostat.log &

This collects data at 1-second intervals and produces two files for each node, ss.log and iostat.log. Start data collection when you bring up the secondary for initial sync, and run it until the secondary falls off the oplog; then you can upload those files to this JIRA ticket.

Other information that will be useful is the output of the following commands (run in the affected secondary):

rs.conf()
use local
db.oplog.rs.stats()

Thanks,
Ramón.

Comment by Marc Fletcher [ 24/Nov/15 ]

For reference, this issue occurs at least once every 24 hours (approximately), although not at the same time of day; sometimes it happens at night, sometimes in the morning, sometimes in the afternoon. As such, if there is any additional debugging I can enable, I would be happy to do so.
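
(For example, raising the mongod log verbosity for a window around the expected failure would be one low-cost option; a minimal sketch, noting that the setting reverts on restart:)

// Temporarily increase mongod log verbosity (0 is the default level)
db.adminCommand({ setParameter: 1, logLevel: 2 })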

Comment by Marc Fletcher [ 23/Nov/15 ]

Any further update/info on this one? Not having reliable replication is kind of a show stopper.

Comment by Marc Fletcher [ 18/Nov/15 ]

WiredTiger, sorry for the misunderstanding.

Comment by Kelsey Schubert [ 18/Nov/15 ]

Hi MarcF,

Ramon was asking whether you were using WiredTiger or MMAPv1 as your storage engine. Can you please specify?

Sorry for the confusion,
Thomas

Comment by Marc Fletcher [ 18/Nov/15 ]

At least it's nothing obvious. The storage system on all nodes is as follows:

Each EC2 instance has two attached volumes for use by mongod:
/var/mongodata/ - 250GB EBS volume (IOPS Base 750 / Burst 3000)
/var/mongodata/journal/ - 25GB EBS volume (Provisioned IOPS Base 125)

I am unable to find any logs for mongos; please advise on the default storage location. We have a mongos instance on each of our (8) application servers.
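
(One way to find where each mongos is logging, if anywhere, is to ask it directly; a minimal sketch, run from a shell connected to one of the mongos instances:)

// Shows the parsed startup options, including any systemLog.path / --logpath value
db.serverCmdLineOpts().parsed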

Comment by Ramon Fernandez Marina [ 18/Nov/15 ]

MarcF, unfortunately there is not enough information in the logs to provide a theory on what may be causing the secondary to fall behind. I can see a large number of migrations, long-running queries, and large update operations that may be contributing. What storage system are you using on these nodes? Is there any additional information in the logs for mongos?

Thanks,
Ramón.
