[SERVER-16588] [rsSync] warning: DR102 too much data written uncommitted 314.577MB Created: 18/Dec/14  Updated: 30/May/16  Resolved: 02/Mar/15

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: 2.6.4
Fix Version/s: None

Type: Bug Priority: Minor - P4
Reporter: Craig Genner Assignee: Ramon Fernandez Marina
Resolution: Incomplete Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File mongodb.log.5.gz    
Issue Links:
Related
related to SERVER-6925 DR102 too much data written uncommitted Closed
Operating System: ALL
Steps To Reproduce:

Install 2.6.6 on Debian 7.7 and set up a three-node cluster (one node is an arbiter).

Watch the mongod log on the secondary.

Participants:

 Description   

I'm seeing the same problem as (or one very similar to) the one reported in https://jira.mongodb.org/browse/SERVER-6925

2014-12-17T15:19:04.998+0000 [rsSync] warning: DR102 too much data written uncommitted 314.577MB
2014-12-17T15:19:06.106+0000 [rsSync] local.oplog.rs warning assertion failure _intents.size() < 2000000 src/mongo/db/dur_commitjob.h 112
2014-12-17T15:19:06.116+0000 [rsSync] local.oplog.rs 0x11e6111 0x1187e49 0x116be5f 0xa64c8f 0xa64e2b 0xa64ea2 0xa64fd3 0xa65047 0xa5deba 0xf2663f 0xf2295d 0xf2451f 0xf29aea 0xf29b4f 0xf32ddf 0xf32770 0x8b601f 0x8b693a 0xe5667f 0xeaf91b
 /usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x21) [0x11e6111]
 /usr/bin/mongod(_ZN5mongo10logContextEPKc+0x159) [0x1187e49]
 /usr/bin/mongod(_ZN5mongo9wassertedEPKcS1_j+0x17f) [0x116be5f]
 /usr/bin/mongod(_ZN5mongo3dur9CommitJob4noteEPvi+0x31f) [0xa64c8f]
 /usr/bin/mongod(_ZN5mongo3dur18ThreadLocalIntents8_unspoolEv+0x4b) [0xa64e2b]
 /usr/bin/mongod(_ZN5mongo3dur18ThreadLocalIntents7unspoolEv+0x52) [0xa64ea2]
 /usr/bin/mongod(_ZN5mongo3dur18ThreadLocalIntents4pushERKNS0_11WriteIntentE+0x83) [0xa64fd3]
 /usr/bin/mongod(_ZN5mongo3dur11DurableImpl18declareWriteIntentEPvj+0x67) [0xa65047]
 /usr/bin/mongod(_ZN5mongo3dur11DurableImpl10writingPtrEPvj+0xa) [0xa5deba]
 /usr/bin/mongod(_ZN5mongo16NamespaceDetails13addDeletedRecEPNS_13DeletedRecordENS_7DiskLocE+0x11f) [0xf2663f]
 /usr/bin/mongod(_ZN5mongo16NamespaceDetails7compactEv+0x25d) [0xf2295d]
 /usr/bin/mongod(_ZN5mongo16NamespaceDetails11cappedAllocEPNS_10CollectionERKNS_10StringDataEi+0x26f) [0xf2451f]
 /usr/bin/mongod(_ZN5mongo16NamespaceDetails6_allocEPNS_10CollectionERKNS_10StringDataEi+0x2a) [0xf29aea]
 /usr/bin/mongod(_ZN5mongo16NamespaceDetails5allocEPNS_10CollectionERKNS_10StringDataEi+0x5f) [0xf29b4f]
 /usr/bin/mongod(_ZN5mongo19CappedRecordStoreV111allocRecordEii+0x3f) [0xf32ddf]
 /usr/bin/mongod(_ZN5mongo17RecordStoreV1Base12insertRecordEPKcii+0x60) [0xf32770]
 /usr/bin/mongod(_ZN5mongo10Collection15_insertDocumentERKNS_7BSONObjEbPKNS_16PregeneratedKeysE+0x7f) [0x8b601f]
 /usr/bin/mongod(_ZN5mongo10Collection14insertDocumentERKNS_7BSONObjEbPKNS_16PregeneratedKeysE+0x22a) [0x8b693a]
 /usr/bin/mongod(_ZN5mongo11_logOpObjRSERKNS_7BSONObjE+0x33f) [0xe5667f]
 /usr/bin/mongod(_ZN5mongo7replset8SyncTail15applyOpsToOplogEPSt5dequeINS_7BSONObjESaIS3_EE+0x4b) [0xeaf91b]

Given that this was reported as fixed in the 2.2.x and 2.3.x series, I'm a little unsure whether this is a new bug or another manifestation of the same one.

The data inserted into mongo is very small (on the order of kilobytes), averaging a couple of MB per hour - so not exactly large data volumes to replicate.

# mongod --version
db version v2.6.4
2014-12-18T15:10:45.077+0000 git version: 3a830be0eb92d772aa855ebb711ac91d658ee910
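For what it's worth, the always-identical 314.577MB figure looks consistent with a fixed internal threshold rather than with actual workload volume. A minimal arithmetic sketch, under the assumption (taken from the SERVER-6925 discussion, not verified against the 2.6.4 source) that the MMAPv1 journal warns once uncommitted bytes exceed three times an UncommittedBytesLimit of 100 MiB, with the log printing the byte count in decimal megabytes:

```python
# Sketch: why the warning consistently reports ~314.577MB.
# ASSUMPTION (from the SERVER-6925 discussion, not verified against the
# 2.6.4 source): the warning fires once uncommitted bytes exceed
# 3 * UncommittedBytesLimit, with UncommittedBytesLimit = 100 MiB on
# 64-bit builds, and the log prints the count in decimal megabytes.
LIMIT = 100 * 1024 * 1024          # assumed UncommittedBytesLimit, in bytes
threshold_mb = 3 * LIMIT / 1e6     # warning threshold as it would be printed
print(round(threshold_mb, 3))      # -> 314.573, just under the observed 314.577
```

The observed 314.577MB would then simply be the first measurement taken after the threshold was crossed.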



 Comments   
Comment by Daniel Pasette (Inactive) [ 30/May/16 ]

We believe this is related to physical data layout in the oplog. In a degenerate case this layout could result in this warning, though it is rare. A machine with a different oplog size would indeed change whether this message appears or not.

Comment by Yoni Levy [ 29/May/16 ]

Hi, I'm seeing the exact same error, also with 314.577MB, on mongo version 2.6.3.
We've noticed it stopped happening on a machine where we increased the oplog size (5 GB -> 15 GB), though we're not sure whether that's related.

Any new information on this?
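Since oplog size seems to influence whether the warning appears, it may help to record the configured oplog size and replication window on each member. A minimal mongo-shell sketch (run on the member that logs the warning; `db.oplog.rs.stats()` and `printReplicationInfo()` are the standard 2.6 shell helpers):

```javascript
// Run in the mongo shell on the affected replica-set member.
var local = db.getSiblingDB("local");
// Configured oplog size, in bytes:
printjson(local.oplog.rs.stats().maxSize);
// Oplog window (time span between first and last oplog entries):
local.printReplicationInfo();
```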

Comment by Craig Genner [ 02/Mar/15 ]

I've been waiting for any ideas to resolve this after I attached the logs.

I'll get you the additional information on the indexes you requested.

Thanks

Craig

Comment by Ramon Fernandez Marina [ 02/Mar/15 ]

craiggenner, we haven't heard from you for a while, so I'm going to resolve this ticket. If this is still an issue for you, feel free to reopen it and let us know whether you have any indexes on the collections where these operations are being performed (kglue.user, kglue.issue).

Regards,
Ramón.

Comment by Craig Genner [ 22/Dec/14 ]

mongo log attachment

Comment by Daniel Pasette (Inactive) [ 22/Dec/14 ]

If you can compress and attach the full mongodb log, that would be helpful.
Thanks, Dan

Comment by Craig Genner [ 22/Dec/14 ]

Hi Dan,

There aren't a great many documents in this mongo installation; it sees maybe a couple of hundred new and updated documents each day. I would certainly not expect the database traffic over an hour to be more than 100MB, so why it's replicating more than 256MB is a source of confusion.

We are also seeing this at all times of the day, and it's always 314.577MB.
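As a quick way to quantify "at all times of the day", one could count the warnings per hour from the log. A self-contained sketch, assuming the standard 2.6 log line format shown above (the two sample lines stand in for reading the real mongodb.log):

```python
import re
from datetime import datetime

# Sketch: count DR102 warnings per hour in a 2.6-format mongod log.
# The embedded sample stands in for open("mongodb.log").read().
log = """\
2014-12-17T15:19:04.998+0000 [rsSync] warning: DR102 too much data written uncommitted 314.577MB
2014-12-17T16:02:11.123+0000 [rsSync] warning: DR102 too much data written uncommitted 314.577MB
"""
pattern = re.compile(r"^(\S+) \[rsSync\] warning: DR102")
hours = {}
for line in log.splitlines():
    m = pattern.match(line)
    if m:
        ts = datetime.strptime(m.group(1), "%Y-%m-%dT%H:%M:%S.%f%z")
        key = ts.strftime("%Y-%m-%d %H:00")
        hours[key] = hours.get(key, 0) + 1
print(hours)  # -> {'2014-12-17 15:00': 1, '2014-12-17 16:00': 1}
```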

This is an example of the largest document that would be updated/inserted:

shard001:PRIMARY> db.issue.find({'ticketId':'INC000000453819'}).pretty();
{
    "_id" : ObjectId("5497eb30e4b0fed67c5d3bd5"),
    "schemaVersion" : 2,
    "sequenceNo" : "93871",
    "ticketId" : "INC000000453819",
    "checkName" : "snmptrap_Med_Impact",
    "callBackURL" : "http://nagios/nagios/cgi-bin/extinfo.cgi?type=2&host=tfm40-wan-7206-a&service=snmptrap_Med_Impact",
    "monitorHost" : "intthnag03",
    "priority" : "MEDIUM",
    "impact" : "OK",
    "node" : {
        "name" : "I-NWD-TFM40-WAN-7206-A",
        "hostName" : "tfm40-wan-7206-a",
        "serviceLevel" : "i-int-fns",
        "companyName" : "i",
        "supportGroup" : {
            "id" : "sgp000000000157",
            "name" : "Firewall & Network Services",
            "organizationName" : "GIS-Firewall & Network Services",
            "companyName" : "i"
        }
    },
    "startTime" : ISODate("2014-12-22T09:58:08.799Z"),
    "clockStart" : ISODate("2014-12-22T09:59:10.424Z"),
    "elapsedDuration" : NumberLong(61625),
    "notificationSchedule" : [
        {
            "delay" : 15,
            "frequency" : 15,
            "nextNotification" : ISODate("2014-12-22T10:13:08.799Z"),
            "contact" : {
                "username" : "service.desk",
                "firstName" : "Service",
                "lastName" : "Desk",
                "telephoneNumber" : "44 1904 435412",
                "emailAddress" : "ismi-escalation@p.com",
                "serviceDeskId" : null,
                "alertTypes" : [ "EMAIL" ],
                "contactOnce" : false
            },
            "contacted" : false,
            "active" : false,
            "escalation" : true
        },
        {
            "rotaName" : "sgp000000000157",
            "delay" : 0,
            "frequency" : 15,
            "nextNotification" : ISODate("2014-12-22T10:14:10.426Z"),
            "contacted" : false,
            "active" : false,
            "escalation" : false
        }
    ],
    "notifications" : [
        {
            "contact" : {
                "username" : "md",
                "firstName" : "M",
                "lastName" : "D",
                "telephoneNumber" : "4477XXXXXXXXXX",
                "emailAddress" : "md@p.com",
                "serviceDeskId" : "md@k.com",
                "alertTypes" : [ "SMS" ],
                "contactOnce" : false
            },
            "notificationEvents" : [
                { "timestamp" : ISODate("2014-12-22T09:59:10.426Z"), "notificationType" : "ALERT", "alertType" : "SMS", "status" : "QUEUED" },
                { "timestamp" : ISODate("2014-12-22T09:59:11.602Z"), "notificationType" : "ALERT", "alertType" : "SMS", "status" : "SENT_OK" },
                { "timestamp" : ISODate("2014-12-22T10:04:36.916Z"), "notificationType" : "ACKNOWLEDGEMENT", "alertType" : "SMS", "status" : "RECEIVED" }
            ]
        }
    ],
    "events" : [
        { "summary" : "HSRP state change for Group 247 TO state active", "detail" : "", "eventTime" : ISODate("2014-12-22T09:58:08.202Z"), "recordedInTicket" : true, "type" : "MONITORING", "dataType" : "INFO" },
        { "summary" : "Queued alert notification to Md", "detail" : null, "eventTime" : ISODate("2014-12-22T09:59:10.426Z"), "recordedInTicket" : true, "type" : "INTERNAL", "dataType" : "DEBUG" },
        { "summary" : "Sent alert notification to Md", "detail" : null, "eventTime" : ISODate("2014-12-22T09:59:11.602Z"), "recordedInTicket" : true, "type" : "INTERNAL", "dataType" : "INFO" },
        { "summary" : "OK: Waiting for SNMP trap to be received", "detail" : "", "eventTime" : ISODate("2014-12-22T10:01:33.754Z"), "recordedInTicket" : true, "type" : "MONITORING", "dataType" : "INFO" },
        { "summary" : "Acknowledgement received from Md", "detail" : null, "eventTime" : ISODate("2014-12-22T10:04:36.916Z"), "recordedInTicket" : true, "type" : "INTERNAL", "dataType" : "INFO" }
    ],
    "statusList" : [
        { "status" : "OPEN", "timestamp" : ISODate("2014-12-22T09:58:08.803Z") },
        { "status" : "ACCEPTED", "timestamp" : ISODate("2014-12-22T10:04:36.916Z") }
    ],
    "currentStatus" : "ACCEPTED",
    "issueClosed" : null,
    "ticketStatus" : "ASSIGNED",
    "type" : "MONITORING",
    "dataMissingFromCMDB" : null,
    "notificationStatus" : "NONE",
    "updated" : false,
    "accepted" : false,
    "recordStatus" : "NOT_LOCKED",
    "lastProcessed" : ISODate("2014-12-22T11:07:17.933Z"),
    "_class" : "com.p.pom.common.entity.Issue",
    "lastLocked" : ISODate("2014-12-22T11:07:16.764Z")
}

Anything else I can provide?

Comment by Daniel Pasette (Inactive) [ 21/Dec/14 ]

Hi craiggenner, the message you're seeing indicates that you are writing considerably more than a couple of KB. Can you describe the nature of the writes you're performing?

Generated at Thu Feb 08 03:41:34 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.