[SERVER-10260] Server Crash with: warning: DR102 too much data written uncommitted 315.318MB Created: 19/Jul/13  Updated: 10/Dec/14  Resolved: 04/Apr/14

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 2.4.5
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Esteban Feldman Assignee: Unassigned
Resolution: Done Votes: 0
Labels: crash
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Linux 3.2.0-23-virtual #36-Ubuntu SMP Tue Apr 10 22:29:03 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux Ubuntu 12.04.2


Operating System: Linux
Participants:

 Description   

The system had been running for a long time without problems. Yesterday it crashed, and I found the message below in the log. I upgraded from 2.4.3 to 2.4.5 to see whether that would help, but it did not; the crash still happens.

It's a sharded environment with 4 machines and no replication, plus 3 config servers.

0xdd9e31 0x921161 0x92123b 0x9212b2 0x9213e3 0x921450 0x91ae8a 0x80022b 0x9d3ae3 0x9db144 0x9dcdcc 0xac35ff 0xac4092 0xac44e1 0xa8c1d3 0xa8fe67 0x9f2ff8 0x9f8588 0x6e8b68 0xdc659e
/usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x21) [0xdd9e31]
/usr/bin/mongod(_ZN5mongo3dur9CommitJob4noteEPvi+0x201) [0x921161]
/usr/bin/mongod(_ZN5mongo3dur18ThreadLocalIntents8_unspoolEv+0x4b) [0x92123b]
/usr/bin/mongod(_ZN5mongo3dur18ThreadLocalIntents7unspoolEv+0x52) [0x9212b2]
/usr/bin/mongod(_ZN5mongo3dur18ThreadLocalIntents4pushERKNS0_11WriteIntentE+0x83) [0x9213e3]
/usr/bin/mongod(_ZN5mongo3dur11DurableImpl18declareWriteIntentEPvj+0x60) [0x921450]
/usr/bin/mongod(_ZN5mongo3dur11DurableImpl10writingPtrEPvj+0xa) [0x91ae8a]
/usr/bin/mongod(_ZNK5mongo11BtreeBucketINS_12BtreeData_V1EE7unindexENS_7DiskLocERNS_12IndexDetailsERKNS_7BSONObjES3_+0x68b) [0x80022b]
/usr/bin/mongod(_ZNK5mongo18IndexInterfaceImplINS_12BtreeData_V1EE7unindexENS_7DiskLocERNS_12IndexDetailsERKNS_7BSONObjES3_+0x63) [0x9d3ae3]
/usr/bin/mongod() [0x9db144]
/usr/bin/mongod(_ZN5mongo13unindexRecordEPNS_16NamespaceDetailsEPNS_6RecordERKNS_7DiskLocEb+0x7c) [0x9dcdcc]
/usr/bin/mongod(_ZN5mongo11DataFileMgr12deleteRecordEPNS_16NamespaceDetailsEPKcPNS_6RecordERKNS_7DiskLocEbbb+0x1bf) [0xac35ff]
/usr/bin/mongod(_ZN5mongo11DataFileMgr12deleteRecordEPKcPNS_6RecordERKNS_7DiskLocEbbb+0x82) [0xac4092]
/usr/bin/mongod(_ZN5mongo11DataFileMgr12updateRecordEPKcPNS_16NamespaceDetailsEPNS_25NamespaceDetailsTransientEPNS_6RecordERKNS_7DiskLocES2_iRNS_7OpDebugEb+0x421) [0xac44e1]
/usr/bin/mongod(_ZN5mongo14_updateObjectsEbPKcRKNS_7BSONObjES4_bbbRNS_7OpDebugEPNS_11RemoveSaverEbRKNS_24QueryPlanSelectionPolicyEb+0x1403) [0xa8c1d3]
/usr/bin/mongod(_ZN5mongo13updateObjectsEPKcRKNS_7BSONObjES4_bbbRNS_7OpDebugEbRKNS_24QueryPlanSelectionPolicyE+0xb7) [0xa8fe67]
/usr/bin/mongod(_ZN5mongo14receivedUpdateERNS_7MessageERNS_5CurOpE+0x4d8) [0x9f2ff8]
/usr/bin/mongod(_ZN5mongo16assembleResponseERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortE+0xac8) [0x9f8588]
/usr/bin/mongod(_ZN5mongo16MyMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortEPNS_9LastErrorE+0x98) [0x6e8b68]
/usr/bin/mongod(_ZN5mongo17PortMessageServer17handleIncomingMsgEPv+0x42e) [0xdc659e]
Thu Jul 18 18:52:51.277 [conn24] warning: DR102 too much data written uncommitted 315.31MB
Thu Jul 18 18:52:51.277 [conn24] warning: DR102 too much data written uncommitted 315.318MB
Thu Jul 18 18:52:51.277 [conn24] warning: DR102 too much data written uncommitted 315.326MB
Thu Jul 18 18:52:51.277 [conn24] warning: DR102 too much data written uncommitted 315.335MB
Thu Jul 18 18:52:51.277 [conn24] warning: DR102 too much data written uncommitted 315.343MB
Thu Jul 18 18:52:51.277 [conn24] warning: DR102 too much data written uncommitted 315.351MB
Thu Jul 18 18:52:51.277 [conn24] warning: DR102 too much data written uncommitted 315.359MB
Thu Jul 18 18:52:51.277 [conn24] warning: DR102 too much data written uncommitted 315.367MB
Thu Jul 18 18:52:51.277 [conn24] warning: DR102 too much data written uncommitted 315.376MB



 Comments   
Comment by Daniel Pasette (Inactive) [ 25/Jul/13 ]

You can try reducing journalCommitInterval to 50 (ms). Let me know if this helps.

Comment by Esteban Feldman [ 25/Jul/13 ]

@Dan, thanks, I hadn't understood. The thing is that I can't put the journal files on another disk. Is there any other alternative? Thanks

Comment by Daniel Pasette (Inactive) [ 25/Jul/13 ]

Regarding putting the journal on a different physical disk: before starting mongod, you can symlink the journal/ directory located in the dbpath to a dedicated hard drive. This speeds up the frequent (fsynced) sequential writes to the current journal file. I don't think the current documentation mentions this explicitly. There is an outstanding documentation ticket with a bit more information: DOCS-1420.
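The symlink step above could be sketched in Python roughly as follows. This is only an illustration: the helper name `relocate_journal` and any paths passed to it are hypothetical, and it assumes mongod is stopped while it runs and that the target directory lives on the dedicated drive.

```python
import os
import shutil


def relocate_journal(dbpath, journal_target):
    """Move <dbpath>/journal to journal_target and leave a symlink behind.

    Hypothetical helper, not part of MongoDB. Run it only while mongod
    is stopped, so no journal file is open during the move.
    """
    journal_dir = os.path.join(dbpath, "journal")
    if os.path.islink(journal_dir):
        # Already relocated; report where the symlink points.
        return os.path.realpath(journal_dir)
    if os.path.exists(journal_target):
        # Refuse to overwrite an existing directory on the target drive.
        raise FileExistsError(journal_target)
    if os.path.isdir(journal_dir):
        # Carry any existing journal files over to the new location.
        shutil.move(journal_dir, journal_target)
    else:
        os.makedirs(journal_target)
    # mongod keeps using <dbpath>/journal, which now resolves to the target.
    os.symlink(journal_target, journal_dir)
    return os.path.realpath(journal_dir)
```

A call might look like `relocate_journal("/var/lib/mongodb", "/mnt/journal-disk/journal")`, where both paths are placeholders for the real dbpath and the mount point of the dedicated drive.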

Regarding journalCommitInterval: this is a server configuration setting. It defaults to 100ms if the journal directory is on the same physical volume as the data files. If the journal is on a separate volume, the default drops automatically to 30ms.
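To check which default would apply to a given dbpath, the rule above can be sketched in Python. Comparing `st_dev` values is an assumption about how "same volume" might be detected from outside the server, not a description of what mongod does internally; the function names are hypothetical.

```python
import os

# Defaults as described in the comment above (milliseconds).
DEFAULT_SAME_VOLUME_MS = 100
DEFAULT_SEPARATE_VOLUME_MS = 30


def default_commit_interval_ms(data_dev, journal_dev):
    """Return the default interval for the given device IDs."""
    if data_dev == journal_dev:
        return DEFAULT_SAME_VOLUME_MS
    return DEFAULT_SEPARATE_VOLUME_MS


def expected_default_for(dbpath):
    """Compare the devices backing dbpath and dbpath/journal.

    os.stat follows symlinks, so a journal/ symlink that points to
    another drive is reported with that drive's device ID.
    """
    data_dev = os.stat(dbpath).st_dev
    journal_dev = os.stat(os.path.join(dbpath, "journal")).st_dev
    return default_commit_interval_ms(data_dev, journal_dev)
```

An explicitly configured journalCommitInterval overrides either default, so this only indicates what the server would pick when the setting is left unset.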

Comment by Esteban Feldman [ 24/Jul/13 ]

Hi Dan.
1. What does it mean to put the journal on a separate spindle? Are there any links or docs on the subject?
2. Does this go in the config file? (I'm looking at the docs.)

Comment by Daniel Pasette (Inactive) [ 23/Jul/13 ]

DR102 is a warning indicating that the journal isn't keeping up with the write load. There are a couple of recommendations for dealing with DR102 warnings:

  1. If possible, put the journal on a separate spindle, to remove contention between journal writes and writes to non-journal files.
  2. Reduce the journal commit interval from its default of 100ms, perhaps to 75ms or 50ms. More frequent commits reduce the probability of a large backlog of not-yet-committed changes building up in memory.
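The reasoning behind the second recommendation can be illustrated with some rough arithmetic in Python. This is a deliberately simplistic model, assuming a constant write rate and that every commit fully drains the backlog; the 200 MB/s figure is made up for illustration and does not come from this ticket.

```python
def peak_uncommitted_mb(write_rate_mb_per_s, commit_interval_ms):
    """Approximate data accumulated between two consecutive commits.

    Simplified model: constant write rate, and each commit clears
    everything written since the previous one.
    """
    return write_rate_mb_per_s * commit_interval_ms / 1000.0


# Hypothetical write rate, chosen only to make the numbers easy to read.
for interval_ms in (100, 75, 50):
    backlog = peak_uncommitted_mb(200.0, interval_ms)
    print(f"{interval_ms} ms -> about {backlog} MB uncommitted at peak")
```

Under these assumptions the peak backlog scales linearly with the interval, so halving the interval halves the amount of uncommitted data. It does not help if the disk cannot sustain the write rate at all, which is where the first recommendation comes in.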