[SERVER-13681] MongoDB stalls during background flush on Windows Created: 22/Apr/14  Updated: 11/Mar/15  Resolved: 31/Jul/14

Status: Closed
Project: Core Server
Component/s: Storage
Affects Version/s: None
Fix Version/s: 2.6.4, 2.7.5

Type: Bug Priority: Major - P3
Reporter: Mark Benvenuto Assignee: Mark Benvenuto
Resolution: Done Votes: 0
Labels: 26qa, 28qa, Windows
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: HTML File log     File log.2014-07-18T17-42-40     File profile2.etl    
Issue Links:
Depends
is depended on by SERVER-9868 heartbeats not responded to during mm... Closed
is depended on by SERVER-12880 Server pauses on requests every 60 se... Closed
is depended on by SERVER-13444 Long locked flush without inserts and... Closed
Related
related to SERVER-9355 Mongodb crashed after - FlushViewOfFi... Closed
is related to DOCS-3841 Need to update Windows Azure recommen... Closed
Backwards Compatibility: Fully Compatible
Operating System: Windows
Backport Completed:
Sprint: Server 2.7.2, Server 2.7.5
Participants:

 Description   
Issue Status as of Aug 4, 2014

ISSUE SUMMARY
For MongoDB instances running on Windows, MongoDB takes a global mutex that blocks all requests during the background flush of database files to disk.

USER IMPACT
Database reads and writes block during background flushes. Users will see long request times while requests wait for the background database flush to finish.

WORKAROUNDS
N/A

AFFECTED VERSIONS
All versions of MongoDB prior to 2.6.4 are affected by this issue.

FIX VERSION
The fix is included in the 2.6.4 production release.

RESOLUTION DETAILS
An unnecessary lock was removed from the product.

Original description

MongoDB has low CPU usage and does not process requests while a background flush is proceeding.

The cause of this has been identified as the following blocking chain:

T1: Generic Query Thread: Holds: nothing. Acquires: DBLock(R). Waits on: T2
T2: WRITETODATAFILES Thread: Holds: DBLock(W). Acquires: Global Flush Mutex. Waits on: T3
T3: Flush Thread: Holds: Global Flush Mutex. Waits on: I/O
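
The chain above can be illustrated with a small simulation. This is only a sketch, not MongoDB code: the names run_chain, db_lock, and global_flush_mutex are hypothetical, DBLock is simplified to a plain mutex rather than a reader-writer lock, and the disk I/O is replaced by a sleep.

```python
import threading
import time


def run_chain(flush_io_seconds=0.3):
    """Simulate T1 -> T2 -> T3 and return how long the query thread (T1) waited."""
    db_lock = threading.Lock()             # stand-in for DBLock (simplified to a mutex)
    global_flush_mutex = threading.Lock()  # stand-in for the global flush mutex
    flush_started = threading.Event()
    writer_has_db_lock = threading.Event()
    result = {}

    def flush_thread():                    # T3: holds the flush mutex, waits on I/O
        with global_flush_mutex:
            flush_started.set()
            time.sleep(flush_io_seconds)   # simulated slow disk flush

    def write_to_datafiles_thread():       # T2: holds DBLock, waits on T3
        flush_started.wait()
        with db_lock:
            writer_has_db_lock.set()
            with global_flush_mutex:       # blocks until the flush finishes
                pass

    def query_thread():                    # T1: holds nothing, waits on T2
        writer_has_db_lock.wait()
        start = time.monotonic()
        with db_lock:                      # blocks until T2 releases DBLock
            result["waited"] = time.monotonic() - start

    threads = [
        threading.Thread(target=fn)
        for fn in (flush_thread, write_to_datafiles_thread, query_thread)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result["waited"]


if __name__ == "__main__":
    print(f"query waited {run_chain(0.3):.2f}s behind the flush")
```

In this sketch the query never touches the flush mutex, yet its wait time tracks the duration of the simulated flush, because it queues behind the thread that does.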

The lock was originally added in SERVER-7378 to work around a bug in the Windows Azure Storage driver. Microsoft has confirmed that the driver bug only affects (a) memory-mapped files, such as MongoDB databases, that are (b) updated concurrently while being flushed to (c) a drive hosted on an Azure disk whose host cache preference is not set to read/write.

We have removed the SERVER-7378 workaround because it penalizes every Windows deployment scenario, including all scenarios where the driver bug does not apply (bare metal, other cloud providers, etc.).



 Comments   
Comment by Githook User [ 29/Jul/14 ]

Author: Matt Kangas <matt.kangas@mongodb.com> (kangas)

Message: SERVER-13681 Revert "SERVER-7378 Prevent WRITETODATAFILES and FlushViewOfFile from running at the same time"

This reverts commit 5352345eda2ffa24a72af1d5bbaccaa32bcb1006
Branch: master
https://github.com/mongodb/mongo/commit/d7637a71f401f2457db3caca8c8a03d9b46d1ed0

Comment by Matt Kangas [ 29/Jul/14 ]

https://github.com/mongodb/mongo/commit/2ec547c158e1bd7e0339288e0b7ed33ba46e58f6
Branch: v2.6

Author: Matt Kangas <matt.kangas@mongodb.com>
AuthorDate: Fri Jul 25 14:35:12 2014 -0400
Commit: Matt Kangas <matt.kangas@mongodb.com>
CommitDate: Tue Jul 29 13:51:38 2014 -0400

Revert "SERVER-7378 Prevent WRITETODATAFILES and FlushViewOfFile from running at the same time"

This reverts commit 5352345eda2ffa24a72af1d5bbaccaa32bcb1006

Comment by Githook User [ 24/Jul/14 ]

Author: Mark Benvenuto <mark.benvenuto@mongodb.com> (markbenvenuto)

Message: Revert "SERVER-13681: Remove lock that was an incorrect workaround for Azure Drive write size limitations."

This reverts commit 78c068e5d63d77648954252a785b3673d6e314b7.

(cherry picked from commit 231262fda82a3245b8690fdc6421ea1fa3a8a2dd)
Branch: v2.6
https://github.com/mongodb/mongo/commit/d2ca93b7a52be546df68ec6a5343ccbffd4ca8b3

Comment by Githook User [ 24/Jul/14 ]

Author: Mark Benvenuto <mark.benvenuto@mongodb.com> (markbenvenuto)

Message: Revert "SERVER-13681: Remove lock that was an incorrect workaround for Azure Drive write size limitations."

This reverts commit 78c068e5d63d77648954252a785b3673d6e314b7.
Branch: master
https://github.com/mongodb/mongo/commit/231262fda82a3245b8690fdc6421ea1fa3a8a2dd

Comment by Andrew Emil (Inactive) [ 18/Jul/14 ]

Added new logs and etl file. Log is named "log", etl file is named "profile2.etl"

Comment by Andrew Emil (Inactive) [ 18/Jul/14 ]

Sample log uploaded, trying to get more detailed logs using xperf now

Comment by Andrew Emil (Inactive) [ 18/Jul/14 ]

Was able to reproduce this issue on 2.6.4-pre (32ae9aa4f46eb95b4d70b7f69a1e84bc95a1a5d1) running with --nojournal

Reproduction environment:
Windows Server 2012 R2 Datacenter
Azure "Standard A2" 2 core 3.5GB RAM
50gig No-Caching storage

Comment by Githook User [ 16/Jun/14 ]

Author: Mark Benvenuto <mark.benvenuto@mongodb.com> (markbenvenuto)

Message: SERVER-13681: Remove lock that was an incorrect workaround for Azure Drive write size limitations.

  • Limit flushes to 2MB chunks to prevent hitting a limitation in
    Azure Network Drives (XDrive) which have a 2MB file write size limitation.

(cherry picked from commit 78c068e5d63d77648954252a785b3673d6e314b7)
Branch: v2.6
https://github.com/mongodb/mongo/commit/4a990e69f02498e707059aff4359fb95c605814a
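
The 2MB chunking mentioned in the commit message above can be sketched as follows. This is an illustration only, not the server's C++ implementation, and the later comments in this ticket show the commit being reverted. The names chunk_ranges, flush_in_chunks, and demo are hypothetical; the sketch relies on Python's mmap.flush(offset, size), which requires the offset to be a multiple of the page size or allocation granularity (2MB satisfies both on common platforms).

```python
import mmap
import tempfile

CHUNK_SIZE = 2 * 1024 * 1024  # 2MB, the write size limit named in the commit message


def chunk_ranges(total_size, chunk_size=CHUNK_SIZE):
    """Yield (offset, length) pairs that cover [0, total_size) in order."""
    offset = 0
    while offset < total_size:
        length = min(chunk_size, total_size - offset)
        yield offset, length
        offset += length


def flush_in_chunks(mapped, chunk_size=CHUNK_SIZE):
    """Flush a memory-mapped file one chunk at a time; return the chunk count."""
    count = 0
    for offset, length in chunk_ranges(len(mapped), chunk_size):
        mapped.flush(offset, length)  # flushes only this byte range
        count += 1
    return count


def demo(total_size=5 * 1024 * 1024):
    """Map a temporary file, dirty it, and flush it in 2MB chunks."""
    with tempfile.TemporaryFile() as f:
        f.truncate(total_size)        # extend the file to the mapped size
        with mmap.mmap(f.fileno(), total_size) as mapped:
            mapped[0:5] = b"hello"
            return flush_in_chunks(mapped)


if __name__ == "__main__":
    print(demo())
```

A 5MB mapping is flushed as two full 2MB chunks followed by a 1MB remainder.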

Comment by Githook User [ 05/Jun/14 ]

Author: Mark Benvenuto <mark.benvenuto@mongodb.com> (markbenvenuto)

Message: SERVER-13681: Remove lock that was an incorrect workaround for Azure Drive write size limitations.
