[SERVER-21001] MongoDb hang and then crash Created: 19/Oct/15  Updated: 04/Nov/15  Resolved: 30/Oct/15

Status: Closed
Project: Core Server
Component/s: WiredTiger
Affects Version/s: 3.2.0-rc0
Fix Version/s: 3.2.0-rc2

Type: Bug Priority: Major - P3
Reporter: Nick Judson Assignee: Martin Bligh
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File 3.2015-10-19T04-58-54.mdmp     PNG File zlib_crash.png    
Backwards Compatibility: Fully Compatible
Operating System: ALL
Steps To Reproduce:

I will try and repro it locally.

Participants:

 Description   

Running my typical workload with the following MongoDb:

C:\Program Files\MongoDB\Server\3.2\bin>mongod --dbpath=d:\mongo --wiredTigerJournalCompressor=zlib --wiredTigerCollectionBlockCompressor=zlib --wiredTigerCacheSizeGB=1 --wiredTigerEngineConfigString direct_io=[data]



 Comments   
Comment by Githook User [ 30/Oct/15 ]

Author: Martin Bligh (martinbligh) <mbligh@mongodb.com>

Message: SERVER-21001: retry collection create on write conflict
Branch: master
https://github.com/mongodb/mongo/commit/965d98e5c5185f30edcbee810685ab0ab6810f8d
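
The commit message describes retrying collection creation on a write conflict. The general pattern can be sketched as below. This is only an illustration under stated assumptions, not the code from the commit: WriteConflict, retryOnWriteConflict and FlakyCreate are hypothetical names that do not come from the MongoDB source tree.

```cpp
#include <functional>
#include <stdexcept>

// Hypothetical stand-in for the server's write conflict exception.
struct WriteConflict : std::runtime_error {
    WriteConflict() : std::runtime_error("write conflict") {}
};

// Runs op until it completes without a write conflict. Gives up and rethrows
// once maxAttempts conflicts have been seen. Returns the number of conflicts
// that were retried before op succeeded.
int retryOnWriteConflict(const std::function<void()>& op, int maxAttempts) {
    int attempts = 0;
    for (;;) {
        try {
            op();
            return attempts;
        } catch (const WriteConflict&) {
            if (++attempts >= maxAttempts) {
                throw;  // too many conflicts, let the caller decide
            }
        }
    }
}

// Hypothetical operation that conflicts a fixed number of times and then
// succeeds, standing in for a collection create under contention.
struct FlakyCreate {
    int failuresLeft;
    bool created;

    explicit FlakyCreate(int failures) : failuresLeft(failures), created(false) {}

    void operator()() {
        if (failuresLeft > 0) {
            --failuresLeft;
            throw WriteConflict();
        }
        created = true;
    }
};
```

A caller would wrap the create step, for example retryOnWriteConflict([&] { create(); }, 5), so that a single conflict leads to another attempt instead of escaping to code that does not expect it.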

Comment by Martin Bligh [ 19/Oct/15 ]

No worries, think we might have figured it out from your stack trace.

Comment by Nick Judson [ 19/Oct/15 ]

Martin - I haven't been able to repro it - sadly I wasn't paying enough attention to what I was doing when it crashed. My guess is that it was creating & indexing collections while working hard filling up other existing collections. Sorry...

Comment by Martin Bligh [ 19/Oct/15 ]

Hi Nick, I did some refactoring around this code for performance reasons between 3.1.8 and 3.2.0-rc0.

Not sure how easy this is for you to reproduce, but if there's any way you could try 3.1.8 or give us a description of what your workload does that we can try to repro locally, that would be very useful.

Thanks, M.

Comment by Mark Benvenuto [ 19/Oct/15 ]

The cause of the crash was a WriteConflictException thrown from Database::createCollection during an insertOne.

This happened as part of WriteBatchExecutor::ExecInsertsState::lockAndCheck, reached through the following call sites:

  1. https://github.com/mongodb/mongo/blob/r3.2.0-rc0/src/mongo/db/commands/write_commands/batch_executor.cpp#L982
  2. https://github.com/mongodb/mongo/blob/r3.2.0-rc0/src/mongo/db/commands/write_commands/batch_executor.cpp#L947
  3. https://github.com/mongodb/mongo/blob/r3.2.0-rc0/src/mongo/db/commands/write_commands/batch_executor.cpp#L885
  4. https://github.com/mongodb/mongo/blob/r3.2.0-rc0/src/mongo/db/commands/write_commands/batch_executor.cpp#L934

If Database::createCollection throws at line 934, it leaves WriteBatchExecutor::ExecInsertsState in the following state:

Local var @ 0x98cffdc10 Type mongo::WriteBatchExecutor::ExecInsertsState
   +0x000 txn              : 0x00000009`f32870f0 mongo::OperationContext
   +0x008 request          : 0x00000009`8cffe1c0 mongo::BatchedCommandRequest
   +0x010 currIndex        : 0
   +0x018 normalizedInserts : std::vector<mongo::StatusWith<mongo::BSONObj>,std::allocator<mongo::StatusWith<mongo::BSONObj> > >
   +0x030 _transaction     : mongo::ScopedTransaction
   +0x038 _dbLock          : std::unique_ptr<mongo::Lock::DBLock,std::default_delete<mongo::Lock::DBLock> >
   +0x040 _collLock        : std::unique_ptr<mongo::Lock::CollectionLock,std::default_delete<mongo::Lock::CollectionLock> >
   +0x048 _database        : 0x00000009`f0e372e0 mongo::Database
   +0x050 _collection      : (null) 

In this case, _collLock is held in MODE_X, which is required for a call to Database::createCollection. Finally, we know a WriteConflictException was thrown here because of the WCE loop counter in the debugger output:

0:055> dv
  wcr__Attempts = 0n1
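
To illustrate the state described above, here is a simplified model of a multi-step lockAndCheck in which the create step throws after the locks have already been taken. It is a sketch under assumptions, not the server code: Lock, Collection and InsertState are hypothetical types, and the lock is represented only by a descriptive string.

```cpp
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical, simplified stand-ins for the server's lock and collection types.
struct Lock {
    std::string what;
};
struct Collection {};

struct InsertState {
    std::unique_ptr<Lock> dbLock;
    std::unique_ptr<Lock> collLock;
    std::unique_ptr<Collection> storage;  // owns the collection once created
    Collection* collection = nullptr;

    // Takes both locks, then creates the collection. If the create step
    // throws, the locks stay held and collection stays null.
    void lockAndCheck(bool createThrows) {
        dbLock.reset(new Lock{"database"});
        collLock.reset(new Lock{"collection, MODE_X"});
        if (createThrows) {
            throw std::runtime_error("write conflict during create");
        }
        storage.reset(new Collection());
        collection = storage.get();
    }
};
```

After a throwing call, dbLock and collLock are both set while collection is still null, which mirrors the dump: locks held, _database populated, _collection (null).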
