[SERVER-8358] "Move chunk commit failed" shutdown leaves mongod with no journal files and mongod.lock file Created: 28/Jan/13  Updated: 25/Feb/17  Resolved: 25/Feb/17

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 2.3.2
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Kristina Chodorow (Inactive) Assignee: Kaloian Manassiev
Resolution: Done Votes: 2
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File log.tar.bz2    
Issue Links:
Duplicate
duplicates SERVER-26886 Config server shuts down while a migr... Closed
is duplicated by SERVER-20993 lock file not deleted when server ter... Closed
Related
related to SERVER-7861 Add marker file for journaling to ens... Closed
is related to SERVER-19707 A crash after dropDatabase leaves the... Closed
Operating System: ALL
Participants:

 Description   

This makes it difficult to recover from.

Mon Jan 28 09:25:29.328 [conn27] moveChunk commit outcome ongoing: { applyOps: [ { op: "u", b: false, ns: "config.chunks", o: { _id: "test.not.hashed-user_id_"user327"", lastmod: Timestamp 30000|0, lastmodEpoch: ObjectId('510681b502471f5418db3e35'), ns: "test.not.hashed", min: { user_id: "user327" }, max: { user_id: "user3332" }, shard: "test-rs2" }, o2: { _id: "test.not.hashed-user_id_"user327"" } }, { op: "u", b: false, ns: "config.chunks", o: { _id: "test.not.hashed-user_id_"user3332"", lastmod: Timestamp 30000|1, lastmodEpoch: ObjectId('510681b502471f5418db3e35'), ns: "test.not.hashed", min: { user_id: "user3332" }, max: { user_id: "user3392" }, shard: "test-rs4" }, o2: { _id: "test.not.hashed-user_id_"user3332"" } } ], preCondition: [ { ns: "config.chunks", q: { query: { ns: "test.not.hashed" }, orderby: { lastmod: -1 } }, res: { lastmod: Timestamp 29000|1 } } ] } for command :{ $err: "SyncClusterConnection::findOne prepare failed: 10276 DBClientBase::findN: transport error: ubuntu:28001 ns: admin.$cmd query: { fsync: 1 } ubuntu:28001:{}", code: 13104 }
Mon Jan 28 09:25:30.312 [conn88230] end connection 127.0.0.1:36609 (18 connections now open)
Mon Jan 28 09:25:30.312 [initandlisten] connection accepted from 127.0.0.1:36631 #88232 (19 connections now open)
Mon Jan 28 09:25:39.328 [conn27] SyncClusterConnection connecting to [ubuntu:28000]
Mon Jan 28 09:25:39.328 [conn27] SyncClusterConnection connecting to [ubuntu:28001]
Mon Jan 28 09:25:39.328 [conn27] SyncClusterConnection connecting to [ubuntu:28002]
Mon Jan 28 09:25:39.329 [conn27] moveChunk commit failed: version is at29|1||000000000000000000000000 instead of 30|1||510681b502471f5418db3e35
Mon Jan 28 09:25:39.329 [conn27] TERMINATING
Mon Jan 28 09:25:39.329 dbexit: 
Mon Jan 28 09:25:39.329 [conn27] shutdown: going to close listening sockets...
Mon Jan 28 09:25:39.329 [conn27] closing listening socket: 9
Mon Jan 28 09:25:39.329 [conn27] closing listening socket: 11
Mon Jan 28 09:25:39.329 [conn27] closing listening socket: 12
Mon Jan 28 09:25:39.329 [conn27] removing socket file: /tmp/mongodb-31500.sock
Mon Jan 28 09:25:39.329 [conn27] shutdown: going to flush diaglog...
Mon Jan 28 09:25:39.329 [conn27] shutdown: going to close sockets...
Mon Jan 28 09:25:39.329 [conn27] shutdown: waiting for fs preallocator...
Mon Jan 28 09:25:39.329 [conn27] shutdown: closing all files...
Mon Jan 28 09:25:39.330 [conn35376] end connection 127.0.0.1:47546 (18 connections now open)
Mon Jan 28 09:25:39.330 [initandlisten] now exiting
Mon Jan 28 09:25:39.330 dbexit: ; exiting immediately
 
 
***** SERVER RESTARTED *****
 
 
Mon Jan 28 10:01:49.172 [initandlisten] MongoDB starting : pid=19657 port=31500 dbpath=/data/db/test-rs4-0 64-bit host=ubuntu
Mon Jan 28 10:01:49.172 [initandlisten] 
Mon Jan 28 10:01:49.172 [initandlisten] ** NOTE: This is a development version (2.3.3-pre-) of MongoDB.
Mon Jan 28 10:01:49.172 [initandlisten] **       Not recommended for production.
Mon Jan 28 10:01:49.172 [initandlisten] 
Mon Jan 28 10:01:49.172 [initandlisten] db version v2.3.3-pre-, pdfile version 4.5
Mon Jan 28 10:01:49.172 [initandlisten] git version: e27538de7afa9f63ca0a0a5be97dc69f26143466
Mon Jan 28 10:01:49.172 [initandlisten] build info: Linux ubuntu 2.6.38-8-generic #42-Ubuntu SMP Mon Apr 11 03:31:24 UTC 2011 x86_64 BOOST_LIB_VERSION=1_49
Mon Jan 28 10:01:49.172 [initandlisten] allocator: tcmalloc
Mon Jan 28 10:01:49.172 [initandlisten] options: { dbpath: "/data/db/test-rs4-0", fork: true, logappend: true, logpath: "/home/k/dbs/test-rs4-0.log", port: 31500, replSet: "test-rs4" }
Mon Jan 28 10:01:49.172 [initandlisten] Unable to check for journal files due to: boost::filesystem::directory_iterator::construct: No such file or directory: "/data/db/test-rs4-0/journal"
************** 
Unclean shutdown detected.
Please visit http://dochub.mongodb.org/core/repair for recovery instructions.
*************
Mon Jan 28 10:01:49.172 [initandlisten] exception in initAndListen: 12596 old lock file, terminating
Mon Jan 28 10:01:49.172 dbexit: 
Mon Jan 28 10:01:49.172 [initandlisten] shutdown: going to close listening sockets...
Mon Jan 28 10:01:49.172 [initandlisten] shutdown: going to flush diaglog...
Mon Jan 28 10:01:49.172 [initandlisten] shutdown: going to close sockets...
Mon Jan 28 10:01:49.172 [initandlisten] shutdown: waiting for fs preallocator...
Mon Jan 28 10:01:49.172 [initandlisten] shutdown: closing all files...
Mon Jan 28 10:01:49.172 [initandlisten] closeAllFiles() finished
Mon Jan 28 10:01:49.172 dbexit: really exiting now

Rest of log attached. Not sure if this is actually sharding, might be a shutdown problem.



 Comments   
Comment by Kaloian Manassiev [ 25/Feb/17 ]

The problem where the MongoD instance is left in inconsistent state is no longer a problem as of MongoDB 3.0, however we still crash if the metadata migration commit fails for any reason.

Closing this ticket as duplicate of SERVER-26886, which will be used to improve the reliability of metadata changes persistence.

Comment by Avi Ribchinsky [X] [ 19/Oct/15 ]

I've experienced this issue with v2.6.9
https://jira.mongodb.org/browse/SERVER-20993

Comment by Andy Schwerin [ 12/Oct/15 ]

Needs re-test on supported versions.

Comment by Tad Marshall [ 08/Feb/13 ]

"exiting immediately" indicates that dbexit() was called a second time (before the first one finished). It looks like conn27 was working its way through shutdownServer() when the initandlisten thread called exitCleanly() ("now exiting").

Generated at Thu Feb 08 03:17:13 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.