[SERVER-20993] lock file not deleted when server terminating due to moveChunk commit failed error Created: 18/Oct/15  Updated: 28/Oct/15  Resolved: 28/Oct/15

Status: Closed
Project: Core Server
Component/s: Admin, Sharding
Affects Version/s: 2.6.9
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Avi Ribchinsky [X] Assignee: Unassigned
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates SERVER-8358 "Move chunk commit failed" shutdown l... Closed
Operating System: ALL
Steps To Reproduce:

Happened once; not sure how to reproduce.

Participants:

 Description   

The mongod process terminated due to "ERROR: moveChunk commit failed:". The journal files were cleaned up, but the lock file was not removed, so subsequent startups failed because of the old lock file.
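To tell a leftover lock file apart from one held by a running process, a check along the following lines may help. This is only a sketch under assumptions not stated in this ticket: that the lock file holds the PID of the owning mongod as plain text, and that an empty file means a clean shutdown. The function name `lock_file_state` is hypothetical.

```python
import os
from pathlib import Path


def lock_file_state(lock_path):
    """Classify a lock file as 'absent', 'empty', 'live', or 'stale'.

    Assumption: the file contains the owning process's PID as text and is
    left empty after a clean shutdown.
    """
    path = Path(lock_path)
    if not path.exists():
        return "absent"

    content = path.read_text().strip()
    if not content:
        return "empty"  # assumed to mean the previous shutdown was clean

    try:
        pid = int(content)
    except ValueError:
        return "stale"  # unreadable content: treat as left over
    if pid <= 0:
        return "stale"  # not a valid PID for a single process

    try:
        os.kill(pid, 0)  # signal 0 only checks that the process exists (POSIX)
    except ProcessLookupError:
        return "stale"  # no such process: the lock was left behind
    except PermissionError:
        return "live"  # the process exists but belongs to another user
    return "live"
```

For example, `lock_file_state("/data/db/mongod.lock")` (the path is a placeholder for the actual dbpath) would return "stale" in the situation described above if the file still held the PID of the terminated process. A PID can be reused by an unrelated process, so "live" is not conclusive.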

... for command :{ $err: "socket exception [SEND_ERROR] for 0.1.2.187:1, code: 9001 }
2015-10-03T00:53:43.494+0000 [conn12906] waiting till out of critical section
2015-10-03T00:53:43.501+0000 [conn12900] waiting till out of critical section
2015-10-03T00:53:43.643+0000 [conn12904] waiting till out of critical section
2015-10-03T00:53:43.651+0000 [conn12907] waiting till out of critical section
2015-10-03T00:53:43.735+0000 [conn12905] waiting till out of critical section
2015-10-03T00:53:43.743+0000 [conn12902] waiting till out of critical section
2015-10-03T00:53:43.893+0000 [conn12903] waiting till out of critical section
2015-10-03T00:53:43.901+0000 [conn12901] waiting till out of critical section
2015-10-03T00:53:53.461+0000 [conn12908] ERROR: moveChunk commit failed: version is at 5|5||000000000000000000000000 instead of 6|1||55dce04fae57789a22a4d141
2015-10-03T00:53:53.461+0000 [conn12908] ERROR: TERMINATING
2015-10-03T00:53:53.461+0000 [conn12908] dbexit:
2015-10-03T00:53:53.461+0000 [conn12908] shutdown: going to close listening sockets...
2015-10-03T00:53:53.461+0000 [conn12908] closing listening socket: 10
2015-10-03T00:53:53.461+0000 [conn12908] closing listening socket: 13
2015-10-03T00:53:53.461+0000 [conn12908] removing socket file: /tmp/mongodb-27022.sock
2015-10-03T00:53:53.461+0000 [conn12908] shutdown: going to flush diaglog...
2015-10-03T00:53:53.461+0000 [conn12908] shutdown: going to close sockets...
2015-10-03T00:53:53.461+0000 [conn12908] shutdown: waiting for fs preallocator...
2015-10-03T00:53:53.461+0000 [conn12908] shutdown: lock for final commit...
2015-10-03T00:53:53.461+0000 [conn12908] shutdown: final commit...
2015-10-03T00:53:53.461+0000 [conn12908] shutdown: closing all files...
2015-10-03T00:53:53.461+0000 [conn12750] end connection 0.1.2.187:1 (22 connections now open)
2015-10-03T00:53:53.461+0000 [conn12929] end connection 0.1.2.187:2 (22 connections now open)
2015-10-03T00:53:53.461+0000 [conn12936] end connection 0.1.2.187:3 (22 connections now open)
2015-10-03T00:53:53.461+0000 [conn12915] end connection 0.1.2.187:4 (22 connections now open)
2015-10-03T00:53:53.461+0000 [conn12966] end connection 0.1.2.187:5 (22 connections now open)
2015-10-03T00:53:53.461+0000 [conn12967] end connection 0.1.2.134:6 (22 connections now open)
2015-10-03T00:53:53.461+0000 [conn12949] end connection 0.1.2.134:7 (22 connections now open)
2015-10-03T00:53:53.461+0000 [conn12736] end connection 0.1.2.134:8 (22 connections now open)
2015-10-03T00:53:53.462+0000 [conn12913] end connection 0.1.2.134:9 (22 connections now open)
2015-10-03T00:53:53.462+0000 [conn12912] end connection 0.1.2.134:10 (22 connections now open)
2015-10-03T00:53:53.462+0000 [conn12726] end connection 0.1.2.134:11 (22 connections now open)
2015-10-03T00:53:53.465+0000 [conn12908] closeAllFiles() finished
2015-10-03T00:53:53.465+0000 [conn12908] journalCleanup...
2015-10-03T00:53:53.465+0000 [conn12908] removeJournalFiles
2015-10-03T00:53:53.483+0000 [initandlisten] now exiting
2015-10-03T00:53:53.483+0000 [initandlisten] dbexit: ; exiting immediately
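When checking other logs for the same sequence, a small parser can pull out the terminating thread, the version mismatch, and the shutdown steps that thread logged. The sketch below assumes the line layout seen in the excerpt above (timestamp, bracketed thread name, message) and reads the version string as major|minor||epoch; that reading and the function names are assumptions, not something confirmed in this ticket.

```python
import re

# Layout assumed from the excerpt: "<timestamp> [<thread>] <message>"
LINE_RE = re.compile(r"^(?P<ts>\S+) \[(?P<thread>[^\]]+)\] (?P<msg>.*)$")
VERSION_RE = re.compile(
    r"version is at (?P<got>\d+\|\d+\|\|[0-9a-f]+)"
    r" instead of (?P<want>\d+\|\d+\|\|[0-9a-f]+)"
)


def parse_version(text):
    """Split 'major|minor||epoch' into (major, minor, epoch)."""
    major_minor, epoch = text.split("||")
    major, minor = major_minor.split("|")
    return int(major), int(minor), epoch


def summarize_termination(log_lines):
    """Collect the failing thread, both versions, and its shutdown steps."""
    summary = {"thread": None, "got": None, "want": None, "steps": []}
    for line in log_lines:
        match = LINE_RE.match(line)
        if not match:
            continue
        msg = match.group("msg")
        if "moveChunk commit failed" in msg:
            summary["thread"] = match.group("thread")
            versions = VERSION_RE.search(msg)
            if versions:
                summary["got"] = parse_version(versions.group("got"))
                summary["want"] = parse_version(versions.group("want"))
        elif (summary["thread"] == match.group("thread")
              and msg.startswith("shutdown: ")):
            # Keep only steps logged by the thread that hit the error.
            summary["steps"].append(msg[len("shutdown: "):].rstrip("."))
    return summary
```

Run over the excerpt above, the collected steps would end at "closing all files", since the later journal cleanup messages do not carry the "shutdown: " prefix.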

I was not able to verify from the documentation or the Google Groups whether this is expected behavior or a MongoDB bug.

Related defect I've found: SERVER-3009

Exact server version

2015-10-03T01:08:10.101+0000 [initandlisten] db version v2.6.9
2015-10-03T01:08:10.101+0000 [initandlisten] git version: df313bc75aa94d192330cb92756fc486ea604e64
2015-10-03T01:08:10.101+0000 [initandlisten] build info: Linux build20.nj1.10gen.cc 2.6.32-431.3.1.el6.x86_64 #1 SMP Fri Jan 3 21:39:27 UTC 2014 x86_64 BOOST_LIB_VERSION=1_49



 Comments   
Comment by Ramon Fernandez Marina [ 28/Oct/15 ]

Hi Avi Ribchinsky, this behavior was previously reported in SERVER-8358, so I'm going to mark this ticket as a duplicate. I see you've already posted on that ticket; feel free to watch it as well for updates.

Regards,
Ramón.
