[SERVER-10462] Windows file locking related buildbot failures Created: 07/Aug/13  Updated: 11/Jul/16  Resolved: 19/May/14

Status: Closed
Project: Core Server
Component/s: Testing Infrastructure
Affects Version/s: None
Fix Version/s: 2.6.2, 2.7.1

Type: Bug Priority: Major - P3
Reporter: Matt Kangas Assignee: Matt Dannenberg
Resolution: Done Votes: 0
Labels: buildbot
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Text File splitvector.js_Windows_32_Mar_20.log    
Issue Links:
Duplicate
Backwards Compatibility: Fully Compatible
Operating System: Windows
Backport Completed:
Participants:

 Description   

A number of build failures on Windows systems seem to stem from file locking issues.

Working theory: Our test harness assumes that files owned by a process (usually mongod) will be unlocked as soon as the process exits.

Windows guarantees that the OS will unlock files after process termination, but makes no guarantees about when they will be unlocked. On a loaded system it may not be immediate.

Test failures we believe are due to our test framework assuming that the files are immediately unlocked on exit:

Buildbot V2.4 Windows 64-bit 2008R2+

Errors include:

WindowsError: [Error 5] Access is denied: '/data/db/sconsTests//loadserverscripts.0'
 
WindowsError: [Error 32] The process cannot access the file because it is being used by another process: '/data/db/sconsTests//loadserverscripts.0'

Buildbot V2.4 Windows 64-bit

WindowsError: [Error 5] Access is denied: '/data/db/sconsTests//local.0'

This one different: not a test framework error, but rather a failure within a test.

Buildbot V2.4 Windows 32-bit



 Comments   
Comment by Githook User [ 01/Jun/14 ]

Author:

{u'username': u'dannenberg', u'name': u'matt dannenberg', u'email': u'matt.dannenberg@10gen.com'}

Message: SERVER-10462 retry file deletes until the file is no longer being used by another process in testing on windows

(cherry picked from commit f9be10eccac4ce39ad64d84bd02916257f9a8ef6)
Branch: v2.6
https://github.com/mongodb/mongo/commit/2eb48801e124665a72347302534a219edb8ed674

Comment by Githook User [ 19/May/14 ]

Author:

{u'username': u'dannenberg', u'name': u'matt dannenberg', u'email': u'matt.dannenberg@10gen.com'}

Message: SERVER-10462 retry file deletes until the file is no longer being used by another process in testing on windows
Branch: master
https://github.com/mongodb/mongo/commit/f9be10eccac4ce39ad64d84bd02916257f9a8ef6

Comment by Ernie Hershey [ 30/Apr/14 ]

Just saw this again -

https://mci.10gen.com/ui/task/mongodb_mongo_v2.6_windows_32_44a476ab53c36076667edfc94c84a621a6bb61bd_14_04_30_16_00_05_jsCore_compatibility_windows_32

Comment by Spencer Brody (Inactive) [ 28/Apr/14 ]

Possibly another instance of this: https://mci.10gen.com/ui/task/mongodb_mongo_master_windows_32_f8b37a48243adce8dc6105180b3f529d44fde4b4_14_04_28_18_48_16_jsCore_windows_32#/log/E

Comment by J Rassi [ 25/Mar/14 ]

Note from the test log that no log output is generated from the mongod process that was supposed to have shut down, and the new mongod process exits immediately after not being able to acquire a filesystem lock on mongod.lock. This leads me to suspect me that smoke.py's call to TerminateJobObject() did not properly shut down the existing mongod process.

mark.benvenuto tells me that TerminateJobObject() performs a "kill -9" equivalent, so my "absence of shutdown log information" evidence is invalidated (since it is not expected for mongod to generate log output in response to a hard kill). I am still suspicious of whether the mongod is alive, though.

Comment by Matt Kangas [ 24/Mar/14 ]

Attaching test log to last example Jason provided.

Also note the presence of massive whitespace in the file. Lines 103-2100 and 2125-4100 are blank. Only 5 seconds elapses between lines 103-2100, so 400 lines/second? Separate buildlogger bug?

Comment by J Rassi [ 24/Mar/14 ]

Re-opening ticket. This failure seen again at 8b99b7e8b7 / Windows 32-bit / jsCore_compatibility (suite log here, test log here).

Suite log excerpt:

 [2014/03/20 22:22:39.679]  *******************************************
 [2014/03/20 22:22:39.679]          Test : sortk.js ...
 [2014/03/20 22:22:39.679]       Command : ./mongo.exe --port 27999 --authenticationMechanism MONGODB-CR --writeMode compatibility D:\data\mci\git@github.commongodb\mongo.git\mongodb-mongo-v2.6\jstests\core\sortk.js --eval TestData = new Object();TestData.testPath = "D:\\data\\mci\\git@github.commongodb\\mongo.git\\mongodb-mongo-v2.6\\jstests\\core\\sortk.js";TestData.testFile = "sortk.js";TestData.testName = "sortk";TestData.setParameters = "";TestData.setParametersMongos = "";TestData.noJournal = false;TestData.noJournalPrealloc = false;TestData.auth = false;TestData.keyFile = null;TestData.keyFileData = null;TestData.authMechanism = "MONGODB-CR";TestData.useSSL = false;TestData.useX509 = false;TestData.useWriteCommands = false;MongoRunner.dataDir = "D:\\data\\db";MongoRunner.dataPath = MongoRunner.dataDir + "/";
 [2014/03/20 22:22:39.679]          Date : Thu Mar 20 22:22:39 2014
 [2014/03/20 22:22:39.946]                 (output suppressed; see http://buildlogs.mongodb.org/build/532b64add2a60f7ad7000543/test/532b6a30d2a60f067b0000c6/)
 [2014/03/20 22:22:40.214]                   536.9999 ms
 [2014/03/20 22:22:40.218]
 [2014/03/20 22:22:40.218]  *******************************************
 [2014/03/20 22:22:40.233]          Test : splitvector.js ...
 [2014/03/20 22:22:40.233]       Command : ./mongo.exe --port 27999 --authenticationMechanism MONGODB-CR --writeMode compatibility D:\data\mci\git@github.commongodb\mongo.git\mongodb-mongo-v2.6\jstests\core\splitvector.js --eval TestData = new Object();TestData.testPath = "D:\\data\\mci\\git@github.commongodb\\mongo.git\\mongodb-mongo-v2.6\\jstests\\core\\splitvector.js";TestData.testFile = "splitvector.js";TestData.testName = "splitvector";TestData.setParameters = "";TestData.setParametersMongos = "";TestData.noJournal = false;TestData.noJournalPrealloc = false;TestData.auth = false;TestData.keyFile = null;TestData.keyFileData = null;TestData.authMechanism = "MONGODB-CR";TestData.useSSL = false;TestData.useX509 = false;TestData.useWriteCommands = false;MongoRunner.dataDir = "D:\\data\\db";MongoRunner.dataPath = MongoRunner.dataDir + "/";
 [2014/03/20 22:22:40.233]          Date : Thu Mar 20 22:22:40 2014
 [2014/03/20 22:22:40.384]                 (output suppressed; see http://buildlogs.mongodb.org/build/532b64add2a60f7ad7000543/test/532b6a30d2a60f0511000b1f/)
 [2014/03/20 22:23:25.617]                    45.3990 seconds
 [2014/03/20 22:23:30.618]
 [2014/03/20 22:23:31.694] removing: /data/db/sconsTests//local.0
 [2014/03/20 22:23:31.694] WindowsError(32, 'The process cannot access the file because it is being used by another process')
 [2014/03/20 22:23:31.694] os.remove(/data/db/sconsTests//local.0) failed, retrying once.
 [2014/03/20 22:23:31.700] Traceback (most recent call last):
 [2014/03/20 22:23:31.700]   File "buildscripts/cleanbb.py", line 97, in <module>
 [2014/03/20 22:23:31.700]     cleanup( root , options.nokill )
 [2014/03/20 22:23:31.700]   File "buildscripts/cleanbb.py", line 86, in cleanup
 [2014/03/20 22:23:31.700]     os.remove(foo)
 [2014/03/20 22:23:31.700] WindowsError: [Error 32] The process cannot access the file because it is being used by another process: '/data/db/sconsTests//local.0'
 [2014/03/20 22:23:32.707] running ./mongod.exe --port 27999 --dbpath /data/db/sconsTests/ --setParameter enableTestCommands=1 --httpinterface
 [2014/03/20 22:23:32.708]  *******************************************
 [2014/03/20 22:23:32.708]          Test : stages_and_hash.js ...
 [2014/03/20 22:23:32.708]       Command : ./mongo.exe --port 27999 --authenticationMechanism MONGODB-CR --writeMode compatibility D:\data\mci\git@github.commongodb\mongo.git\mongodb-mongo-v2.6\jstests\core\stages_and_hash.js --eval TestData = new Object();TestData.testPath = "D:\\data\\mci\\git@github.commongodb\\mongo.git\\mongodb-mongo-v2.6\\jstests\\core\\stages_and_hash.js";TestData.testFile = "stages_and_hash.js";TestData.testName = "stages_and_hash";TestData.setParameters = "";TestData.setParametersMongos = "";TestData.noJournal = false;TestData.noJournalPrealloc = false;TestData.auth = false;TestData.keyFile = null;TestData.keyFileData = null;TestData.authMechanism = "MONGODB-CR";TestData.useSSL = false;TestData.useX509 = false;TestData.useWriteCommands = false;MongoRunner.dataDir = "D:\\data\\db";MongoRunner.dataPath = MongoRunner.dataDir + "/";
 [2014/03/20 22:23:32.716]          Date : Thu Mar 20 22:23:32 2014
 [2014/03/20 22:23:32.879]                 (output suppressed; see http://buildlogs.mongodb.org/build/532b64add2a60f7ad7000543/test/532b6a65d2a60f0656000369/)
 [2014/03/20 22:23:35.067]                  2359.0000 ms
 [2014/03/20 22:23:41.061] [Errno 10061] No connection could be made because the target machine actively refused it
 [2014/03/20 22:23:41.061] mongod is not running after test
 [2014/03/20 22:23:41.061] mongod not running after executing test D:\data\mci\git@github.commongodb\mongo.git\mongodb-mongo-v2.6\jstests\core\stages_and_hash.js
 [2014/03/20 22:23:42.121] 659 tests succeeded
 [2014/03/20 22:23:42.121] 84 tests didn't get run
 [2014/03/20 22:23:42.121] The following tests failed (with exit code):
 [2014/03/20 22:23:42.123] [Errno 10061] No connection could be made because the target machine actively refused it
 [2014/03/20 22:23:42.123] Traceback (most recent call last):
 [2014/03/20 22:23:42.123]   File "buildscripts/smoke.py", line 1267, in <module>
 [2014/03/20 22:23:42.123]     main()
 [2014/03/20 22:23:42.123]   File "buildscripts/smoke.py", line 1264, in main
 [2014/03/20 22:23:42.123]     report()
 [2014/03/20 22:23:42.123]   File "buildscripts/smoke.py", line 807, in report
 [2014/03/20 22:23:42.123]     raise Exception("Test failures")
 [2014/03/20 22:23:42.123] Exception: Test failures
 [2014/03/20 22:23:42.132] D:\data\mci\git@github.commongodb\mongo.git\mongodb-mongo-v2.6\jstests\core\bench_test3.js	-3
 [2014/03/20 22:23:42.203] D:\data\mci\git@github.commongodb\mongo.git\mongodb-mongo-v2.6\jstests\core\stages_and_hash.js	-1

Test log excerpt:

2014-03-20T22:23:23.358+0000 [conn60] 	 added index to empty collection
2014-03-20T22:23:23.359+0000 [conn60] build index on: test.jstests_splitvector properties: { v: 1, key: { x: 1.0, y: -1.0, z: 1.0 }, name: "x_1_y_-1_z_1", ns: "test.jstests_splitvector" }
2014-03-20T22:23:23.359+0000 [conn60] 	 added index to empty collection
2014-03-20T22:23:25.409+0000 [conn60] request split points lookup for chunk test.jstests_splitvector { : MinKey, : MaxKey, : MinKey } -->> { : MaxKey, : MinKey, : MaxKey }
test.jstests_splitvector
test.jstests_splitvector
PASSED
2014-03-20T22:23:31.873+0000 
2014-03-20T22:23:31.874+0000 warning: 32-bit servers don't have journaling enabled by default. Please use --journal if you want durability.
2014-03-20T22:23:31.874+0000 
2014-03-20T22:23:31.882+0000 [initandlisten] MongoDB starting : pid=4044 port=27999 dbpath=/data/db/sconsTests 32-bit host=EC2AMAZ-ZVRDQB5
2014-03-20T22:23:31.883+0000 [initandlisten] 
2014-03-20T22:23:31.883+0000 [initandlisten] ** NOTE: This is a 32 bit MongoDB binary.
2014-03-20T22:23:31.883+0000 [initandlisten] **       32 bit builds are limited to less than 2GB of data (or less with --journal).
2014-03-20T22:23:31.883+0000 [initandlisten] **       Note that journaling defaults to off for 32 bit and is currently off.
2014-03-20T22:23:31.883+0000 [initandlisten] **       See http://dochub.mongodb.org/core/32bit
2014-03-20T22:23:31.883+0000 [initandlisten] 
2014-03-20T22:23:31.883+0000 [initandlisten] targetMinOS: Windows XP SP3
2014-03-20T22:23:31.883+0000 [initandlisten] db version v2.6.0-rc2-pre-
2014-03-20T22:23:31.883+0000 [initandlisten] git version: 8b99b7e8b7f99783e1fd76382dbda415340e95fc
2014-03-20T22:23:31.883+0000 [initandlisten] build info: windows sys.getwindowsversion(major=6, minor=0, build=6002, platform=2, service_pack='Service Pack 2') BOOST_LIB_VERSION=1_49
2014-03-20T22:23:31.884+0000 [initandlisten] allocator: system
2014-03-20T22:23:31.884+0000 [initandlisten] options: { httpinterface: true, net: { port: 27999 }, setParameter: { enableTestCommands: "1" }, storage: { dbPath: "/data/db/sconsTests/" } }
2014-03-20T22:23:31.884+0000 [initandlisten] exception in initAndListen: 13627 Unable to create/open lock file: /data/db/sconsTests\mongod.lock The process cannot access the file because it is being used by another process.. Is a mongod instance already running?, terminating
dbexit: 2014-03-20T22:23:31.884+0000 [initandlisten] shutdown: going to close listening sockets...
2014-03-20T22:23:31.885+0000 [initandlisten] shutdown: going to flush diaglog...
2014-03-20T22:23:31.885+0000 [initandlisten] shutdown: going to close sockets...
2014-03-20T22:23:31.885+0000 [initandlisten] shutdown: waiting for fs preallocator...
2014-03-20T22:23:31.885+0000 [initandlisten] shutdown: closing all files...
2014-03-20T22:23:31.885+0000 [initandlisten] closeAllFiles() finished
dbexit: really exiting now

Note from the test log that no log output is generated from the mongod process that was supposed to have shut down, and the new mongod process exits immediately after not being able to acquire a filesystem lock on mongod.lock. This leads me to suspect me that smoke.py's call to TerminateJobObject() did not properly shut down the existing mongod process.

Comment by Matt Kangas [ 08/Oct/13 ]

For posterity, note that this fix isn't always sufficient. But we have seen this failure much less often:

MCI 6069172335a0 Windows 32-bit js
http://mci.10gen.com/ui/task_log_raw/mongodb_mongo_master_windows_32_6069172335a0750b2477e3e2c31232ef9e9769f4_13_10_08_14_20_32_js_windows_32?type=task

[2013/10/08 17:18:33.523]  *******************************************
[2013/10/08 17:18:33.523]          Test : basic5.js ...
[2013/10/08 17:18:33.523]       Command : ./mongo.exe --port 27999 --authenticationMechanism MONGODB-CR D:\data\mci\git@github.commongodb\mongo.git\master\jstests\basic5.js --eval TestData = new Object();TestData.testPath = "D:\\data\\mci\\git@github.commongodb\\mongo.git\\master\\jstests\\basic5.js";TestData.testFile = "basic5.js";TestData.testName = "basic5";TestData.setParameters = "";TestData.noJournal = false;TestData.noJournalPrealloc = false;TestData.auth = false;TestData.keyFile = null;TestData.keyFileData = null;TestData.authMechanism = "MONGODB-CR";TestData.useSSL = false;TestData.useX509 = false;
[2013/10/08 17:18:33.523]          Date : Tue Oct  8 17:18:33 2013
[2013/10/08 17:18:33.831]                 (output suppressed; see http://buildlogs.mongodb.org/build/525436ded2a60f241b000350/test/52543e6ad2a60f6bb60007ab/)
[2013/10/08 17:18:33.831]                   307.9998 ms
[2013/10/08 17:18:38.835]
[2013/10/08 17:18:40.264] cwd [D:\data\mci\git@github.commongodb\mongo.git\master]
[2013/10/08 17:18:40.265] removing: /data/db/sconsTests//admin.0
[2013/10/08 17:18:40.265] WindowsError(5, 'Access is denied')
[2013/10/08 17:18:40.265] os.remove(/data/db/sconsTests//admin.0) failed, retrying once.
[2013/10/08 17:18:40.266] Traceback (most recent call last):
[2013/10/08 17:18:40.266]   File "buildscripts/cleanbb.py", line 99, in <module>
[2013/10/08 17:18:40.266]     cleanup( root , options.nokill )
[2013/10/08 17:18:40.266]   File "buildscripts/cleanbb.py", line 88, in cleanup
[2013/10/08 17:18:40.266]     os.remove(foo)
[2013/10/08 17:18:40.272] WindowsError: [Error 5] Access is denied: '/data/db/sconsTests//admin.0'

Comment by auto [ 02/Oct/13 ]

Author:

{u'username': u'kangas', u'name': u'Matt Kangas', u'email': u'matt.kangas@mongodb.com'}

Message: SERVER-10462 compensate for Windows file locking race
Branch: v2.4
https://github.com/mongodb/mongo/commit/9c032a641b278715e71129efe100db2b570ceeb0

Comment by Matt Kangas [ 11/Sep/13 ]

Hopefully this fix is sufficient

Comment by auto [ 11/Sep/13 ]

Author:

{u'name': u'Matt Kangas', u'email': u'matt.kangas@mongodb.com'}

Message: SERVER-10462 compensate for Windows file locking race
Branch: master
https://github.com/mongodb/mongo/commit/1bd77478ad81a008bf908acba22eb8e5aa1a7887

Comment by Matt Kangas [ 06/Sep/13 ]

This again: Fri Sep 6

I have a potential fix. Will submit a code review asap.

Comment by Matt Kangas [ 22/Aug/13 ]

Again:

Builder V2.4 Windows 64-bit 2008R2+ Build #107 (Wed Aug 21)

http://buildbot.10gen.cc/builders/V2.4%20Windows%2064-bit%202008R2%2B/builds/107/steps/smokeJs/logs/stdio

cwd [D:\slave\Windows_64bit_2008R2+_V2.4\mongo]
removing: /data/db/sconsTests/journal/j._0
removing: /data/db/sconsTests/_tmp/1377117912146276
Traceback (most recent call last):
  File "buildscripts/cleanbb.py", line 94, in <module>
    cleanup( root , options.nokill )
  File "buildscripts/cleanbb.py", line 82, in cleanup
    os.remove( foo )
WindowsError: [Error 32] The process cannot access the file because it is being used by another process: '/data/db/sconsTests/_tmp/1377117912146276'

Maybe we can sleep and retry if we encounter "WindowsError: [Error 32]"?

Comment by Matt Kangas [ 08/Aug/13 ]

Again: V2.4 Windows 64-bit 2008R2+ Build #94 (Aug 8)

http://buildbot.mongodb.org/builders/V2.4%20Windows%2064-bit%202008R2%2B/builds/94/steps/test_7/logs/stdio

cwd [D:\slave\Windows_64bit_2008R2+_V2.4\mongo]
removing: /data/db/sconsTests/journal/j._0
removing: /data/db/sconsTests//loadserverscripts.0
Traceback (most recent call last):
  File "buildscripts/cleanbb.py", line 94, in <module>
    cleanup( root , options.nokill )
  File "buildscripts/cleanbb.py", line 82, in cleanup
    os.remove( foo )
WindowsError: [Error 5] Access is denied: '/data/db/sconsTests//loadserverscripts.0'

Generated at Thu Feb 08 03:23:14 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.