[SERVER-9691] Clean command crashes mongod on windows - unhandled exception Created: 15/May/13  Updated: 08/Feb/18  Resolved: 04/Jun/13

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 2.2.4, 2.4.4
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Gregor Macadam Assignee: Unassigned
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Windows 2008 R2 SP1


Issue Links:
Duplicate
duplicates SERVER-9699 Remove clean command Closed
Operating System: ALL
Steps To Reproduce:

> db.test.clean()
Wed May 15 09:07:57.715 trying reconnect to 127.0.0.1:27017
Wed May 15 09:07:57.730 reconnect 127.0.0.1:27017 ok
Wed May 15 09:07:58.574 Socket recv() errno:10054 An existing connection was for
cibly closed by the remote host. 127.0.0.1:27017
Wed May 15 09:07:58.574 SocketException: remote: 127.0.0.1:27017 error: 9001 soc
ket exception [1] server [127.0.0.1:27017]
Wed May 15 09:07:58.574 DBClientCursor::init call() failed
Wed May 15 09:07:58.574 JavaScript execution failed: Error: error doing query: f
ailed at src/mongo/shell/query.js:L78
>

Participants:

 Description   

Wed May 15 09:07:01.497 [initandlisten] MongoDB starting : pid=2872 port=27017 dbpath=.\data 64-bit host=gregortest
Wed May 15 09:07:01.497 [initandlisten] db version v2.4.4-pre-
Wed May 15 09:07:01.497 [initandlisten] git version: f4ddf51ef4c4712cb3d4e6b5b0a70dc1a12d7ae9
Wed May 15 09:07:01.497 [initandlisten] build info: windows sys.getwindowsversion(major=6, minor=1, build=7601, platform=2, service_pack='Service Pack 1') BOOST_LIB_VERSION=1_49
Wed May 15 09:07:01.497 [initandlisten] allocator: system
Wed May 15 09:07:01.497 [initandlisten] options: { dbpath: ".\data" }
Wed May 15 09:07:01.528 [initandlisten] journal dir=.\data\journal
Wed May 15 09:07:01.528 [initandlisten] recover begin
Wed May 15 09:07:01.528 [initandlisten] recover lsn: 0
Wed May 15 09:07:01.528 [initandlisten] recover .\data\journal\j._0
Wed May 15 09:07:01.606 [initandlisten] recover cleaning up
Wed May 15 09:07:01.606 [initandlisten] removeJournalFiles
Wed May 15 09:07:01.606 [initandlisten] recover done
Wed May 15 09:07:01.606 [initandlisten] preallocating a journal file .\data\journal\prealloc.0
Wed May 15 09:07:30.746 [websvr] admin web console waiting for connections on port 28017
Wed May 15 09:07:30.746 [initandlisten] waiting for connections on port 27017
Wed May 15 09:07:57.715 [initandlisten] connection accepted from 127.0.0.1:49171 #1 (1 connection now open)
Wed May 15 09:07:57.730 [conn1] CMD: clean test.test
Wed May 15 09:07:57.730 [conn1] *** unhandled exception (access violation) at 0x00000001401289A0, terminating
Wed May 15 09:07:57.730 [conn1] *** access violation was a write to 0x00000040795068C0
Wed May 15 09:07:57.730 [conn1] *** stack trace for unhandled exception:
Wed May 15 09:07:58.511 [conn1] mongod.exe    ...\src\mongo\db\dbcommands_admin.cpp(84)                        mongo::CleanCmd::run+0x250
Wed May 15 09:07:58.511 [conn1] mongod.exe    ...\src\mongo\db\dbcommands.cpp(1908)                            mongo::_execCommand+0x5e
Wed May 15 09:07:58.511 [conn1] mongod.exe    ...\src\mongo\db\dbcommands.cpp(2048)                            mongo::Command::execCommand+0xb84
Wed May 15 09:07:58.511 [conn1] mongod.exe    ...\src\mongo\db\dbcommands.cpp(2109)                            mongo::_runCommands+0x3fd
Wed May 15 09:07:58.511 [conn1] mongod.exe    ...\src\mongo\db\ops\query.cpp(46)                               mongo::runCommands+0x46
Wed May 15 09:07:58.511 [conn1] mongod.exe    ...\src\mongo\db\ops\query.cpp(952)                              mongo::runQuery+0x43f
Wed May 15 09:07:58.511 [conn1] mongod.exe    ...\src\mongo\db\instance.cpp(262)                               mongo::receivedQuery+0x35d
Wed May 15 09:07:58.511 [conn1] mongod.exe    ...\src\mongo\db\instance.cpp(401)                               mongo::assembleResponse+0x2f0
Wed May 15 09:07:58.511 [conn1] mongod.exe    ...\src\mongo\db\db.cpp(194)                                     mongo::MyMessageHandler::process+0xfa
Wed May 15 09:07:58.511 [conn1] mongod.exe    ...\src\mongo\util\net\message_server_port.cpp(207)              mongo::PortMessageServer::handleIncomingMsg+0x56a
Wed May 15 09:07:58.511 [conn1] mongod.exe    ...\src\third_party\boost\libs\thread\src\win32\thread.cpp(180)  boost::`anonymous namespace'::thread_start_function+0x21
Wed May 15 09:07:58.511 [conn1] mongod.exe    f:\dd\vctools\crt_bld\self_64_amd64\crt\src\threadex.c(314)      _callthreadstartex+0x17
Wed May 15 09:07:58.511 [conn1] mongod.exe    f:\dd\vctools\crt_bld\self_64_amd64\crt\src\threadex.c(292)      _threadstartex+0x7f
Wed May 15 09:07:58.511 [conn1] kernel32.dll                                                                   BaseThreadInitThunk+0xd
Wed May 15 09:07:58.511 [conn1] writing minidump diagnostic file mongo.dmp
Wed May 15 09:07:58.558 [conn1] *** immediate exit due to unhandled exception



 Comments   
Comment by Tad Marshall [ 15/May/13 ]

We get no segfault in Ubuntu because in Linux the private view is implicitly copy-on-write. Only on Windows do we change page protections dynamically in the private view, because otherwise Windows commits page file space for the entire private view, even though much of it may not ever be written.

$ pmap 3189
3189:   ./mongod
0000000000400000  24640K r-x--  /home/tad/mongo/mongodev/mongod
0000000001e0f000    436K r----  /home/tad/mongo/mongodev/mongod
0000000001e7c000     52K rw---  /home/tad/mongo/mongodev/mongod
0000000001e89000   4100K rw---    [ anon ]
00007f27b4488000  32768K rw---    [ anon ]
00007f27d6489000 131072K rw---  /data/db/test.1
00007f27de489000 131072K rw-s-  /data/db/test.1
00007f27e6489000  65536K rw---  /data/db/test.0
00007f27ea489000  65536K rw-s-  /data/db/test.0
00007f27ee489000  16384K rw---  /data/db/test.ns
00007f27ef489000  16384K rw-s-  /data/db/test.ns
// ... snip ...

Both the shared view (with "s") and the private view (without "s") are read/write on Linux.

Comment by Tad Marshall [ 15/May/13 ]

I reproduced this in the current master code.

The code is writing to the deletedList without making the memory writable (using write intent). This would be bad even if it didn't crash, because the changes will not be journaled and so the database would be corrupt if the server died with a hard crash.

src/mongo/db/dbcommands_admin.cpp lines 86 to 87:

            for ( int i = 0; i < Buckets; i++ )
                d->deletedList[i].Null();

The code should be:

            for ( int i = 0; i < Buckets; i++ )
                d->deletedList[i].writing().Null();
 

The code has been this way since November of 2009, and it looks like the code before that date had the same bug; this may have always been this way.

The mystery is why this doesn't crash on other operating systems. I tested in Ubuntu and did not get a crash.

Comment by Gregor Macadam [ 15/May/13 ]

Also on 2.2.4

Wed May 15 09:17:24 [initandlisten] MongoDB starting : pid=180 port=27017 dbpath=.\data 64-bit host=gregortest
Wed May 15 09:17:24 [initandlisten] db version v2.2.4, pdfile version 4.5
Wed May 15 09:17:24 [initandlisten] git version: a2880030d3cec10c850c6850efa7a51b628b8822
Wed May 15 09:17:24 [initandlisten] build info: windows sys.getwindowsversion(major=6, minor=1, build=7601, platform=2, service_pack='Service Pack 1') BOOST_LIB_VERSION=1_49
Wed May 15 09:17:24 [initandlisten] options: { dbpath: ".\data" }
Wed May 15 09:17:24 [initandlisten] journal dir=./data/journal
Wed May 15 09:17:24 [initandlisten] recover begin
Wed May 15 09:17:24 [initandlisten] recover lsn: 31890
Wed May 15 09:17:24 [initandlisten] recover ./data/journal/j._0
Wed May 15 09:17:24 [initandlisten] recover create file ./data/test.ns 16MB
Wed May 15 09:17:25 [initandlisten] recover create file ./data/test.0 64MB
Wed May 15 09:17:29 [initandlisten] recover cleaning up
Wed May 15 09:17:29 [initandlisten] removeJournalFiles
Wed May 15 09:17:29 [initandlisten] recover done
Wed May 15 09:17:29 [initandlisten] preallocating a journal file ./data/journal/prealloc.0
Wed May 15 09:18:29 [websvr] admin web console waiting for connections on port 28017
Wed May 15 09:18:29 [initandlisten] waiting for connections on port 27017
Wed May 15 09:21:12 [initandlisten] connection accepted from 127.0.0.1:49180 #1 (1 connection now open)
Wed May 15 09:21:12 [conn1] CMD: clean test.test
Wed May 15 09:21:12 [conn1] *** unhandled exception (access violation) at 0x000000013F768AC0, terminating
Wed May 15 09:21:12 [conn1] *** access violation was a write to 0x00000040555068C0
Wed May 15 09:21:12 [conn1] *** stack trace for unhandled exception:
Wed May 15 09:21:13 [conn1] mongod.exe    ...\src\mongo\db\dbcommands_admin.cpp(68)                        mongo::CleanCmd::run+0x180
Wed May 15 09:21:13 [conn1] mongod.exe    ...\src\mongo\db\dbcommands.cpp(1859)                            mongo::_execCommand+0x6a
Wed May 15 09:21:13 [conn1] mongod.exe    ...\src\mongo\db\dbcommands.cpp(2016)                            mongo::execCommand+0xc17
Wed May 15 09:21:13 [conn1] mongod.exe    ...\src\mongo\db\dbcommands.cpp(2078)                            mongo::_runCommands+0x3d7
Wed May 15 09:21:13 [conn1] mongod.exe    ...\src\mongo\db\ops\query.cpp(43)                               mongo::runCommands+0x46
Wed May 15 09:21:13 [conn1] mongod.exe    ...\src\mongo\db\ops\query.cpp(920)                              mongo::runQuery+0x41a
Wed May 15 09:21:13 [conn1] mongod.exe    ...\src\mongo\db\instance.cpp(244)                               mongo::receivedQuery+0x16d
Wed May 15 09:21:13 [conn1] mongod.exe    ...\src\mongo\db\instance.cpp(390)                               mongo::assembleResponse+0x2f7
Wed May 15 09:21:13 [conn1] mongod.exe    ...\src\mongo\db\db.cpp(193)                                     mongo::MyMessageHandler::process+0xf5
Wed May 15 09:21:13 [conn1] mongod.exe    ...\src\mongo\util\net\message_server_port.cpp(86)               mongo::pms::threadRun+0x59a
Wed May 15 09:21:13 [conn1] mongod.exe    ...\src\third_party\boost\libs\thread\src\win32\thread.cpp(180)  boost::`anonymous namespace'::thread_start_function+0x21
Wed May 15 09:21:13 [conn1] mongod.exe    f:\dd\vctools\crt_bld\self_64_amd64\crt\src\threadex.c(314)      _callthreadstartex+0x17
Wed May 15 09:21:13 [conn1] mongod.exe    f:\dd\vctools\crt_bld\self_64_amd64\crt\src\threadex.c(292)      _threadstartex+0x7f
Wed May 15 09:21:13 [conn1] kernel32.dll                                                                   BaseThreadInitThunk+0xd
Wed May 15 09:21:13 [conn1] writing minidump diagnostic file mongo.dmp
Wed May 15 09:21:13 [conn1] *** immediate exit due to unhandled exception

Generated at Thu Feb 08 03:21:11 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.