[SERVER-6692] GridFS: mongod crashes when saving files from many processes Created: 02/Aug/12 Updated: 08/Mar/13 Resolved: 05/Sep/12 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | GridFS, Internal Client |
| Affects Version/s: | 2.0.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | barongwang | Assignee: | Spencer Brody (Inactive) |
| Resolution: | Done | Votes: | 0 |
| Labels: | crash, driver, insert | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Operating System: | Linux |
| Participants: |
| Description |
|
We run several clients that send data to the same mongod process. When the disk write rate reaches approximately 4 MB/s, the mongod process crashes. We cannot see the mongod process in top on Linux; however, netstat shows that the ports mongod listens on are still being monitored, and every command issued against mongod blocks. What's more, the files used to store the collections are locked (we cannot access these files). |
| Comments |
| Comment by Spencer Brody (Inactive) [ 05/Sep/12 ] |
|
I'm closing this ticket due to lack of activity. If you'd like to continue investigating this, please reopen the ticket and add answers to the questions from my last post. |
| Comment by Spencer Brody (Inactive) [ 09/Aug/12 ] |
|
If you are using a ScopedDbConnection, you can call done() on it when you are finished; this returns the connection to the internal connection pool so that it can be reused on a future request. The problem with setChunkSize is a known bug. Is the tail of the log you attached from the very end of your test run, after the mongod crashed? That log just seems to end: there is no error message, stack trace, or shutdown message. How can you tell the server has crashed? What behavior are you seeing when you try to connect to the server after the crash? |
| Comment by barongwang [ 09/Aug/12 ] |
|
It seems that the C++ driver does not support changing the GridFS chunk size, as there is an assert on the size argument in "setChunkSize(unsigned int size)" that causes the call to fail. |
| Comment by barongwang [ 09/Aug/12 ] |
|
It is running on a single mongod. These days I have been using the C++ driver to save files smaller than 16 KB into GridFS, and mongod crashes frequently. There are 200 clients. When I use the mongo shell to connect to mongod, the process blocks; when I try to connect to mongod with the C++ driver, the connection is refused.

tail -50 of the log:

c->nextSafe(): { _id: ObjectId('50233206ba73d3605cfa17c5'), files_id: ObjectId('5023320640bbf4b33d3dc70a'), n: 0, data: BinData }
c->nextSafe(): { _id: ObjectId('50233206ba73d3605cfa17df'), files_id: ObjectId('5023320640bbf4b33d3dc70a'), n: 0, data: BinData }
c->nextSafe(): { _id: ObjectId('50233206ba73d3605cfa17e0'), files_id: ObjectId('5023320640bbf4b33d3dc70a'), n: 0, data: BinData }
c->nextSafe(): { _id: ObjectId('50233206ba73d3605cfa17e3'), files_id: ObjectId('5023320640bbf4b33d3dc70a'), n: 0, data: BinData }
c->nextSafe(): { _id: ObjectId('50233206ba73d3605cfa17fc'), files_id: ObjectId('5023320640bbf4b33d3dc70a'), n: 0, data: BinData }
Thu Aug 9 11:44:06 [conn35291] should have chunk: 1 have:0
c->nextSafe(): { _id: ObjectId('502331ffba73d3605cfa1616'), files_id: ObjectId('502331ff40bbf4b33d3dc722'), n: 0, data: BinData }
c->nextSafe(): { _id: ObjectId('502331ffba73d3605cfa1661'), files_id: ObjectId('502331ff40bbf4b33d3dc722'), n: 0, data: BinData }
c->nextSafe(): { _id: ObjectId('502331ffba73d3605cfa1683'), files_id: ObjectId('502331ff40bbf4b33d3dc722'), n: 0, data: BinData }
c->nextSafe(): { _id: ObjectId('50233206ba73d3605cfa16e0'), files_id: ObjectId('502331ff40bbf4b33d3dc722'), n: 0, data: BinData }
c->nextSafe(): { _id: ObjectId('50233206ba73d3605cfa1742'), files_id: ObjectId('502331ff40bbf4b33d3dc722'), n: 0, data: BinData }
c->nextSafe(): { _id: ObjectId('50233206ba73d3605cfa176a'), files_id: ObjectId('502331ff40bbf4b33d3dc722'), n: 0, data: BinData }
c->nextSafe(): { _id: ObjectId('50233206ba73d3605cfa17b0'), files_id: ObjectId('502331ff40bbf4b33d3dc722'), n: 0, data: BinData }
c->nextSafe(): { _id: ObjectId('50233206ba73d3605cfa17fe'), files_id: ObjectId('502331ff40bbf4b33d3dc722'), n: 0, data: BinData }
Thu Aug 9 11:44:06 [conn35298] end connection 10.6.11.104:50831
c->nextSafe(): { _id: ObjectId('50233206ba73d3605cfa173f'), files_id: ObjectId('5023320640bbf4b33d3dc71f'), n: 0, data: BinData }
c->nextSafe(): { _id: ObjectId('50233206ba73d3605cfa1757'), files_id: ObjectId('5023320640bbf4b33d3dc71f'), n: 0, data: BinData }
c->nextSafe(): { _id: ObjectId('50233206ba73d3605cfa177f'), files_id: ObjectId('5023320640bbf4b33d3dc71f'), n: 0, data: BinData }
c->nextSafe(): { _id: ObjectId('50233206ba73d3605cfa17b7'), files_id: ObjectId('5023320640bbf4b33d3dc71f'), n: 0, data: BinData }
c->nextSafe(): { _id: ObjectId('50233206ba73d3605cfa17c8'), files_id: ObjectId('5023320640bbf4b33d3dc71f'), n: 0, data: BinData }
c->nextSafe(): { _id: ObjectId('50233206ba73d3605cfa17cf'), files_id: ObjectId('5023320640bbf4b33d3dc71f'), n: 0, data: BinData }
c->nextSafe(): { _id: ObjectId('50233206ba73d3605cfa17e2'), files_id: ObjectId('5023320640bbf4b33d3dc71f'), n: 0, data: BinData }
c->nextSafe(): { _id: ObjectId('50233206ba73d3605cfa1800'), files_id: ObjectId('5023320640bbf4b33d3dc71f'), n: 0, data: BinData }
Thu Aug 9 11:44:06 [conn35448] should have chunk: 1 have:0
c->nextSafe(): { _id: ObjectId('50233206ba73d3605cfa177d'), files_id: ObjectId('5023320640bbf4b33d3dc70b'), n: 0, data: BinData }
c->nextSafe(): { _id: ObjectId('50233206ba73d3605cfa17b8'), files_id: ObjectId('5023320640bbf4b33d3dc70b'), n: 0, data: BinData }
c->nextSafe(): { _id: ObjectId('50233206ba73d3605cfa17bc'), files_id: ObjectId('5023320640bbf4b33d3dc70b'), n: 0, data: BinData }
c->nextSafe(): { _id: ObjectId('50233206ba73d3605cfa17c2'), files_id: ObjectId('5023320640bbf4b33d3dc70b'), n: 0, data: BinData }
c->nextSafe(): { _id: ObjectId('50233206ba73d3605cfa1801'), files_id: ObjectId('5023320640bbf4b33d3dc70b'), n: 0, data: BinData }
Thu Aug 9 11:44:06 [conn35258] should have chunk: 1 have:0
c->nextSafe(): { _id: ObjectId('502331ffba73d3605cfa1663'), files_id: ObjectId('502331ff40bbf4b33d3dc72a'), n: 0, data: BinData }
c->nextSafe(): { _id: ObjectId('502331ffba73d3605cfa1687'), files_id: ObjectId('502331ff40bbf4b33d3dc72a'), n: 0, data: BinData }
c->nextSafe(): { _id: ObjectId('50233206ba73d3605cfa1724'), files_id: ObjectId('502331ff40bbf4b33d3dc72a'), n: 0, data: BinData }
c->nextSafe(): { _id: ObjectId('50233206ba73d3605cfa179e'), files_id: ObjectId('502331ff40bbf4b33d3dc72a'), n: 0, data: BinData }
c->nextSafe(): { _id: ObjectId('50233206ba73d3605cfa17ff'), files_id: ObjectId('502331ff40bbf4b33d3dc72a'), n: 0, data: BinData }
Thu Aug 9 11:44:06 [conn35274] should have chunk: 1 have:0
c->nextSafe(): { _id: ObjectId('50233206ba73d3605cfa1802'), files_id: ObjectId('5023320640bbf4b33d3dc724'), n: 0, data: BinData }
Thu Aug 9 11:44:07 [initandlisten] connection accepted from 10.6.11.104:51057 #35518

head -40 of the log:

Wed Aug 8 19:54:48 [initandlisten] journal dir=/data/aoi/journal
c->nextSafe(): { _id: ObjectId('5022543dba73d3605ceb0d79'), files_id: ObjectId('5022543d6a8b885752a74c16'), n: 0, data: BinData }
Wed Aug 8 19:57:49 [conn3] end connection 10.6.11.104:37507
c->nextSafe(): { _id: ObjectId('5022543dba73d3605ceb0d7a'), files_id: ObjectId('5022543d6a8b885752a74c16'), n: 0, data: BinData }
c->nextSafe(): { _id: ObjectId('5022543dba73d3605ceb0d79'), files_id: ObjectId('5022543d6a8b885752a74c16'), n: 0, data: BinData }
Wed Aug 8 19:57:49 [conn4] should have chunk: 1 have:0

In addition, I want to know whether the C++ driver has any method to disconnect connections. |
| Comment by Spencer Brody (Inactive) [ 03/Aug/12 ] |
|
Can you post the mongod logs from a full run of a test that triggers the crash? Is this running on a single mongod, a replica set, or a sharded cluster? |
| Comment by barongwang [ 03/Aug/12 ] |
| |
| Comment by barongwang [ 03/Aug/12 ] |
|
The version I used is mongodb-linux-x86_64-static-legacy-2.0.6. |
| Comment by Spencer Brody (Inactive) [ 02/Aug/12 ] |
|
Do you have a reproducible test case you could attach? Can you check if this problem still exists in 2.0.6? There were many stability enhancements to mongod between 2.0.0 and 2.0.6. |