[SERVER-13571] reproduceable mongodump crash on special collection Created: 14/Apr/14  Updated: 10/Dec/14  Resolved: 25/Apr/14

Status: Closed
Project: Core Server
Component/s: Tools
Affects Version/s: 2.4.8
Fix Version/s: None

Type: Bug Priority: Minor - P4
Reporter: Max Quatember Assignee: J Rassi
Resolution: Cannot Reproduce Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Text File mongod.log     Text File mongod2.6.0.log     Text File mongodump.log     Text File mongodump2.6.0.log    
Operating System: Windows
Participants:

 Description   

Hi Everyone!

I have a reproduceable crash with latest mongodump on windows 8.1.
One special collection cannot be dumped.
The database is 12GB (Zipped 300MB, could be provided for debugging).
This Ticket is created because of following discussion:
https://groups.google.com/forum/#!topic/mongodb-dev/qa3PM5NWWGA

Behaviour:
If I start an online backup - host+port specified - with mongodump, it crashes on one special collection of my database.
If I start it with dbpath specified, everything works.

  • Version of mongod: 2.4.8 / Version of mongodump: 2.4.8
  • Reproduceable also with only the "Day" collection
  • repair database has no effect
  • validating the collection is ok
  • biggest BSON obj in collection "Day": 11093 bytes

I attached the logfiles of mongod+mongodump, both started with "-vvv".

Thanks for the help!

BR,
Max



 Comments   
Comment by J Rassi [ 25/Apr/14 ]

I'm resolving this ticket as "Can't reproduce". Tentative diagnosis is that a networking misconfiguration is causing the OS to close the server's connection to the client, and that this behavior manifests when the socket buffer is filled up on the sender side by the "exhaust cursor" feature; further diagnosis is out of scope of this ticket (for further research, note that a Windows network trace may reveal the system event that triggers the connection teardown – see netsh trace start help).

Please re-open this ticket if you encounter the problem on a different machine.

Comment by Max Quatember [ 18/Apr/14 ]

Hi Jason!

Your python script said following:

Python 2.7.6 (default, Nov 10 2013, 19:24:18) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import pymongo; print pymongo.version; coll = pymongo.MongoClient("127.0.0.1:9000")["db"]["Day"]; getcursor = lambda : coll.find(snapshot=True, exhaust=True); filter(lambda _ : False, getcursor())
; print getcursor().count()
2.7
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\site-packages\pymongo\cursor.py", line 1038, in next
    if len(self.__data) or self._refresh():
  File "C:\Python27\lib\site-packages\pymongo\cursor.py", line 995, in _refresh
    self.__send_message(None)
  File "C:\Python27\lib\site-packages\pymongo\cursor.py", line 918, in __send_message
    response = client._exhaust_next(self.__exhaust_mgr.sock)
  File "C:\Python27\lib\site-packages\pymongo\mongo_client.py", line 1209, in _exhaust_next
    return self.__receive_message_on_socket(1, None, sock_info)
  File "C:\Python27\lib\site-packages\pymongo\mongo_client.py", line 1161, in __receive_message_on_socket
    return self.__receive_data_on_socket(length - 16, sock_info)
  File "C:\Python27\lib\site-packages\pymongo\mongo_client.py", line 1140, in __receive_data_on_socket
    chunk = sock_info.sock.recv(length)
socket.error: [Errno 10054] Eine vorhandene Verbindung wurde vom Remotehost geschlossen

I now tested the same dump on an other machine that is similar (Hardware, OS, Config) to mine. There everything worked.
So the assumption that it is a networking misconfiguration is near!
If the problem reoccurs in a production environment I will open a new discussion...
If you want me to investigate further I will help, but If you want to close the issue it's also fine for me!

Thank you for your quick help, I hope I was not too annoying

BR,
Max

Comment by J Rassi [ 17/Apr/14 ]

My mistake, I did not realize that the shell does not support the "exhaust" query option. PyMongo (the MongoDB driver for Python) does support this query option; do you have Python installed on this machine? If so, could you provide the output of running the following line at the Python prompt? It runs the PyMongo equivalent of the shell snippet I sent above. See the PyMongo homepage for installation instructions (MS installer direct download link here, assuming you are using Python 2.7).

>>> import pymongo; print pymongo.version; coll = pymongo.MongoClient("127.0.0.1:9000")["db"]["Day"]; getcursor = lambda : coll.find(snapshot=True, exhaust=True); filter(lambda _ : False, getcursor()); print getcursor().count()

I suspect that the root cause of the issue is a networking misconfiguration on the host machine, evidenced by the following snippet from the 2.6.0 mongod log. The server received socket error WSAECONNABORTED (errno 10053) when trying to write to the client over the network connection.

2014-04-14T17:09:09.787+0200 [conn1] Socket say send() errno:10053 Eine bestehende Verbindung wurde softwaregesteuert 127.0.0.1:64767

Comment by Max Quatember [ 14/Apr/14 ]

Hi Jason!

Problem is reproduceable with the "Day" collection only. With 2.4.8 and 2.6.0.

The result of your query:

MongoDB shell version: 2.6.0
connecting to: 127.0.0.1:9000/test
> db = connect( "localhost:9000/db" )
connecting to: localhost:9000/db
db
> db.getSiblingDB("db").Day.find().addOption(64).snapshot().itcount()
2014-04-14T17:43:25.630+0200 ERROR: MessagingPort::call() wrong id got:1ad expect:8
  toSend op: 2005
  response msgid:430
  response len:  4194576
  response op:  1
  remote: 127.0.0.1:9000
2014-04-14T17:43:25.633+0200 Assertion failure false src\mongo\util\net\message_port.cpp 269
2014-04-14T17:43:25.741+0200 mongo.exe    mongo::parseNumberFromStringWithBase<unsigned char>+0x1d1463
2014-04-14T17:43:25.741+0200 mongo.exe    mongo::parseNumberFromStringWithBase<unsigned char>+0x18172c
2014-04-14T17:43:25.742+0200 mongo.exe    mongo::parseNumberFromStringWithBase<unsigned char>+0x16d1ca
2014-04-14T17:43:25.742+0200 mongo.exe    mongo::parseNumberFromStringWithBase<unsigned char>+0x18b8af
2014-04-14T17:43:25.743+0200 mongo.exe    mongo::parseNumberFromStringWithBase<unsigned char>+0x472de
2014-04-14T17:43:25.743+0200 mongo.exe    mongo::parseNumberFromStringWithBase<unsigned char>+0x55fe1
2014-04-14T17:43:25.744+0200 mongo.exe    mongo::parseNumberFromStringWithBase<unsigned char>+0x562e5
2014-04-14T17:43:25.744+0200 mongo.exe    mongo::parseNumberFromStringWithBase<unsigned char>+0x14074e
2014-04-14T17:43:25.744+0200 mongo.exe    mongo::parseNumberFromStringWithBase<unsigned char>+0x136d59
2014-04-14T17:43:25.745+0200              ???
2014-04-14T17:43:25.745+0200 mongo.exe    mongo::parseNumberFromStringWithBase<unsigned char>+0x4e42ef
2014-04-14T17:43:25.746+0200              ???
2014-04-14T17:43:25.746+0200              ???
2014-04-14T17:43:25.746+0200              ???
2014-04-14T17:43:25.746+0200              ???
2014-04-14T17:43:25.747+0200
2014-04-14T17:43:25.750+0200 Error: assertion src\mongo\util\net\message_port.cpp:269 at src/mongo/shell/query.js:116
2014-04-14T17:43:25.780+0200 ERROR: MessagingPort::call() wrong id got:1ae expect:9
  toSend op: 2004
  response msgid:431
  response len:  4194342
  response op:  1
  remote: 127.0.0.1:9000
2014-04-14T17:43:25.780+0200 Assertion failure false src\mongo\util\net\message_port.cpp 269
2014-04-14T17:43:25.881+0200 mongo.exe    mongo::parseNumberFromStringWithBase<unsigned char>+0x1d1463
2014-04-14T17:43:25.882+0200 mongo.exe    mongo::parseNumberFromStringWithBase<unsigned char>+0x18172c
2014-04-14T17:43:25.882+0200 mongo.exe    mongo::parseNumberFromStringWithBase<unsigned char>+0x16d1ca
2014-04-14T17:43:25.883+0200 mongo.exe    mongo::parseNumberFromStringWithBase<unsigned char>+0x18b8af
2014-04-14T17:43:25.883+0200 mongo.exe    mongo::parseNumberFromStringWithBase<unsigned char>+0x472de
2014-04-14T17:43:25.884+0200 mongo.exe    mongo::parseNumberFromStringWithBase<unsigned char>+0x55696
2014-04-14T17:43:25.884+0200 mongo.exe    mongo::parseNumberFromStringWithBase<unsigned char>+0x3af78
2014-04-14T17:43:25.885+0200 mongo.exe    mongo::parseNumberFromStringWithBase<unsigned char>+0x47524
2014-04-14T17:43:25.885+0200 mongo.exe    mongo::parseNumberFromStringWithBase<unsigned char>+0x1440ed
2014-04-14T17:43:25.886+0200 mongo.exe    mongo::parseNumberFromStringWithBase<unsigned char>+0x136d59
2014-04-14T17:43:25.886+0200 mongo.exe    mongo::parseNumberFromStringWithBase<unsigned char>+0x2ffd26
2014-04-14T17:43:25.887+0200 mongo.exe    mongo::parseNumberFromStringWithBase<unsigned char>+0x300e26
2014-04-14T17:43:25.887+0200              ???
2014-04-14T17:43:25.887+0200 mongo.exe    mongo::parseNumberFromStringWithBase<unsigned char>+0x3cf2ff
2014-04-14T17:43:25.888+0200              ???
2014-04-14T17:43:25.888+0200              ???
2014-04-14T17:43:25.888+0200              ???
2014-04-14T17:43:25.888+0200              ???
2014-04-14T17:43:25.889+0200              ???
2014-04-14T17:43:25.889+0200              ???
2014-04-14T17:43:25.889+0200

BR,
Max

Comment by J Rassi [ 14/Apr/14 ]

And, I have one more query to ask you to test running at the mongo shell, if you would (adds the "Exhaust cursor" query option, which mongodump also uses):

db.getSiblingDB("db").Day.find().addOption(64).snapshot().itcount()

Comment by J Rassi [ 14/Apr/14 ]

Reproduceable also with only the "Day" collection

Can you confirm that you used options "-d db -c Day" during this trial mongodump run, and that it was unsuccessful?

Comment by Max Quatember [ 14/Apr/14 ]

Hi Jason!

Output of the mongoshell:

MongoDB shell version: 2.4.8
connecting to: localhost:9000/db
> db.getSiblingDB("db").Day.count()
3791036
> db.getSiblingDB("db").Day.find().itcount()
3791036
> db.getSiblingDB("db").Day.find().snapshot().itcount()
3791036

The logfiles will follow soon...

BR,
Max

Comment by J Rassi [ 14/Apr/14 ]

Could you paste the output of running the following at the mongo shell? The last line emulates the query that mongodump is running.

db.getSiblingDB("db").Day.count()
db.getSiblingDB("db").Day.find().itcount()
db.getSiblingDB("db").Day.find().snapshot().itcount()

In addition, would you be willing to download MongoDB 2.6.0 (available here) and upload the new mongodump and mongod logs (both with log level 3) generated when you attempt to dump the same collection? MongoDB 2.6.0 outputs additional debugging information at this log level.

Thanks.

Comment by Max Quatember [ 14/Apr/14 ]

Hi Matt!

I'm ready! What do I have to do?

BR,
Max

Generated at Thu Feb 08 03:32:09 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.