[SERVER-55469] Uploading ~500,000 small tables as collections fails with an error and even breaks MongoDB Created: 24/Mar/21  Updated: 27/Oct/23  Resolved: 04/Apr/21

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Platon workaccount Assignee: Dmitry Agranat
Resolution: Works as Designed Votes: 0
Labels: None
Environment:

elementary OS 5.1.7
MongoDB shell version v4.4.4
PyMongo 3.11.2


Attachments: MongoDB_ECONNREFUSED.png (PNG), src_table_example.png (PNG)
Operating System: ALL

Description

Source data.
A large number of TSV files (in my case, 500409). Most files contain no more than 100 rows.

My actions.
I tried to convert each table into a collection using my well-tested Python program.
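
For reference, a minimal sketch of such a loader (the database name 'src_db' and the 'src_tables/*.tsv' layout are assumptions for illustration; only the block_compressor=zstd storage option appears in the traceback below):

import csv
import glob
import os
from multiprocessing import Pool

from pymongo import MongoClient

def create_collection(tsv_path):
    # Each worker process opens its own connection: MongoClient is not fork-safe.
    client = MongoClient('localhost', 27017)
    db = client['src_db']  # hypothetical database name
    name = os.path.splitext(os.path.basename(tsv_path))[0]
    # The zstd block compressor matches the option visible in the traceback.
    coll = db.create_collection(
        name,
        storageEngine={'wiredTiger': {'configString': 'block_compressor=zstd'}})
    with open(tsv_path) as tsv:
        rows = list(csv.DictReader(tsv, delimiter='\t'))
    if rows:
        coll.insert_many(rows)
    client.close()

if __name__ == '__main__':
    with Pool() as pool:
        pool.map(create_collection, glob.glob('src_tables/*.tsv'))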

Error.
After ~15 minutes of running, the program fails with the following error:

multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/home/platon/miniconda3/lib/python3.7/multiprocessing/pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "/home/platon/miniconda3/lib/python3.7/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))
  File "create_db.py", line 211, in create_collection
    'block_compressor=zstd'}})
  File "/home/platon/miniconda3/lib/python3.7/site-packages/pymongo/database.py", line 408, in create_collection
    with self.__client._tmp_session(session) as s:
  File "/home/platon/miniconda3/lib/python3.7/contextlib.py", line 112, in __enter__
    return next(self.gen)
  File "/home/platon/miniconda3/lib/python3.7/site-packages/pymongo/mongo_client.py", line 1829, in _tmp_session
    s = self._ensure_session(session)
  File "/home/platon/miniconda3/lib/python3.7/site-packages/pymongo/mongo_client.py", line 1816, in _ensure_session
    return self.__start_session(True, causal_consistency=False)
  File "/home/platon/miniconda3/lib/python3.7/site-packages/pymongo/mongo_client.py", line 1766, in __start_session
    server_session = self._get_server_session()
  File "/home/platon/miniconda3/lib/python3.7/site-packages/pymongo/mongo_client.py", line 1802, in _get_server_session
    return self._topology.get_server_session()
  File "/home/platon/miniconda3/lib/python3.7/site-packages/pymongo/topology.py", line 488, in get_server_session
    None)
  File "/home/platon/miniconda3/lib/python3.7/site-packages/pymongo/topology.py", line 217, in _select_servers_loop
    (self._error_message(selector), timeout, self.description))
pymongo.errors.ServerSelectionTimeoutError: localhost:27017: [Errno 111] Connection refused, Timeout: 30s, Topology Description: <TopologyDescription id: 605a9f0f977b5c231ff4873e, topology_type: Single, servers: [<ServerDescription ('localhost', 27017) server_type: Unknown, rtt: None, error=AutoReconnect('localhost:27017: [Errno 111] Connection refused')>]>

Global MongoDB breakdown.
All further attempts to use MongoDB result in connection errors. Restarting the mongod process doesn't help.

mongo

MongoDB shell version v4.4.4
connecting to: mongodb://127.0.0.1:27017/?compressors=disabled&gssapiServiceName=mongodb
Error: couldn't connect to server 127.0.0.1:27017, connection attempt failed: SocketException: Error connecting to 127.0.0.1:27017 :: caused by :: Connection refused :
connect@src/mongo/shell/mongo.js:374:17
@(connect):2:6
exception: connect failed
exiting with code 1


Why do I consider this a MongoDB bug?
The documentation does not describe any limit on the number of collections on the WiredTiger side.



Comments
Comment by Platon workaccount [ 05/Apr/21 ]

Thanks for the quick guide. With it, I was able to get the correct output:

cat /proc/1379/limits

Limit                     Soft Limit   Hard Limit   Units
Max cpu time              unlimited    unlimited    seconds
Max file size             unlimited    unlimited    bytes
Max data size             unlimited    unlimited    bytes
Max stack size            8388608      unlimited    bytes
Max core file size        0            unlimited    bytes
Max resident set          unlimited    unlimited    bytes
Max processes             64000        64000        processes
Max open files            64000        64000        files
Max locked memory         unlimited    unlimited    bytes
Max address space         unlimited    unlimited    bytes
Max file locks            unlimited    unlimited    locks
Max pending signals       63162        63162        signals
Max msgqueue size         819200       819200       bytes
Max nice priority         0            0
Max realtime priority     0            0
Max realtime timeout      unlimited    unlimited    us

I propose noting in MongoDB Limits and Thresholds that this case is not a MongoDB limit. I also think it would be useful to add the instructions from comment-3700289 to UNIX ulimit Settings.

Comment by Dmitry Agranat [ 05/Apr/21 ]

Sure platon.work@gmail.com, first, you can grep for the mongod process:

ps -ef | grep mongod

Now that you know the mongod process id, you can get the list of all limits in the /proc file-system which stores the per-process limits:

cat /proc/pid/limits 

where 'pid' is the mongod process identifier you retrieved with the grep command. For example, if the mongod process id is 4741, the command would look like this:

cat /proc/4741/limits 
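
If only a single mongod instance is running, the two lookups can also be combined into one line (a convenience sketch; pgrep prints the id of the matching process):

cat /proc/$(pgrep mongod)/limits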

As the SERVER project is for bugs and feature suggestions for the MongoDB server, we'd encourage you to bring general questions about MongoDB to our community by posting on the MongoDB Developer Community Forums.

Regards,
Dima

Comment by Platon workaccount [ 04/Apr/21 ]

Hello, @dmitry.agranat

> The output you've provided does not look to be related to the mongod process

I couldn't find a tutorial on how to print the ulimit values for a specific process. Can you give me a link to the docs or quote the necessary commands?

> Perhaps this output is for the root user?

groups platon

platon : platon adm cdrom sudo dip plugdev lpadmin sambashare docker

Comment by Dmitry Agranat [ 04/Apr/21 ]

Hi platon.work@gmail.com,

The output you've provided does not look to be related to the mongod process, as we've already identified (via the earlier provided diagnostic.data) that your mongod's current open-files limit is set to 64k. Perhaps this output is for the root user?

Just to reiterate the issue you've experienced: in order for the mongod process to be able to handle 500k tables (collections and indexes), you'll first need to adjust your default Unix settings, for example:
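
(A sketch based on the recommendations in UNIX ulimit Settings; the values are illustrative, and the exact limit needed depends on the number of collections and indexes.)

# For a mongod launched from a shell, raise the soft limit first:
ulimit -n 1000000

# For a systemd-managed mongod, override the unit instead:
sudo systemctl edit mongod
#   [Service]
#   LimitNOFILE=1000000
sudo systemctl daemon-reload
sudo systemctl restart mongod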

As this is not a bug, I will go ahead and close this ticket.

Regards,
Dima

Comment by Platon workaccount [ 30/Mar/21 ]

ulimit -a

core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 63162
max locked memory (kbytes, -l) 65536
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 63162
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited

Comment by Dmitry Agranat [ 29/Mar/21 ]

Hi platon.work@gmail.com, the issue you are reporting ("too many open files") is related to your ulimit settings. If I am not mistaken, it is currently set to only 64k for your mongod process, while you are aiming for 500k. Please post the command and the output of ulimit -a for the mongod process.
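
One way to confirm the failure (a sketch; the log path assumes a default package install) is to search the server log:

grep -i 'too many open files' /var/log/mongodb/mongod.log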

In addition, taking into account ~30 KB per data handle, you will need ~15 GB of memory just for data handles (500,000 handles × 30 KB ≈ 15 GB), so your current 15 GB server might not be sufficient.

Dima

Comment by Platon workaccount [ 24/Mar/21 ]

I uploaded the debug info. The archive is named SERVER-55469.zip.

Comment by Dmitry Agranat [ 24/Mar/21 ]

Hi platon.work@gmail.com,

Would you please archive (tar or zip) the full mongod.log files covering the test and the $dbpath/diagnostic.data directory (the contents are described here) and upload them to this support uploader location?

Files uploaded to this portal are visible only to MongoDB employees and are routinely deleted after some time.

Regards,
Dima
