|
Hi,
I'm porting mongoc/libbson on AIX (v6.1). First port.
I've started with version 1.9.3. I've just moved to version 1.9.4 .
The "make check" generates crashes (core) for libbson and mongoc, both with -O2 and -g, both in 32bit and 64bit (however, crashes are different).
"make check" makes use of the "--no-fork" option. When not using this option, mongoc shows only 6 FAILures, no crash/core, and libbson is OK.
I've tried with libbson bundled or not inside mongoc: nearly the same crashes.
libbson/mongoc are compiled with GCC, v6.3 at first, now v7.1 .
Since MongoDB does not run on AIX, I'm using a remote mongodb server on a Linux/x86_64 machine (4.13.11-100.fc25.x86_64).
MongoDB version : mongodb-server-3.2.12-1.fc25.x86_64
It seems to me that the crashes deal with how libbson manages memory:
bson_mem_restore_vtable
bson_malloc0
_bson_iter_next_internal
bson_string_new
I know nearly nothing about mongoc. I need help for figuring out how to debug/fix this issue.
About mongoc, I have:
{{libmongoc 1.9.4 was configured with the following options:
Build configuration:
Enable debugging (slow) : yes
Compile with debug symbols (slow) : full
Enable GCC build optimization : no
Enable automatic init and cleanup : no
Enable mongodb+srv URIs : auto
Enable maintainer flags : no
Code coverage support : no
Cross Compiling : no
Fast counters : no
Shared memory performance counters : yes
SASL : no
SSL : openssl
Snappy Compression : system
Zlib Compression : system
Libbson : system
Documentation:
man : no
HTML : no}}
- gdb ./test-libmongoc
(gdb) run (no --no-fork option !)
{ "status": "FAIL", "test_file": "/BulkOperation/upsert_unordered_oversized", "seed": "10627911", "start": 10987519.621764589, "end": 10987521.316436900, "elapsed": 1.694672311 }
,
{ "status": "FAIL", "test_file": "/BulkOperation/upsert_huge", "seed": "3030471911", "start": 10987522.620592515, "end": 10987523.967977921, "elapsed": 1.347385406 }
,
{ "status": "FAIL", "test_file": "/BulkOperation/large_inserts_ordered", "seed": "3138342737", "start": 10987531.721175652, "end": 10987533.140875839, "elapsed": 1.419700187 }
,
{ "status": "FAIL", "test_file": "/BulkOperation/large_inserts_unordered", "seed": "3327859616", "start": 10987533.141157775, "end": 10987534.537897308, "elapsed": 1.396739533 }
,
{ "status": "FAIL", "test_file": "/Collection/update/oversize", "seed": "3143947750", "start": 10987579.365100689, "end": 10987581.035526324, "elapsed": 1.670425635 }
,
{ "status": "FAIL", "test_file": "/TOPOLOGY/scanner_connection_error", "seed": "1693182082", "start": 10987742.630362888, "end": 10987742.752703433, "elapsed": 0.122340545 }
,
(gdb) run --no-fork
...
{ "status": "PASS", "test_file": "/BulkOperation/upsert_ordered", "seed": "2340171206", "start": 10987861.810715472, "end": 10987861.863608857, "elapsed": 0.052893385 }
,
{ "status": "PASS", "test_file": "/BulkOperation/upsert_unordered", "seed": "4266335484", "start": 10987861.863814916, "end": 10987861.915511878, "elapsed": 0.051696962 }
,
0x09000000004dce94 in pthread_kill () from /usr/lib/libpthreads.a(shr_xpg5_64.o)
(gdb) where
#0 0x09000000004dce94 in pthread_kill () from /usr/lib/libpthreads.a(shr_xpg5_64.o)
#1 0x09000000004dc6cc in _p_raise () from /usr/lib/libpthreads.a(shr_xpg5_64.o)
#2 0x090000000002c550 in raise () from /usr/lib/libc.a(shr_64.o)
#3 0x0900000000082fe8 in abort () from /usr/lib/libc.a(shr_64.o)
#4 0x0900000001ab18d4 in bson_mem_restore_vtable () from /opt/freeware/lib/libbson-1.0.a(libbson-1.0.so.0)
#5 0x0900000001ab1944 in memset () from /opt/freeware/lib/libbson-1.0.a(libbson-1.0.so.0)
#6 0x0900000001aa7424 in _bson_data () from /opt/freeware/lib/libbson-1.0.a(libbson-1.0.so.0)
#7 0x0900000001aa74f4 in _bson_encode_length () from /opt/freeware/lib/libbson-1.0.a(libbson-1.0.so.0)
#8 0x0900000001aa772c in _bson_append () from /opt/freeware/lib/libbson-1.0.a(libbson-1.0.so.0)
#9 0x0900000001aa79d4 in _bson_append_bson_begin () from /opt/freeware/lib/libbson-1.0.a(libbson-1.0.so.0)
#10 0x0900000001aaad04 in bson_append_timestamp () from /opt/freeware/lib/libbson-1.0.a(libbson-1.0.so.0)
#11 0x0900000001ac6148 in jsonsl_enable_all_callbacks () from /opt/freeware/lib/libbson-1.0.a(libbson-1.0.so.0)
#12 0x0900000001aca8c8 in _bson_json_read_append_dbpointer () from /opt/freeware/lib/libbson-1.0.a(libbson-1.0.so.0)
#13 0x0900000001acd59c in bson_json_reader_new_from_fd () from /opt/freeware/lib/libbson-1.0.a(libbson-1.0.so.0)
#14 0x0900000001acadb0 in _bad_extended_json () from /opt/freeware/lib/libbson-1.0.a(libbson-1.0.so.0)
#15 0x0900000001acb580 in _bson_json_read_end_map () from /opt/freeware/lib/libbson-1.0.a(libbson-1.0.so.0)
#16 0x0000000100080f64 in tmp_bson (json=0xffffffffffff080 "") at tests/test-conveniences.c:94
#17 0x00000001000bae70 in test_upsert_unordered_oversized (ctx=0x8) at tests/test-mongoc-bulk.c:585
#18 0x00000001000855a0 in TestSuite_RunTest (suite=<incomplete type>, test=0x11007ed10, count=0xffffffffffff54c) at tests/TestSuite.c:686
#19 0x0000000100085f1c in TestSuite_RunSerial (suite=<incomplete type>) at tests/TestSuite.c:951
#20 0x0000000100086420 in TestSuite_Run (suite=<incomplete type>) at tests/TestSuite.c:1046
#21 0x000000010000562c in main (argc=2, argv=0xffffffffffff740) at tests/test-libmongoc.c:2179
libbson bundled within mongoc :
- gmake check ---->> ./test-libbson --threads --no-fork
...
{ "status": "PASS", "test_file": "/version", "seed": "254312888", "start": 10984479.058313609, "end": 10984479.058315300, "elapsed": 0.000001691 }
,
{ "status": "PASS", "test_file": "/bson/writer/custom_realloc", "seed": "2005447092", "start": 10984479.058322513, "end": 10984479.058329431, "elapsed": 0.000006918 }
,
/bin/sh[2]: 24248364 Memory fault(coredump)
- cd ./src/libbson
- gdb ./.libs/lt-test-libbson core
Core was generated by `lt-test-libbson'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x090000000001057c in calloc_common () from /usr/lib/libc.a(shr_64.o)
(gdb) where
#0 0x090000000001057c in calloc_common () from /usr/lib/libc.a(shr_64.o)
#1 0x09000000019a5bac in bson_string_new () from /opt/freeware/src/packages/BUILD/mongo-c-driver-1.9.4/64bit/src/libbson/.libs/libbson-1.0.a(libbson-1.0.so.0)
#2 0x09000000019a7958 in bson_utf8_escape_for_json () from /opt/freeware/src/packages/BUILD/mongo-c-driver-1.9.4/64bit/src/libbson/.libs/libbson-1.0.a(libbson-1.0.so.0)
#3 0x090000000199cc4c in _bson_as_json_visit_before () from /opt/freeware/src/packages/BUILD/mongo-c-driver-1.9.4/64bit/src/libbson/.libs/libbson-1.0.a(libbson-1.0.so.0)
#4 0x09000000019a3f4c in bson_iter_visit_all () from /opt/freeware/src/packages/BUILD/mongo-c-driver-1.9.4/64bit/src/libbson/.libs/libbson-1.0.a(libbson-1.0.so.0)
#5 0x090000000199d290 in _bson_as_json_visit_document () from /opt/freeware/src/packages/BUILD/mongo-c-driver-1.9.4/64bit/src/libbson/.libs/libbson-1.0.a(libbson-1.0.so.0)
...
#404 0x09000000019a4220 in bson_iter_visit_all () from /opt/freeware/src/packages/BUILD/mongo-c-driver-1.9.4/64bit/src/libbson/.libs/libbson-1.0.a(libbson-1.0.so.0)
#405 0x090000000199d680 in _bson_as_json_visit_all () from /opt/freeware/src/packages/BUILD/mongo-c-driver-1.9.4/64bit/src/libbson/.libs/libbson-1.0.a(libbson-1.0.so.0)
#406 0x090000000199d818 in bson_as_json () from /opt/freeware/src/packages/BUILD/mongo-c-driver-1.9.4/64bit/src/libbson/.libs/libbson-1.0.a(libbson-1.0.so.0)
#407 0x000000010002dfa8 in test_bson_as_json_stack_overflow () at tests/test-json.c:671
#408 0x00000001000010cc in TestSuite_AddHelper (cb_=0x110001f20 <ccl3836H.rw+4528>) at tests/TestSuite.c:304
#409 0x00000001000017d0 in TestSuite_RunTest (suite=0xfffffffffffeeb8, test=0x110038770, mutex=0xfffffffffffed38, count=0xfffffffffffed78) at tests/TestSuite.c:481
#410 0x0000000100001df4 in TestSuite_ParallelWorker (data=0x110057d10) at tests/TestSuite.c:675
#411 0x09000000004c4d34 in _pthread_body () from /usr/lib/libpthread.a(shr_xpg5_64.o)
#412 0x0000000000000000 in ?? ()
(gdb)
|
|
Author:
{'email': 'jesse@mongodb.com', 'name': 'A. Jesse Jiryu Davis', 'username': 'ajdavis'}
Message: CDRIVER-2608 proper int64_t format in cursor test
Branch: master
https://github.com/mongodb/mongo-c-driver/commit/12f957adcdb40ef6ff0460307a174ab94ee80143
|
|
Author:
{'email': 'jesse@mongodb.com', 'name': 'A. Jesse Jiryu Davis', 'username': 'ajdavis'}
Message: CDRIVER-2608 get CPU count on AIX
Patch by Tony Reix.
Branch: master
https://github.com/mongodb/mongo-c-driver/commit/d52b3d380c31dcea195665af2d59a45a6281ccba
|
|
Author:
{'email': 'jesse@mongodb.com', 'name': 'A. Jesse Jiryu Davis', 'username': 'ajdavis'}
Message: CDRIVER-2608 print errno if malloc fails
Patch by Tony Reix.
Branch: master
https://github.com/mongodb/mongo-c-driver/commit/e1dd64038c886028a0aa0abea6f2e908f2c74c9d
|
|
I have understood the failures/cores in libbson tests on AIX.
libbson tests are launched as:
- $BUILD/libbson-1.9.4/64bit/.libs/lt-test-libbson --threads --no-fork
which implies the use of threads.
When running all tests, there is a deep recursivity within the 3 routines:
- _bson_iter_validate_document()
- unnamed block in bson_iter_visit_all()
- bson_iter_visit_all()
which is greater than 39 times and which consumes the stack of threads and generates a weird and random core on AIX.
This is due to the fact that thread stack on AIX is smaller than on Linux.
This can be fixed by using:
export AIXTHREAD_STK=500000
before "make check".
|
|
Hi Jesse,
Thanks. I'll see how to submit the patch as a GitHub pull request.
About the SKIP tests, there is code inside mongoc that SKIPs tests when in 32-bit:
tests/TestSuite.c :
+322 TestSuite_CheckMockServerAllowed (void)
+323 {
+324 if (test_framework_getenv_bool ("MONGOC_TEST_SKIP_MOCK"))
{
+325 return 0;
+326 }
+327
+328 /* CDRIVER-2115: don't run mock server tests on 32-bit */
+329 if (sizeof (void *) * 8 >= 64)
{
+330 return 1;
+331 }
else
{
+332 return 0;
+333 }
+334 }
So, that's clear.
Now, I'm studying crashes/failure in libbson tests:
.libs/lt-test-libbson --threads
{ "status": "FAIL", "test_file": "/bson/build_child_deep", "seed": "3410243516", "start": 341376.489854771, "end": 341376.665947314, "elapsed": 0.176092543 }
,
{ "status": "FAIL", "test_file": "/bson/build_child_deep_no_begin_end", "seed": "826262263", "start": 341376.780130335, "end": 341376.830185728, "elapsed": 0.050055393 }
,
{ "status": "FAIL", "test_file": "/bson/as_json/stack_overflow", "seed": "3556647355", "start": 341380.684710419, "end": 341380.851321964, "elapsed": 0.166611545 }
,
{ "status": "FAIL", "test_file": "/bson/validate", "seed": "2735287847", "start": 341384.921330023, "end": 341384.958589664, "elapsed": 0.037259641 }
,
They do not appear on Linux/PPC64LE nor Linux/x86_64. Only on AIX. Investigating.
|
|
I think your patch looks good to me, thanks. Would you mind submitting it as a GitHub pull request?
Regarding all the skipped tests, you can search for those test names and see the skip criteria for each. For example, /Topology/compatible_null_error_pointer is skipped if the environment variable MONGOC_TEST_SKIP_SLOW is set to "on".
|
|
OK. The FAILure " test_topology_scanner_connection_error()" is due to AIX generating a different error message. Fixed. Now, it passes.
So, results on AIX are:
PASS : 575
SKIP : 343 Why so many ?!!
FAIL : 0
On Linux/PPC64LE with mongoc/libbson 1.9.4, I have:
PASS : 820
SKIP : 101
FAIL : 0
So, there are 242 more SKIPped tests on AIX compared to Linux/Power. Why so many ?
Examples of tests skipped on AIX and not on Linux/PPC64LE :
/Async/ismaster
/Async/ismaster_ssl
/change_stream/pipeline
/change_stream/resumable_error
/change_stream/nonresumable_error
/change_stream/options
/Client/command_w_server_id
/Client/command_w_server_id/sharded
/Client/unavailable_seeds
/Client/rs_seeds_no_connect/single
/Client/rs_seeds_no_connect/pooled
/Client/mongos_seeds_no_connect/single
/Client/database_names
/Client/mismatched_me
/Client/handshake/pool
/Client/appname_single_uri
/Client/wire_version
/Client/ssl_opts/copies_single
/Client/select_server/retry/succeed
/Client/fetch_stream/retry/succeed
/Client/max_staleness/mongos
/BulkOperation/error
/BulkOperation/error/unordered
/BulkOperation/write_concern/write_command/ordered
/BulkOperation/wtimeout_duplicate_key/write_commands
/BulkOperation/hint/single/command/secondary
/BulkOperation/opts/collation/w0/wire5
/Cluster/cluster_time/query/
/Cluster/cluster_time/dollar_query/from_user
/Cluster/command/timeout/single
/Cluster/cluster_time/comparison/single
/Cluster/not_master/single/op_query
/Cluster/ismaster_fails
/Collection/insert/keys
/Collection/index/collation/wire4
/Collection/count_with_opts
/Collection/aggregate/inherit/collection
/Collection/stats/read_pref
/Collection/find_read_concern
/Collection/limit
/Collection/batch_size
/Collection/find_indexes/error
/Collection/getmore/batch_size
/Collection/find_with_opts/dollar_or
/command_monitoring/get_error
/Cursor/kill/single
/Cursor/client_kill_cursor/with_primary
/Cursor/new_tailable_await
/Cursor/new_ignores_fields
/Cursor/hint/single/secondary
/Cursor/n_return/find_cmd
/Database/command/read_prefs/simple/single
/Database/get_collection/getmore_cmd
/Error/command/default
/Error/command/v1
/Error/command/v2
/Client/exhaust_cursor/err/network/1st_batch/single
/find_and_modify/find_and_modify/bypass/true
/find_and_modify/opts
/find_and_modify/collation/ok
/GridFS/find_one_with_opts/limit
/MongoDB/handshake/success
/ReadPrefs/standalone/null
/ReadPrefs/rsprimary/primary
/ReadPrefs/rssecondary/secondary
/ReadPrefs/mongos/null
/ReadPrefs/OP_MSG/secondary
/retryable_writes/failover
/TOPOLOGY/scanner
/TOPOLOGY/blocking_initiator
/server_discovery_and_monitoring/monitoring/heartbeat/single/succeeded
/Session/end/mock/single
/Topology/connect_timeout/succeed
/Topology/try_once/succeed
/Topology/server_removed/single
/Topology/add_and_scan_failure
/Topology/ismaster_retry/single/hangup
/Topology/incompatible_error
/Topology/compatible_null_error_pointer
|
|
Now, I have the following tests which fails:
./test-libmongoc --no-fork -d -t -f -l /TOPOLOGY/scanner_connection_error
generating :
2018/04/17 15:49:37.0139: [7733382]: TRACE: mongoc: ENTRY: mongoc_topology_description_init():75
2018/04/17 15:49:37.0139: [7733382]: TRACE: mongoc: EXIT: mongoc_topology_description_init():94
2018/04/17 15:49:37.0139: [7733382]: TRACE: mongoc: ENTRY: mongoc_server_description_init():104
2018/04/17 15:49:37.0139: [7733382]: TRACE: mongoc: EXIT: mongoc_server_description_init():123
2018/04/17 15:49:37.0139: [7733382]: TRACE: cluster: ENTRY: mongoc_cluster_init():1871
2018/04/17 15:49:37.0139: [7733382]: TRACE: cluster: EXIT: mongoc_cluster_init():1898
2018/04/17 15:49:37.0139: [7733382]: TRACE: client: ENTRY: mongoc_client_command_simple():1634
2018/04/17 15:49:37.0139: [7733382]: TRACE: cluster: ENTRY: _mongoc_cluster_stream_for_optype():1963
2018/04/17 15:49:37.0139: [7733382]: TRACE: mongoc: ENTRY: mongoc_topology_description_select():798
2018/04/17 15:49:37.0139: [7733382]: TRACE: mongoc: TRACE: mongoc_topology_description_select():807 Topology type single, [localhost:9876] is down
2018/04/17 15:49:37.0139: [7733382]: TRACE: mongoc: EXIT: mongoc_topology_description_select():808
2018/04/17 15:49:37.0140: [7733382]: TRACE: topology_scanner: ENTRY: mongoc_topology_scanner_node_connect_tcp():468
2018/04/17 15:49:37.0140: [7733382]: TRACE: socket: ENTRY: mongoc_socket_new():977
2018/04/17 15:49:37.0141: [7733382]: TRACE: socket: ENTRY: _mongoc_socket_setnodelay():556
2018/04/17 15:49:37.0141: [7733382]: TRACE: socket: EXIT: _mongoc_socket_setnodelay():568
2018/04/17 15:49:37.0141: [7733382]: TRACE: socket: ENTRY: _mongoc_socket_setkeepalive():521
2018/04/17 15:49:37.0141: [7733382]: TRACE: socket: TRACE: _mongoc_socket_setkeepalive():525 Setting SO_KEEPALIVE
2018/04/17 15:49:37.0141: [7733382]: TRACE: socket: TRACE: _mongoc_socket_set_sockopt_if_less():468 'TCP_KEEPIDLE' is 7200, target value is 300
2018/04/17 15:49:37.0141: [7733382]: TRACE: socket: TRACE: _mongoc_socket_set_sockopt_if_less():479 'TCP_KEEPIDLE' value changed to 300
2018/04/17 15:49:37.0141: [7733382]: TRACE: socket: TRACE: _mongoc_socket_set_sockopt_if_less():468 'TCP_KEEPINTVL' is 75, target value is 10
2018/04/17 15:49:37.0141: [7733382]: TRACE: socket: TRACE: _mongoc_socket_set_sockopt_if_less():479 'TCP_KEEPINTVL' value changed to 10
2018/04/17 15:49:37.0141: [7733382]: TRACE: socket: TRACE: _mongoc_socket_set_sockopt_if_less():468 'TCP_KEEPCNT' is 8, target value is 9
2018/04/17 15:49:37.0141: [7733382]: TRACE: socket: EXIT: _mongoc_socket_setkeepalive():537
2018/04/17 15:49:37.0141: [7733382]: TRACE: socket: EXIT: mongoc_socket_new():1009
2018/04/17 15:49:37.0141: [7733382]: TRACE: socket: ENTRY: mongoc_socket_connect():840
2018/04/17 15:49:37.0141: [7733382]: TRACE: socket: TRACE: _mongoc_socket_capture_errno():68 setting errno: 79 Connection refused
2018/04/17 15:49:37.0141: [7733382]: TRACE: socket: TRACE: _mongoc_socket_errno_is_again():617 errno is: 79
2018/04/17 15:49:37.0141: [7733382]: TRACE: socket: EXIT: mongoc_socket_connect():872
2018/04/17 15:49:37.0142: [7733382]: TRACE: stream: ENTRY: _mongoc_stream_socket_poll():223
2018/04/17 15:49:37.0142: [7733382]: TRACE: socket: ENTRY: mongoc_socket_poll():283
2018/04/17 15:49:37.0142: [7733382]: TRACE: stream: EXIT: _mongoc_stream_socket_poll():249
2018/04/17 15:49:37.0142: [7733382]: TRACE: stream: ENTRY: mongoc_stream_writev():150
2018/04/17 15:49:37.0142: [7733382]: TRACE: stream: TRACE: mongoc_stream_writev():162 writev = 200a58c8 [9]
2018/04/17 15:49:37.0142: [7733382]: TRACE: stream: 00000: bb 01 00 00 01 00 00 00 00 00 00 00 d4 07 00 00 . . . . . . . . . . . . . . . .
2018/04/17 15:49:37.0142: [7733382]: TRACE: stream: 00010: 04 00 00 00 61 64 6d 69 6e 2e 24 63 6d 64 00 00 . . . . a d m i n . $ c m d . .
2018/04/17 15:49:37.0142: [7733382]: TRACE: stream: 00020: 00 00 00 ff ff ff ff 94 01 00 00 10 69 73 4d 61 . . . . . . . . . . . . i s M a
2018/04/17 15:49:37.0142: [7733382]: TRACE: stream: 00030: 73 74 65 72 00 01 00 00 00 03 63 6c 69 65 6e 74 s t e r . . . . . . c l i e n t
2018/04/17 15:49:37.0142: [7733382]: TRACE: stream: 00040: 00 67 01 00 00 03 64 72 69 76 65 72 00 29 00 00 . g . . . . d r i v e r . ) . .
2018/04/17 15:49:37.0142: [7733382]: TRACE: stream: 00050: 00 02 6e 61 6d 65 00 07 00 00 00 6d 6f 6e 67 6f . . n a m e . . . . . m o n g o
2018/04/17 15:49:37.0142: [7733382]: TRACE: stream: 00060: 63 00 02 76 65 72 73 69 6f 6e 00 06 00 00 00 31 c . . v e r s i o n . . . . . 1
2018/04/17 15:49:37.0142: [7733382]: TRACE: stream: 00070: 2e 39 2e 34 00 00 03 6f 73 00 50 00 00 00 02 74 . 9 . 4 . . . o s . P . . . . t
2018/04/17 15:49:37.0142: [7733382]: TRACE: stream: 00080: 79 70 65 00 05 00 00 00 55 6e 69 78 00 02 6e 61 y p e . . . . . U n i x . . n a
2018/04/17 15:49:37.0143: [7733382]: TRACE: stream: 00090: 6d 65 00 04 00 00 00 41 49 58 00 02 76 65 72 73 m e . . . . . A I X . . v e r s
2018/04/17 15:49:37.0143: [7733382]: TRACE: stream: 000a0: 69 6f 6e 00 02 00 00 00 31 00 02 61 72 63 68 69 i o n . . . . . 1 . . a r c h i
2018/04/17 15:49:37.0143: [7733382]: TRACE: stream: 000b0: 74 65 63 74 75 72 65 00 0d 00 00 00 30 30 46 39 t e c t u r e . . . . . 0 0 F 9
2018/04/17 15:49:37.0143: [7733382]: TRACE: stream: 000c0: 46 44 34 42 34 43 30 30 00 00 02 70 6c 61 74 66 F D 4 B 4 C 0 0 . . . p l a t f
2018/04/17 15:49:37.0143: [7733382]: TRACE: stream: 000d0: 6f 72 6d 00 cf 00 00 00 63 66 67 3d 30 78 31 65 o r m . . . . . c f g = 0 x 1 e
2018/04/17 15:49:37.0143: [7733382]: TRACE: stream: 000e0: 33 30 64 39 20 70 6f 73 69 78 3d 32 30 30 31 31 3 0 d 9 p o s i x = 2 0 0 1 1
2018/04/17 15:49:37.0143: [7733382]: TRACE: stream: 000f0: 32 20 73 74 64 63 3d 32 30 31 31 31 32 20 43 43 2 s t d c = 2 0 1 1 1 2 C C
2018/04/17 15:49:37.0143: [7733382]: TRACE: stream: 00100: 3d 47 43 43 20 37 2e 31 2e 30 20 43 46 4c 41 47 = G C C 7 . 1 . 0 C F L A G
2018/04/17 15:49:37.0143: [7733382]: TRACE: stream: 00110: 53 3d 22 2d 49 2f 75 73 72 2f 69 6e 63 6c 75 64 S = " - I / u s r / i n c l u d
2018/04/17 15:49:37.0143: [7733382]: TRACE: stream: 00120: 65 20 2d 49 2f 6f 70 74 2f 66 72 65 65 77 61 72 e - I / o p t / f r e e w a r
2018/04/17 15:49:37.0143: [7733382]: TRACE: stream: 00130: 65 2f 69 6e 63 6c 75 64 65 22 20 4c 44 46 4c 41 e / i n c l u d e " L D F L A
2018/04/17 15:49:37.0143: [7733382]: TRACE: stream: 00140: 47 53 3d 22 2d 4c 2f 6f 70 74 2f 66 72 65 65 77 G S = " - L / o p t / f r e e w
2018/04/17 15:49:37.0143: [7733382]: TRACE: stream: 00150: 61 72 65 2f 6c 69 62 20 2d 57 6c 2c 2d 62 6c 69 a r e / l i b - W l , - b l i
2018/04/17 15:49:37.0143: [7733382]: TRACE: stream: 00160: 62 70 61 74 68 3a 2f 6f 70 74 2f 66 72 65 65 77 b p a t h : / o p t / f r e e w
2018/04/17 15:49:37.0144: [7733382]: TRACE: stream: 00170: 61 72 65 2f 6c 69 62 3a 2f 75 73 72 2f 6c 69 62 a r e / l i b : / u s r / l i b
2018/04/17 15:49:37.0144: [7733382]: TRACE: stream: 00180: 3a 2f 6c 69 62 20 20 20 20 20 20 20 20 20 20 20 : / l i b
2018/04/17 15:49:37.0144: [7733382]: TRACE: stream: 00190: 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 2d -
2018/04/17 15:49:37.0144: [7733382]: TRACE: stream: 001a0: 6c 69 6e 74 6c 22 00 00 04 63 6f 6d 70 72 65 73 l i n t l " . . . c o m p r e s
2018/04/17 15:49:37.0144: [7733382]: TRACE: stream: 001b0: 73 69 6f 6e 00 05 00 00 00 00 00 s i o n . . . . . . .
2018/04/17 15:49:37.0144: [7733382]: TRACE: stream: ENTRY: _mongoc_stream_socket_writev():199
2018/04/17 15:49:37.0144: [7733382]: TRACE: socket: ENTRY: mongoc_socket_sendv():1351
2018/04/17 15:49:37.0144: [7733382]: TRACE: socket: ENTRY: _mongoc_socket_try_sendv():1256
2018/04/17 15:49:37.0144: [7733382]: TRACE: socket: TRACE: _mongoc_socket_try_sendv():1262 sendbuf = 200a51c8 [9]
2018/04/17 15:49:37.0144: [7733382]: TRACE: socket: 00000: bb 01 00 00 01 00 00 00 00 00 00 00 d4 07 00 00 . . . . . . . . . . . . . . . .
2018/04/17 15:49:37.0144: [7733382]: TRACE: socket: 00010: 04 00 00 00 61 64 6d 69 6e 2e 24 63 6d 64 00 00 . . . . a d m i n . $ c m d . .
2018/04/17 15:49:37.0144: [7733382]: TRACE: socket: 00020: 00 00 00 ff ff ff ff 94 01 00 00 10 69 73 4d 61 . . . . . . . . . . . . i s M a
2018/04/17 15:49:37.0144: [7733382]: TRACE: socket: 00030: 73 74 65 72 00 01 00 00 00 03 63 6c 69 65 6e 74 s t e r . . . . . . c l i e n t
2018/04/17 15:49:37.0144: [7733382]: TRACE: socket: 00040: 00 67 01 00 00 03 64 72 69 76 65 72 00 29 00 00 . g . . . . d r i v e r . ) . .
2018/04/17 15:49:37.0145: [7733382]: TRACE: socket: 00050: 00 02 6e 61 6d 65 00 07 00 00 00 6d 6f 6e 67 6f . . n a m e . . . . . m o n g o
2018/04/17 15:49:37.0145: [7733382]: TRACE: socket: 00060: 63 00 02 76 65 72 73 69 6f 6e 00 06 00 00 00 31 c . . v e r s i o n . . . . . 1
2018/04/17 15:49:37.0145: [7733382]: TRACE: socket: 00070: 2e 39 2e 34 00 00 03 6f 73 00 50 00 00 00 02 74 . 9 . 4 . . . o s . P . . . . t
2018/04/17 15:49:37.0145: [7733382]: TRACE: socket: 00080: 79 70 65 00 05 00 00 00 55 6e 69 78 00 02 6e 61 y p e . . . . . U n i x . . n a
2018/04/17 15:49:37.0145: [7733382]: TRACE: socket: 00090: 6d 65 00 04 00 00 00 41 49 58 00 02 76 65 72 73 m e . . . . . A I X . . v e r s
2018/04/17 15:49:37.0145: [7733382]: TRACE: socket: 000a0: 69 6f 6e 00 02 00 00 00 31 00 02 61 72 63 68 69 i o n . . . . . 1 . . a r c h i
2018/04/17 15:49:37.0145: [7733382]: TRACE: socket: 000b0: 74 65 63 74 75 72 65 00 0d 00 00 00 30 30 46 39 t e c t u r e . . . . . 0 0 F 9
2018/04/17 15:49:37.0145: [7733382]: TRACE: socket: 000c0: 46 44 34 42 34 43 30 30 00 00 02 70 6c 61 74 66 F D 4 B 4 C 0 0 . . . p l a t f
2018/04/17 15:49:37.0145: [7733382]: TRACE: socket: 000d0: 6f 72 6d 00 cf 00 00 00 63 66 67 3d 30 78 31 65 o r m . . . . . c f g = 0 x 1 e
2018/04/17 15:49:37.0145: [7733382]: TRACE: socket: 000e0: 33 30 64 39 20 70 6f 73 69 78 3d 32 30 30 31 31 3 0 d 9 p o s i x = 2 0 0 1 1
2018/04/17 15:49:37.0145: [7733382]: TRACE: socket: 000f0: 32 20 73 74 64 63 3d 32 30 31 31 31 32 20 43 43 2 s t d c = 2 0 1 1 1 2 C C
2018/04/17 15:49:37.0145: [7733382]: TRACE: socket: 00100: 3d 47 43 43 20 37 2e 31 2e 30 20 43 46 4c 41 47 = G C C 7 . 1 . 0 C F L A G
2018/04/17 15:49:37.0145: [7733382]: TRACE: socket: 00110: 53 3d 22 2d 49 2f 75 73 72 2f 69 6e 63 6c 75 64 S = " - I / u s r / i n c l u d
2018/04/17 15:49:37.0145: [7733382]: TRACE: socket: 00120: 65 20 2d 49 2f 6f 70 74 2f 66 72 65 65 77 61 72 e - I / o p t / f r e e w a r
2018/04/17 15:49:37.0145: [7733382]: TRACE: socket: 00130: 65 2f 69 6e 63 6c 75 64 65 22 20 4c 44 46 4c 41 e / i n c l u d e " L D F L A
2018/04/17 15:49:37.0146: [7733382]: TRACE: socket: 00140: 47 53 3d 22 2d 4c 2f 6f 70 74 2f 66 72 65 65 77 G S = " - L / o p t / f r e e w
2018/04/17 15:49:37.0146: [7733382]: TRACE: socket: 00150: 61 72 65 2f 6c 69 62 20 2d 57 6c 2c 2d 62 6c 69 a r e / l i b - W l , - b l i
2018/04/17 15:49:37.0146: [7733382]: TRACE: socket: 00160: 62 70 61 74 68 3a 2f 6f 70 74 2f 66 72 65 65 77 b p a t h : / o p t / f r e e w
2018/04/17 15:49:37.0146: [7733382]: TRACE: socket: 00170: 61 72 65 2f 6c 69 62 3a 2f 75 73 72 2f 6c 69 62 a r e / l i b : / u s r / l i b
2018/04/17 15:49:37.0146: [7733382]: TRACE: socket: 00180: 3a 2f 6c 69 62 20 20 20 20 20 20 20 20 20 20 20 : / l i b
2018/04/17 15:49:37.0146: [7733382]: TRACE: socket: 00190: 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 2d -
2018/04/17 15:49:37.0146: [7733382]: TRACE: socket: 001a0: 6c 69 6e 74 6c 22 00 00 04 63 6f 6d 70 72 65 73 l i n t l " . . . c o m p r e s
2018/04/17 15:49:37.0146: [7733382]: TRACE: socket: 001b0: 73 69 6f 6e 00 05 00 00 00 00 00 s i o n . . . . . . .
2018/04/17 15:49:37.0146: [7733382]: TRACE: socket: TRACE: _mongoc_socket_try_sendv():1282 Send -1 out of 4 bytes
2018/04/17 15:49:37.0146: [7733382]: TRACE: socket: TRACE: _mongoc_socket_capture_errno():68 setting errno: 32 Broken pipe
2018/04/17 15:49:37.0146: [7733382]: TRACE: socket: TRACE: mongoc_socket_errno():592 Current errno: 32
2018/04/17 15:49:37.0146: [7733382]: TRACE: socket: EXIT: _mongoc_socket_try_sendv():1306
2018/04/17 15:49:37.0146: [7733382]: TRACE: socket: TRACE: mongoc_socket_sendv():1363 Sent -1 (of 4) out of iovcnt=9
2018/04/17 15:49:37.0146: [7733382]: TRACE: socket: TRACE: _mongoc_socket_errno_is_again():617 errno is: 32
2018/04/17 15:49:37.0146: [7733382]: TRACE: socket: GOTO: mongoc_socket_sendv():1373 CLEANUP
2018/04/17 15:49:37.0147: [7733382]: TRACE: socket: EXIT: mongoc_socket_sendv():1431
2018/04/17 15:49:37.0147: [7733382]: TRACE: socket: TRACE: mongoc_socket_errno():592 Current errno: 32
2018/04/17 15:49:37.0147: [7733382]: TRACE: stream: EXIT: _mongoc_stream_socket_writev():205
2018/04/17 15:49:37.0147: [7733382]: TRACE: stream: EXIT: mongoc_stream_writev():165
2018/04/17 15:49:37.0147: [7733382]: TRACE: stream: ENTRY: mongoc_stream_failed():80
2018/04/17 15:49:37.0147: [7733382]: TRACE: stream: ENTRY: _mongoc_stream_socket_failed():89
2018/04/17 15:49:37.0147: [7733382]: TRACE: stream: ENTRY: _mongoc_stream_socket_destroy():71
2018/04/17 15:49:37.0147: [7733382]: TRACE: socket: ENTRY: mongoc_socket_close():770
2018/04/17 15:49:37.0147: [7733382]: TRACE: socket: EXIT: mongoc_socket_close():805
2018/04/17 15:49:37.0147: [7733382]: TRACE: stream: EXIT: _mongoc_stream_socket_destroy():82
2018/04/17 15:49:37.0147: [7733382]: TRACE: stream: EXIT: _mongoc_stream_socket_failed():93
2018/04/17 15:49:37.0147: [7733382]: TRACE: stream: EXIT: mongoc_stream_failed():90
2018/04/17 15:49:37.0147: [7733382]: TRACE: mongoc: ENTRY: mongoc_server_description_handle_ismaster():493
2018/04/17 15:49:37.0147: [7733382]: TRACE: mongoc: EXIT: mongoc_server_description_handle_ismaster():500
2018/04/17 15:49:37.0147: [7733382]: TRACE: mongoc: TRACE: mongoc_topology_description_handle_ismaster():1955 No transition entry to Single for Unknown
2018/04/17 15:49:37.0147: [7733382]: TRACE: mongoc: ENTRY: mongoc_topology_description_select():798
2018/04/17 15:49:37.0147: [7733382]: TRACE: mongoc: TRACE: mongoc_topology_description_select():807 Topology type single, [localhost:9876] is down
2018/04/17 15:49:37.0148: [7733382]: TRACE: mongoc: EXIT: mongoc_topology_description_select():808
2018/04/17 15:49:37.0148: [7733382]: TRACE: cluster: EXIT: _mongoc_cluster_stream_for_optype():1971
2018/04/17 15:49:37.0148: [7733382]: TRACE: client: EXIT: mongoc_client_command_simple():1670
tests/test-mongoc-topology-scanner.c:304 test_topology_scanner_connection_error(): [No suitable servers found (`serverSelectionTryOnce` set): [Failed to write rpc bytes. calling ismaster on 'localhost:9876']] does not contain [connection refused calling ismaster on 'localhost:9876']
IOT/Abort trap (core dumped)
(gdb) where
#0 0xd051ceb4 in pthread_kill () from /usr/lib/libpthreads.a(shr_xpg5.o)
#1 0xd051c2ec in _p_raise () from /usr/lib/libpthreads.a(shr_xpg5.o)
#2 0xd011f904 in raise () from /usr/lib/libc.a(shr.o)
#3 0xd01aeee8 in abort () from /usr/lib/libc.a(shr.o)
#4 0x10131d04 in test_topology_scanner_connection_error ()
#5 0x10088a54 in TestSuite_AddHelper ()
#6 0x10089590 in TestSuite_RunTest ()
#7 0x1008a264 in TestSuite_RunNamed ()
#8 0x1008a3dc in TestSuite_Run ()
#9 0x100058f4 in main (argc=268435940, argv=0x0) at /opt/freeware/lib/gcc/powerpc-ibm-aix6.1.0.0/7.1.0/include-fixed/math.h:603
|
|
Hummm I think that it is also an issue in the bson/bson-memory.c code since the "errno" value should be printed with a switch() based on the possible values in order to help the work of people trying to understand what is happening.
Here is a candidate patch.
|
|
Thanks for investigating! Looks like some of those tests that allocate huge strings run out of memory, or at least can't allocate a 16MB contiguous block. That's not anything the driver can or should do something about, it's a problem with the tests.
|
|
Jump to the end in order to read the solution.
IFF the gdb debugger I'm using now works fine, there are no allocation greater than 1 million.
I've used: break 66 if ret->len > 1000000 and it never stopped.
I've looked to the value of ret->len from the beg, and the biggest I've seen is: 31. However gdb crashed before the end of the test...
A printf() added to bson_string_new() confirms: only several 1 value and one 32 value for ret->alloc .
I have now line numbers for libbson too:
#4 0x10006a6c in bson_realloc (mem=0x0, num_bytes=16777216) at src/bson/bson-memory.c:153
#5 0x10006adc in bson_realloc_ctx (mem=0x200a7548, num_bytes=16777216, ctx=0x0) at src/bson/bson-memory.c:189
#6 0x10023144 in _bson_impl_alloc_grow (impl=0x10023204 <_bson_grow+80>, size=4034316480) at src/bson/bson.c:187
#7 0x10023204 in _bson_grow (bson=0x250ae868, size=16777188) at src/bson/bson.c:221
#8 0x10023418 in _bson_append_va (bson=0x10023698 <_bson_append+340>, n_bytes=804395264, n_pairs=638291088, first_len=16837856, first_data=0x250ae868 "",
args=0xffffe4 <error: Cannot access memory at address 0xffffe4>) at src/bson/bson.c:325
#9 0x10023698 in _bson_append (bson=0x100267c0 <bson_append_utf8+436>, n_pairs=16, n_bytes=4034768808, first_len=3490783808, first_data=0x250ae868 "") at src/bson/bson.c:404
#10 0x100267c0 in bson_append_utf8 (bson=0x100abd88 <_bson_json_read_string+376>, key=0x0, key_length=638291096, value=0x0, length=621471848) at src/bson/bson.c:1619
#11 0x100abd88 in _bson_json_read_string (reader=0x100b04b8 <_pop_callback+380>, val=0x2ff21a40 "/\362\032\240\336\255\276\357%\n\345\350&\v\214\230", vlen=804395680)
at src/bson/bson-json.c:893
#12 0x100b04b8 in pop_callback (json=0x100b3194 <jsonsl_feed+4592>, action=4034803896, state=0xdeadbeef, buf=0x20008ebc <_dbargs+24144> "") at src/bson/bson-json.c:1915
#13 0x100b3194 in jsonsl_feed (jsn=0x5, bytes=0x2ff228dc "/\362)\260/\362)\301/\362)\313/\362)\316/\362)\321", nbytes=804399348) at src/jsonsl/jsonsl.c:539
#14 0x100b098c in bson_json_reader_read (reader=0x100b115c <bson_new_from_json+200>, bson=0x0, error=0x0) at src/bson/bson-json.c:2053
#15 0x100b115c in bson_new_from_json (data=0x100a67d8 <tmp_bson+132> "`", len=537074992, error=0x2ff21ec0) at src/bson/bson-json.c:2229
#16 0x100a67d8 in tmp_bson (json=0x100eeb58 <test_upsert_unordered_oversized+236> "`") at tests/test-conveniences.c:94
#17 0x100eeb58 in test_upsert_unordered_oversized (ctx=0x100b8900 <TestSuite_RunTest+584>) at tests/test-mongoc-bulk.c:585
#18 0x100b8900 in TestSuite_RunTest (suite=0x100b95d4 <TestSuite_RunNamed+316>, test=0x2ff22784, count=0x2ff22780) at tests/TestSuite.c:686
#19 0x100b95d4 in TestSuite_RunNamed (suite=0x100b974c <TestSuite_Run+188>, testname=0x200039dc <__dbargs+2416> "\020\v|\274 ") at tests/TestSuite.c:1007
#20 0x100b974c in TestSuite_Run (suite=0x100058f4 <main+1112>) at tests/TestSuite.c:1044
#21 0x100058f4 in main (argc=268435940, argv=0x0) at tests/test-libmongoc.c:2179
bson_realloc (mem=0x0 ..) !!!
bson_realloc_ctx(... ctx=0x0) @ctx: Ignored
src/bson/bson-memory.c
The issue appears to be in the call to gMemVtable.realloc() from bson_realloc()
void *
bson_realloc (void mem, / IN */
size_t num_bytes) /* IN */
{
printf("bson_realloc: mem: %p - num_bytes: %d\n", mem, num_bytes);
/*
- Not all platforms are guaranteed to free() the memory if a call to
- realloc() with a size of zero occurs. Windows, Linux, and FreeBSD do,
- however, OS X does not.
*/
if (BSON_UNLIKELY (num_bytes == 0))
{
gMemVtable.free (mem);
return NULL;
}
mem = gMemVtable.realloc (mem, num_bytes);
printf("bson_realloc: after gMemVtable.realloc: mem: %p - num_bytes: %d\n", mem, num_bytes);
Here are the traces:
bson_string_new: ret->alloc: 1
Begin /BulkOperation/upsert_unordered_oversized, seed 809807955
bson_string_new: ret->alloc: 30
bson_string_new: power of two: ret->alloc: 32
bson_realloc: mem: 200a6d48 - num_bytes: 64
bson_realloc: after gMemVtable.realloc: mem: 200a6d48 - num_bytes: 64
bson_string_new: ret->alloc: 1
bson_realloc: mem: 20026fe8 - num_bytes: 2
bson_realloc: after gMemVtable.realloc: mem: 20026fe8 - num_bytes: 2
bson_realloc: mem: 20026fe8 - num_bytes: 4
bson_realloc: after gMemVtable.realloc: mem: 20026fe8 - num_bytes: 4
bson_realloc: mem: 20026fe8 - num_bytes: 8
bson_realloc: after gMemVtable.realloc: mem: 20026fe8 - num_bytes: 8
bson_realloc: mem: 20026fe8 - num_bytes: 16
bson_realloc: after gMemVtable.realloc: mem: 200a7428 - num_bytes: 16
bson_string_new: ret->alloc: 1
bson_realloc: mem: 20026fe8 - num_bytes: 2
bson_realloc: after gMemVtable.realloc: mem: 20026fe8 - num_bytes: 2
bson_realloc: mem: 20026fe8 - num_bytes: 4
bson_realloc: after gMemVtable.realloc: mem: 20026fe8 - num_bytes: 4
bson_realloc: mem: 20026fe8 - num_bytes: 8
bson_realloc: after gMemVtable.realloc: mem: 20026fe8 - num_bytes: 8
bson_realloc: mem: 20026fe8 - num_bytes: 16
bson_realloc: after gMemVtable.realloc: mem: 200a7428 - num_bytes: 16
bson_string_new: ret->alloc: 1
bson_realloc: mem: 20026fe8 - num_bytes: 2
bson_realloc: after gMemVtable.realloc: mem: 20026fe8 - num_bytes: 2
bson_realloc: mem: 20026fe8 - num_bytes: 4
bson_realloc: after gMemVtable.realloc: mem: 20026fe8 - num_bytes: 4
bson_realloc: mem: 20026fe8 - num_bytes: 8
bson_realloc: after gMemVtable.realloc: mem: 20026fe8 - num_bytes: 8
bson_realloc: mem: 20026fe8 - num_bytes: 16
bson_realloc: after gMemVtable.realloc: mem: 200a6d78 - num_bytes: 16
bson_string_new: ret->alloc: 1
bson_string_new: ret->alloc: 1
bson_realloc: mem: 2002ac98 - num_bytes: 2
bson_realloc: after gMemVtable.realloc: mem: 2002ac98 - num_bytes: 2
bson_realloc: mem: 2002ac98 - num_bytes: 4
bson_realloc: after gMemVtable.realloc: mem: 2002ac98 - num_bytes: 4
bson_realloc: mem: 2002ac98 - num_bytes: 8
bson_realloc: after gMemVtable.realloc: mem: 2002ac98 - num_bytes: 8
bson_realloc: mem: 200a6cf8 - num_bytes: 37
bson_realloc: after gMemVtable.realloc: mem: 200a6cf8 - num_bytes: 37
bson_realloc: mem: 200a7618 - num_bytes: 33554389
bson_realloc: after gMemVtable.realloc: mem: 210ae7a8 - num_bytes: 33554389
bson_realloc: mem: 250b8e08 - num_bytes: 32768
bson_realloc: after gMemVtable.realloc: mem: 250b8e08 - num_bytes: 32768
bson_realloc: mem: 250b8e08 - num_bytes: 65536
bson_realloc: after gMemVtable.realloc: mem: 250b8e08 - num_bytes: 65536
bson_realloc: mem: 250b8e08 - num_bytes: 131072
bson_realloc: after gMemVtable.realloc: mem: 250b8e08 - num_bytes: 131072
bson_realloc: mem: 250b8e08 - num_bytes: 262144
bson_realloc: after gMemVtable.realloc: mem: 250b8e08 - num_bytes: 262144
bson_realloc: mem: 250b8e08 - num_bytes: 524288
bson_realloc: after gMemVtable.realloc: mem: 250b8e08 - num_bytes: 524288
bson_realloc: mem: 250b8e08 - num_bytes: 1048576
bson_realloc: after gMemVtable.realloc: mem: 250b8e08 - num_bytes: 1048576
bson_realloc: mem: 250b8e08 - num_bytes: 2097152
bson_realloc: after gMemVtable.realloc: mem: 250b8e08 - num_bytes: 2097152
bson_realloc: mem: 250b8e08 - num_bytes: 4194304
bson_realloc: after gMemVtable.realloc: mem: 250b8e08 - num_bytes: 4194304
bson_realloc: mem: 250b8e08 - num_bytes: 8388608
bson_realloc: after gMemVtable.realloc: mem: 250b8e08 - num_bytes: 8388608
bson_realloc: mem: 250b8e08 - num_bytes: 16777216
bson_realloc: after gMemVtable.realloc: mem: 250b8e08 - num_bytes: 16777216
bson_realloc_ctx: mem: 200a76c8 - num_bytes: 16777216
bson_realloc: mem: 200a76c8 - num_bytes: 16777216
bson_realloc: after gMemVtable.realloc: mem: 0 - num_bytes: 16777216
Under gdb, 0x200a76c8 is a valid address.
And "man realloc" says:
The Pointer must point to an address returned by a malloc subsystem allocation routine, and must not have been previously deallocated. Undefined results occur if Pointer does not meet these criteria.
I've added traces for all alloc, alloc0, realloc, free routines of libbson, and I get for the new address 200a7748 that now makes gMemVtable.realloc() return: mem: 0:
bson_malloc: mem: 200a7748 - num_bytes: 128
bson_realloc_ctx: mem: 200a7748 - num_bytes: 16777216
bson_realloc: mem: 200a7748 - num_bytes: 16777216
OK. I've understood.
By default on AIX, the available memory is small. Like:
- ulimit -a
core file size (blocks, -c) 1048575
data seg size (kbytes, -d) 131072 <<<<<<<<
file size (blocks, -f) 1048575
max memory size (kbytes, -m) 32768
open files (-n) 2000
pipe size (512 bytes, -p) 64
stack size (kbytes, -s) 32768
cpu time (seconds, -t) unlimited
max user processes (-u) unlimited
virtual memory (kbytes, -v) unlimited
About the allocation, the data segment (ulimit -d) is too small. So realloc() returns 0 AND it sets errno to 12 !
By adding some lines of code in bson-memory.c :
#include <errno.h>
printf("bson_realloc: after gMemVtable.realloc: mem: %p - num_bytes: %d - errno: %d\n", mem, num_bytes, errno);
I can see:
bson_realloc: after gMemVtable.realloc: mem: 0 - num_bytes: 16777216 - errno: 12
/usr/include/errno.h:#define ENOMEM 12 /* Not enough space */
OK. that seems clear.
With ulimit -d unlimited, that works much better.
OK. I would suggest to add the print of the "errno" variable in addition, in order to speed up understanding what the issue is.
I'll test the other failures asap. However, we have a general electricity shutdown early this afternoon. So I'll probably continue this next Monday.
|
|
Thanks, don't know what's going wrong with the "npages" diagnostic output on AIX but that's not related to the test failures you see. We don't actually use npages for any driver logic, we just output that value during tests. (Don't know why, it was implemented by my predecessor.)
I think the crucial question is the one I wrote above, whether bson_string_new calls malloc with the correct argument or a wrong one.
I'm curious why "/TOPOLOGY/scanner_connection_error" fails, so if you run it and as I suggested you should see an error. If it's a timing error then it's perhaps CDRIVER-2567.
Regarding long-term support for AIX: we don't plan to support AIX officially.
|
|
tests/TestSuite.c :
uint64_t npages = 0;
...
#if defined(_SC_PHYS_PAGES)
npages = sysconf (_SC_PHYS_PAGES);
#endif
fprintf (stream,
" \"host\": {\n"
" \"sysname\": \"%s\",\n"
" \"release\": \"%s\",\n"
" \"machine\": \"%s\",\n"
" \"memory\":
{\n"
" \"pagesize\": %" PRIu64 ",\n"
" \"npages\": %" PRIu64 "\n"
" }
\n"
" },\n",
u.sysname,
u.release,
u.machine,
pagesize,
npages);
#endif
}
man sysconf on Linux:
These values also exist, but may not be standard.
- _SC_PHYS_PAGES
The number of pages of physical memory. Note that it is possible for the product of this value and the value of _SC_PAGESIZE to overflow.
However, that should be 0.
Unless the %" PRIu64 " makes it wrong ?
Hummmm That reminds me another issue with PRIu64 for GCC Go:
"The GCC "system.h" header must be included earlier for PRIx64 and PRIu64 to be defined correctly."
|
|
Hi,
--no-fork: OK. I should have understood this before.
With some trace, in both 32bit and 64bit, I see :
"npages": 18446744073709551615 !!!!!!!!! 2^64-1 !!
It is only : 129912 on a Linux/PPC64LE VM for the same test.
And I see that on AIX: ./test-libmongoc -d -l "/TOPOLOGY/scanner_connection_error" generates also 129912 .
That may explain the issue !
However, there are PASSing tests that generate the same huge value for npages.
Any idea what is this "npages" value and where it is computed ?
Meanwhile, I'll follow your suggestions for debugging.
About "scanner_connection_error" test in error, I have no idea why it failed since it passes now even if mongod is down.
About the support of AIX by mongoc, I would be happy, once all bugs are fixed, to see AIX being supported by mongoc.
I've seen that there is a buildfarm. Maybe we may add a machine if the framework is not too complex.
Thanks !
- ./test-libmongoc -l /BulkOperation/upsert_unordered_oversized 2>&1 | tee /tmp/P
{
"host": {
"sysname": "AIX",
"release": "1",
"machine": "00F9FD4B4C00",
"memory":
"pagesize": 4096,
* "npages": 18446744073709551615*
},
"auth":
{ "user": "", "pass": "" }
,
"addr":
{ "host": "10.197.64.27", "port": 27017, "uri": "mongodb://10.197.64.27:27017/" }
,
"gssapi":
{ "host": "", "user": "" }
,
"uds": "%2Ftmp%2Fmongodb-27017.sock",
"compressors": "",
"SSL":
{
"enabled": false,
"weak_cert_validation": false,
"pem_file": "",
"pem_pwd": "",
"ca_file": "",
"ca_dir": "",
"crl_file": ""
}
,
"framework":
{
"monitoringVerbose": false,
"mockServerLog": "",
"futureTimeoutMS": 10000,
"majorityReadConcern": false,
"skipLiveTests": false,
"IPv6": false
}
,
"options":
{
"fork": true,
"tracing": false
}
,
"results": [
{ "status": "FAIL", "test_file": "/BulkOperation/upsert_unordered_oversized", "seed": "2786200190", "start": 10992917.294652226, "end": 10992918.951419945, "elapsed": 1.656767719 }
]
Though, from a Linux/PPC64LE machine, I see:
- ./test-libmongoc -d -f -l /BulkOperation/upsert_unordered_oversized
{
"host":
"sysname": "Linux",
"release": "4.15.8-300.fc27.ppc64le",
"machine": "ppc64le",
"memory":
"pagesize": 65536,
"npages": 129912
|
|
The tests are crashing without "--no-fork", also, they're just not coredumping in the same way since the crash occurs in a child process.
It looks like the tests that crash mostly involve allocations of over 16 MB while constructing oversized documents for the sake of testing them:
https://github.com/mongodb/mongo-c-driver/blob/0c91d3f67a7e30be2193cc77e84a131ed97bd601/tests/test-mongoc-bulk.c#L631
I have a couple theories. One is that bson_string_new calls malloc with an incorrect size, perhaps there is some arithmetic leading to that call which is not implemented correctly for your architecture. The second theory is that the arithmetic is correct but the malloc simply fails to allocate such a large block. You can determine which by stepping through the test in a debugger, execute the test like:
./test-libmongoc --no-fork -d -l /BulkOperation/upsert_unordered_oversized
|
Set a breakpoint in bson_string_new, around this line:
ret->alloc = ret->len + 1;
|
I'd make a conditional breakpoint and only stop there if ret->len is greater than a million.
To set expectations correctly, I have to warn you we don't test on AIX, don't officially support it in the driver, and don't have access to an AIX machine at the moment. But if you determine the cause of the bug we're happy to review a pull request.
The failure in "/TOPOLOGY/scanner_connection_error" I don't know about, but if you run just that test you should see some informative output:
./test-libmongoc --no-fork -d -l "/TOPOLOGY/scanner_connection_error"
|
|
Generated at Wed Feb 07 21:15:46 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.