[CDRIVER-149] sigsegv when connecting to replica set Created: 09/Jun/12  Updated: 03/May/17  Resolved: 23/Aug/12

Status: Closed
Project: C Driver
Component/s: None
Affects Version/s: 0.6
Fix Version/s: None

Type: Bug Priority: Critical - P2
Reporter: Pawel Assignee: Gregor Macadam
Resolution: Cannot Reproduce Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

amazon ec2, 64bit


Attachments: File cdriver-149.c    

 Description   

I'm attempting to connect to a replica set.
The connection is failing (I think because the first address listed is not a primary), but at some point the program itself crashes. Here is the stack trace, along with some values inspected in gdb.
I suspect something's going wrong in the mongo_cursor_next() function that messes up the data.

This is off of e1642ecf18f1d447f0b509f86d79df37df10e064

Program terminated with signal 11, Segmentation fault.
#0  0xb7668616 in ?? () from /lib/i386-linux-gnu/tls/i686/nosegneg/libc.so.6
(gdb) bt
#0  0xb7668616 in ?? () from /lib/i386-linux-gnu/tls/i686/nosegneg/libc.so.6
#1  0xb778cc91 in ?? () from /usr/lib/libefence.so.0
#2  0xb778ce9a in EF_Exit () from /usr/lib/libefence.so.0
#3  0xb778ca6f in Page_Create () from /usr/lib/libefence.so.0
#4  0xb778c87b in memalign () from /usr/lib/libefence.so.0
#5  0xb778c32d in malloc () from /usr/lib/libefence.so.0
#6  0xb77c03e5 in bson_malloc (size=1852793701) at src/bson.c:982
#7  0xb77c0488 in _bson_init_size (size=1852793701, b=0xa1cb8c90)
    at src/bson.c:620
#8  bson_init_size (b=0xa1cb8c90, size=1852793701) at src/bson.c:631
#9  0xb77c6f64 in mongo_find_one (conn=0xb74f394c, 
    ns=0x5152bff4 "admin.$cmd", query=0xa1cb8ea4, fields=0xa1cb8d2c, 
    out=0xa1cb8c90) at src/mongo.c:1235
#10 0xb77c7c1a in mongo_run_command (conn=0xb74f394c, db=0xb77ca3fe "admin", 
    command=0xa1cb8ea4, out=0xa1cb8e08) at src/mongo.c:1477
#11 0xb77c7f95 in mongo_simple_int_command (conn=0xb74f394c, 
    db=0xb77ca3fe "admin", cmdstr=0xb77ca404 "ismaster", arg=1, 
    realout=0xa1cb8f9c) at src/mongo.c:1509
#12 0xb77c845a in mongo_replset_check_host (conn=0xb74f394c)
    at src/mongo.c:554
#13 mongo_replset_connect (conn=0xb74f394c) at src/mongo.c:616
#14 0x0804c150 in check_mongo_conn (wi=0xb74f394c) at listener.c:1100
#15 0x0804bf4a in process_packet (from=0x51665e9c) at listener.c:1042
#16 0x0804b24b in read_from_fd (fdd=0x51665e9c) at listener.c:734
#17 0x0804aff1 in worker_loop (arg=0xb74f394c) at listener.c:641
#18 0xb77a0d4c in start_thread ()
   from /lib/i386-linux-gnu/tls/i686/nosegneg/libpthread.so.0
#19 0xb76d0ace in clone ()
   from /lib/i386-linux-gnu/tls/i686/nosegneg/libc.so.6
(gdb) frame 9
#9  0xb77c6f64 in mongo_find_one (conn=0xb74f394c, 
    ns=0x5152bff4 "admin.$cmd", query=0xa1cb8ea4, fields=0xa1cb8d2c, 
    out=0xa1cb8c90) at src/mongo.c:1235
1235                bson_init_size( out, bson_size( (bson *)&cursor->current ) );
(gdb) list
1230        mongo_cursor_set_fields( cursor, fields );
1231        mongo_cursor_set_limit( cursor, 1 );
1232
1233        if ( mongo_cursor_next( cursor ) == MONGO_OK ) {
1234            if( out ) {
1235                bson_init_size( out, bson_size( (bson *)&cursor->current ) );
1236                memcpy( out->data, cursor->current.data,
1237                    bson_size( (bson *)&cursor->current ) );
1238                out->finished = 1;
1239            }
(gdb) print (bson *)&cursor->current
$1 = (bson *) 0xa1cb8b8c
(gdb) print *(bson *)&cursor->current
$2 = {data = 0x5152ff60 "econdary", cur = 0x0, dataSize = 0, finished = 1, 
  stack = {0 <repeats 32 times>}, stackPos = 0, err = 0, errstr = 0x0}
(gdb) print *cursor
$4 = {reply = 0x5152ff3c, conn = 0xb74f394c, ns = 0x5152dff4 "admin.$cmd", 
  flags = 2, seen = 1929904128, current = {data = 0x5152ff60 "econdary", 
    cur = 0x0, dataSize = 0, finished = 1, stack = {0 <repeats 32 times>}, 
    stackPos = 0, err = 0, errstr = 0x0}, err = MONGO_CURSOR_EXHAUSTED, 
  query = 0xa1cb8ea4, fields = 0xa1cb8d2c, options = 0, limit = 1, skip = 0}
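
A note on the values above: the size passed to bson_init_size() in frame 8 (1852793701) is 0x6E6F6365, which is exactly the ASCII bytes "econ" read as a little-endian 32-bit integer. That matches the first four bytes at cursor->current.data ("econdary"), which suggests the pointer refers to string contents rather than to the length prefix of a BSON document. The sketch below reproduces that arithmetic and shows one possible sanity check on the length prefix. Both helper names (read_le_int32, doc_length_is_plausible) are hypothetical and are not part of the C driver.

```c
#include <stddef.h>
#include <stdint.h>

/* Read a little-endian int32, the encoding BSON uses for a document's
 * leading length field. Hypothetical helper, not a driver function. */
static int32_t read_le_int32(const unsigned char *p) {
    uint32_t v = (uint32_t)p[0]
               | ((uint32_t)p[1] << 8)
               | ((uint32_t)p[2] << 16)
               | ((uint32_t)p[3] << 24);
    return (int32_t)v;
}

/* Hypothetical guard: returns 1 if the length prefix at doc could describe
 * a BSON document that fits in the available bytes, 0 otherwise.
 * The smallest valid document is 5 bytes (4-byte length + 0x00 terminator). */
static int doc_length_is_plausible(const unsigned char *doc, size_t available) {
    int32_t len;
    if (available < 5) {
        return 0;
    }
    len = read_le_int32(doc);
    return len >= 5 && (size_t)len <= available;
}
```

For example, read_le_int32((const unsigned char *)"econdary") evaluates to 1852793701, the size seen in the backtrace, and doc_length_is_plausible() rejects that buffer for any realistic reply size. Whether such a check belongs in mongo_find_one() or in mongo_cursor_next() is an open question; this only illustrates how the bogus size can arise.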



 Comments   
Comment by Gregor Macadam [ 14/Aug/12 ]

Yes, I've tried with both the primary added first and a secondary added first.
Take a look at the attached code and tell me: is this similar to what you are doing, or something different?

Comment by Pawel [ 13/Aug/12 ]

There isn't really much code, just a call to replset_init().
In your test, do you have at least 2 hosts, and is your first host something other than the actual primary? It worked for me when I added the hosts so that the primary was first, but it didn't work when the primary wasn't first.

Comment by Gregor Macadam [ 13/Aug/12 ]

Hi - I've tried to reproduce this but so far have not been able to. Do you have some code you could share that reproduces this error?

Generated at Wed Feb 07 21:08:36 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.