[CDRIVER-1946] SEGFAULT with fast performance counters on NUMA Created: 08/Dec/16  Updated: 12/Dec/16  Resolved: 12/Dec/16

Status: Closed
Project: C Driver
Component/s: libmongoc
Affects Version/s: 1.3.6, 1.5.0
Fix Version/s: 1.5.1

Type: Bug Priority: Major - P3
Reporter: Jonathan Wang Assignee: A. Jesse Jiryu Davis
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Linux (CentOS), clang 3.9.0


Attachments: mongo_c_segfault.txt, mongo_c_segfault_debug.txt

 Description   

The call to mongoc_client_new segfaults most of the time. It occasionally succeeds, but there is no discernible pattern to when it segfaults and when it works. The same code was compiled against versions 1.3.6 and 1.5.0 of the Mongo C driver; the behavior is similar in both cases.

See the attachments for the example program's source code, the compile command, the program's output, and the LLDB output.

NOTE: This also affects the C++ driver because the C++ driver calls the C driver internally.



 Comments   
Comment by Hannes Magnusson [ 12/Dec/16 ]

I've cherry-picked this to r1.5, so it will be part of the upcoming 1.5.1.

Comment by Githook User [ 12/Dec/16 ]

Author:

Jonathan Wang (jwangdy) <jwang@domeyard.com>

Message: CDRIVER-1946 SEGFAULT with fast performance counters on NUMA
Branch: r1.5
https://github.com/mongodb/mongo-c-driver/commit/bbd31116492d89d5fbbd2ac19370f825ef0819f1

Comment by A. Jesse Jiryu Davis [ 12/Dec/16 ]

Hannes's workstation is a NUMA (2-node 24-core) Linux machine; he reproduced the occasional crash and validated the fix.

Comment by Githook User [ 12/Dec/16 ]

Author:

A. Jesse Jiryu Davis (ajdavis) <jesse@emptysquare.net>

Message: Merge pull request #411 from jwangdy/master

CDRIVER-1946 SEGFAULT with fast performance counters on NUMA
Branch: master
https://github.com/mongodb/mongo-c-driver/commit/205538289782d335b61bf7e46df69987fb0ce9cf

Comment by Githook User [ 12/Dec/16 ]

Author:

Jonathan Wang (jwangdy) <jwang@domeyard.com>

Message: CDRIVER-1946 SEGFAULT with fast performance counters on NUMA
Branch: master
https://github.com/mongodb/mongo-c-driver/commit/7e80f45d6cc576fdc1ee42510a7fea38dcc2fff5

Comment by Jonathan Wang [ 09/Dec/16 ]

I can submit a pull request.

Comment by A. Jesse Jiryu Davis [ 09/Dec/16 ]

I tested on Mac OS and IA32_TSC_AUX is always 0 there, so all counters increment slot 0. Some mailing list threads confirm that it's 0 on Mac OS. I think this means Macs will see some extra cache-thrashing on the counters, but the code is at least correct. High performance on highly parallel Macs is not a priority for us compared to Linux.

We haven't yet implemented fast counters on Windows (CDRIVER-280) so getting the CPU ID isn't an issue there; the counters are all incremented in slot 0 instead of a CPU-specific slot.

(It occurs to me now that no one has asked for fast counters on Windows, so I intend to close that ticket.)

jwang thanks so much for your diagnosis, would you like to submit a pull request or would you rather we make the fix?

Comment by Jonathan Wang [ 09/Dec/16 ]

Actually, now that I look at it, it's possible that no operating system other than Linux sets the IA32_TSC_AUX MSR. I have been unable to find any Windows documentation regarding this register, nor any documentation or source code in FreeBSD or XNU (the OS X kernel) where it is set. Additionally, I ran the same code on OS X with ENABLE_RDTSCP set, printing the result of the mongoc_sched_getcpu function, and it always returns 0. If this is the case, you might want to change the first ifdef for the mongoc_sched_getcpu function to #if defined(ENABLE_RDTSCP) && defined(__linux__). You might also want to add an OS X-specific version of mongoc_sched_getcpu that doesn't use RDTSCP.

Comment by Jonathan Wang [ 09/Dec/16 ]

One thing to note is that I'm not sure whether the new version of _mongoc_sched_getcpu that I provided above will work on other Unix or Windows systems. I don't think there is a standard way to set the IA32_TSC_AUX MSR, so if other operating systems set it differently than Linux does, you may need an ifdef inside _mongoc_sched_getcpu with a different implementation for extracting the CPU ID from rcx on each system that initializes the IA32_TSC_AUX MSR differently.

Comment by Jonathan Wang [ 09/Dec/16 ]

Here is the segment of code from the most recent master commit in the Linux kernel that shows where the IA32_TSC_AUX MSR is set to contain both the node ID and the CPU ID:

#ifdef CONFIG_X86_64
static void vgetcpu_cpu_init(void *arg)
{
	int cpu = smp_processor_id();
	struct desc_struct d = { };
	unsigned long node = 0;
#ifdef CONFIG_NUMA
	node = cpu_to_node(cpu);
#endif
	if (static_cpu_has(X86_FEATURE_RDTSCP))
		write_rdtscp_aux((node << 12) | cpu);

Comment by Jonathan Wang [ 09/Dec/16 ]

The issue is that the rcx register actually contains the low-order 32 bits of the IA32_TSC_AUX MSR, which the Linux kernel loads for each processor with the processor number in bits 11:0 and the node number in bits 31:12. So for machines with only one node, or when the process runs on node 0, the previous implementation worked. However, when the node is not 0 (as in my example above, where the segfault happened with _mongoc_sched_getcpu() = 4157), the code would return the entire contents of rcx, which is an invalid index because it is the core number plus the shifted node number (61 + 4096, where 4096 comes from bit 12 of rcx).

Comment by Jonathan Wang [ 09/Dec/16 ]

I believe this should solve the issue:

#if defined(ENABLE_RDTSCP)
 static BSON_INLINE unsigned
 _mongoc_sched_getcpu (void)
 {
    volatile uint32_t rax, rdx, rcx;
    __asm__ volatile ("rdtscp\n" : "=a" (rax), "=d" (rdx), "=c" (rcx) : : );
    /* unsigned node_id = (rcx & 0xFFF000) >> 12;  -- node ID is unused */
    unsigned core_id = rcx & 0xFFF; /* core ID lives in bits 11:0 */
    return core_id;
 }
#endif

Comment by Jonathan Wang [ 09/Dec/16 ]

I added a printout for the functions that you requested:

_mongoc_counters_init(): about to call mongoc_counters_calc_size()
mongoc_counters_calc_size(): _mongoc_get_cpu_count() = 80
_mongoc_counters_init(): _mongoc_get_cpu_count() = 80
mongoc_counter_add(): _mongoc_sched_getcpu() = 4157
Segmentation fault

_mongoc_get_cpu_count is called from the mongoc_counters_calc_size function, and it is also called directly from the _mongoc_counters_init function. In both cases it returns 80, which is the correct core count for the machine.

Comment by A. Jesse Jiryu Davis [ 09/Dec/16 ]

Thanks for this excellent debugging.

Could you please confirm what _mongoc_get_cpu_count returns when it's called from mongoc_counters_calc_size, which is called from _mongoc_counters_init?

A possible fix is to replace this:

__mongoc_counter_##ident.cpus[_mongoc_sched_getcpu()].slots

with this:

__mongoc_counter_##ident.cpus[_mongoc_sched_getcpu() % _mongoc_get_cpu_count()].slots

In real life we'd want to cache the CPU count but you get the idea.

However that doesn't answer why the _mongoc_sched_getcpu function isn't working as expected in the first place.

Comment by Jonathan Wang [ 08/Dec/16 ]

After looking at the documentation for the RDTSCP instruction, it does seem that the current implementation of

 static BSON_INLINE unsigned
 _mongoc_sched_getcpu (void)
 {
    volatile uint32_t rax, rdx, aux;
    __asm__ volatile ("rdtscp\n" : "=a" (rax), "=d" (rdx), "=c" (aux) : : );
    return aux;
 }

is the correct way to use the RDTSCP instruction to get the processor ID. I don't have any idea why it doesn't work on my 80-core machine though.

Comment by Jonathan Wang [ 08/Dec/16 ]

Also, I ran the program (the Mongo C driver configured with ENABLE_RDTSCP) with the environment variable MONGOC_DISABLE_SHM=1, but it still segfaults.

Comment by Jonathan Wang [ 08/Dec/16 ]

Here is the full configure command I used when I got the segfault:

export CC=clang
export CFLAGS="-fPIC -pthread"
PKG_CONFIG_LIBDIR=/usr/local/custom/openssl/lib/pkgconfig ./configure --prefix=/usr/local/custom/mongodb/ --disable-shared --enable-static --enable-rdtscp --disable-sasl --with-libbson=bundled

And here is the full configure command I used when I removed ENABLE_RDTSCP and no longer got the segfault:

export CC=clang
export CFLAGS="-fPIC -pthread"
PKG_CONFIG_LIBDIR=/usr/local/custom/openssl/lib/pkgconfig ./configure --prefix=/usr/local/custom/mongodb/ --disable-shared --enable-static --disable-rdtscp --disable-sasl --with-libbson=bundled

Comment by Jonathan Wang [ 08/Dec/16 ]

I believe this is caused by setting the --enable-rdtscp flag when configuring the Mongo C driver.

The segfault occurs in the mongoc_counter_clients_active_inc() function call in the _mongoc_client_new_from_uri function inside mongoc-client.c at line 778. The mongoc_counter_clients_active_inc() function call is defined by a macro as:

mongoc_counter_##ident##_add (int64_t val) \
{ \
   _mongoc_counter_add(\
      __mongoc_counter_##ident.cpus[_mongoc_sched_getcpu()].slots[ \
         COUNTER_##ident%SLOTS_PER_CACHELINE], val); \
} \

This block of code segfaults when the --enable-rdtscp flag is set because in src/mongoc/mongoc-counters-private.h there is a define that maps _mongoc_sched_getcpu to a block of inline assembly when ENABLE_RDTSCP is set (otherwise it maps to sched_getcpu or 0). With ENABLE_RDTSCP set, _mongoc_sched_getcpu is defined as follows:

 static BSON_INLINE unsigned
 _mongoc_sched_getcpu (void)
 {
    volatile uint32_t rax, rdx, aux;
    __asm__ volatile ("rdtscp\n" : "=a" (rax), "=d" (rdx), "=c" (aux) : : );
    return aux;
 }

I checked the value returned by this function, and when I get a segfault it returns an unreasonably large number such as 4157 (the machine I'm running on has 80 cores). It seems reasonable to conclude that this function is incorrect.

When running the code against a Mongo C driver built with --disable-rdtscp, the segfault no longer occurs and the value returned by _mongoc_sched_getcpu is always less than the number of cores on the machine.

Comment by Hannes Magnusson [ 08/Dec/16 ]

Can you try running your program after setting the MONGOC_DISABLE_SHM environment variable to 1?

Comment by A. Jesse Jiryu Davis [ 08/Dec/16 ]

Can I see your ./configure command line and the entire configure output please?

I've tried configuring and installing the driver with clang on Ubuntu 16 (I don't have easy access to clang on CentOS right now), and could not reproduce the crash with your example program.

Thanks.

Comment by Jonathan Wang [ 08/Dec/16 ]

I built the Mongo C driver in debug mode and ran it through LLDB. The output is in the mongo_c_segfault_debug.txt file.

Generated at Wed Feb 07 21:13:41 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.