[CDRIVER-1946] SEGFAULT with fast performance counters on NUMA  Created: 08/Dec/16  Updated: 12/Dec/16  Resolved: 12/Dec/16
| Status: | Closed |
| Project: | C Driver |
| Component/s: | libmongoc |
| Affects Version/s: | 1.3.6, 1.5.0 |
| Fix Version/s: | 1.5.1 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Jonathan Wang | Assignee: | A. Jesse Jiryu Davis |
| Resolution: | Done | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Environment: | Linux (CentOS), clang 3.9.0 |
| Attachments: | mongo_c_segfault_debug.txt |
| Description |
The call to mongoc_client_new segfaults some or most of the time. It occasionally works, but there seems to be no pattern to when it segfaults and when it succeeds. The same code was compiled against versions 1.3.6 and 1.5.0 of the Mongo C driver, and the behavior is similar in both cases. See the attachments for the example program's source code, the compile command, the output when the example program is compiled, and the LLDB output. NOTE: This also affects the C++ driver, because the C++ driver calls the C driver internally.
| Comments |
| Comment by Hannes Magnusson [ 12/Dec/16 ] |
I've cherry-picked this to r1.5, so it will be part of the upcoming 1.5.1.
| Comment by Githook User [ 12/Dec/16 ] |
Author: Jonathan Wang (jwangdy) <jwang@domeyard.com>
Message:
| Comment by A. Jesse Jiryu Davis [ 12/Dec/16 ] |
Hannes's workstation is a NUMA (2-node, 24-core) Linux machine; he reproduced the occasional crash and validated the fix.
| Comment by Githook User [ 12/Dec/16 ] |
Author: A. Jesse Jiryu Davis (ajdavis) <jesse@emptysquare.net>
Message: Merge pull request #411 from jwangdy/master
| Comment by Githook User [ 12/Dec/16 ] |
Author: Jonathan Wang (jwangdy) <jwang@domeyard.com>
Message:
| Comment by Jonathan Wang [ 09/Dec/16 ] |
I can submit a pull request.
| Comment by A. Jesse Jiryu Davis [ 09/Dec/16 ] |
I tested on Mac OS and IA32_TSC_AUX is always 0 there, so all counters increment slot 0. Some mailing list threads confirm that it's 0 on macOS. I think this means Mac will have some extra cache-thrashing related to counters, but the code is at least correct. High performance on highly parallel Macs is not a priority for us compared to Linux. We haven't yet implemented fast counters on Windows. (It occurs to me now that no one has asked for fast counters on Windows, so I intend to close that ticket.) jwang, thanks so much for your diagnosis. Would you like to submit a pull request, or would you rather we make the fix?
| Comment by Jonathan Wang [ 09/Dec/16 ] |
Actually, now that I look at it, it is possible that no operating system other than Linux sets the IA32_TSC_AUX MSR. I have been unable to find any documentation for Windows regarding the use of this register, nor any documentation or source code in FreeBSD or XNU (the OS X kernel) where this register is set. Additionally, I ran the same code, printing the result of the mongoc_sched_getcpu function, on OS X with ENABLE_RDTSCP set, and it always returns 0. If this is the case, you might want to change the first ifdef for the mongoc_sched_getcpu function to #if defined(ENABLE_RDTSCP) && defined(__linux__), as sketched below. You might also want to add an OS X-specific version of mongoc_sched_getcpu that doesn't use RDTSCP.
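A minimal sketch of what that guard could look like, assuming the existing chain falls back to sched_getcpu or 0 as described later in this thread (this is an illustration, not the merged patch; the fallback macro names are assumptions):

```c
#include <bson.h>   /* BSON_INLINE */
#include <stdint.h>

/* Sketch only: restrict the RDTSCP path to Linux, where the kernel is
 * known to load IA32_TSC_AUX; fall back to sched_getcpu() or 0
 * elsewhere, as the existing header reportedly does. */
#if defined(ENABLE_RDTSCP) && defined(__linux__)
static BSON_INLINE unsigned
_mongoc_sched_getcpu (void)
{
   uint32_t rax, rdx, aux;

   __asm__ volatile ("rdtscp" : "=a" (rax), "=d" (rdx), "=c" (aux) : :);

   return aux & 0xfff; /* on Linux the CPU number is in bits 11:0 */
}
#elif defined(HAVE_SCHED_GETCPU)
#define _mongoc_sched_getcpu sched_getcpu
#else
#define _mongoc_sched_getcpu() (0)
#endif
```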
| Comment by Jonathan Wang [ 09/Dec/16 ] |
One thing to note: I'm not sure whether the new version of _mongoc_sched_getcpu that I provided above will work on other Unix or Windows systems. I don't think there is a standard way to set the IA32_TSC_AUX MSR, so if other operating systems set it differently than Linux does, you may need an ifdef inside _mongoc_sched_getcpu with a different implementation for extracting the CPU ID from rcx for each system that initializes IA32_TSC_AUX differently.
| Comment by Jonathan Wang [ 09/Dec/16 ] |
Here is the segment of code from the most recent master commit in the Linux kernel that shows where the IA32_TSC_AUX MSR is set to contain both the node ID and the CPU ID:
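The verbatim snippet is elided from this export. Paraphrasing the kernel's x86 vDSO/vsyscall per-CPU setup (the function name and file have moved across kernel versions, historically arch/x86/kernel/vsyscall_64.c and later arch/x86/entry/vdso/vma.c), the key logic looks roughly like this; treat it as illustrative rather than a quote:

```c
/* Paraphrase of the Linux x86 per-CPU TSC_AUX setup; illustrative,
 * not a verbatim excerpt. */
static void vgetcpu_cpu_init(void *arg)
{
	int cpu = smp_processor_id();
	unsigned long node = cpu_to_node(cpu);

	/* IA32_TSC_AUX := node number in bits 31:12, CPU number in
	 * bits 11:0; RDTSCP later returns this value in ECX. */
	if (static_cpu_has(X86_FEATURE_RDTSCP))
		write_rdtscp_aux((node << 12) | cpu);
}
```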
| Comment by Jonathan Wang [ 09/Dec/16 ] |
The issue is that the rcx register actually contains the low-order 32 bits of the IA32_TSC_AUX MSR, which the Linux kernel loads for each processor with the processor number in bits 11:0 and the node number in bits 31:12. So for machines with only one node, or when the process is running on node 0, the previous implementation worked. However, when the node is not 0 (as in my example above, where the segfault happened with _mongoc_sched_getcpu() = 4157), the code would return the entire value of rcx, which is invalid because it is the core number plus the shifted node number (61 + 4096, where 4096 comes from bit 12 of rcx).
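A quick worked example of that decoding (a hypothetical standalone snippet, not driver code):

```c
#include <stdio.h>
#include <stdint.h>

int
main (void)
{
   uint32_t aux = 4157;        /* value observed in the crash       */
   uint32_t cpu = aux & 0xfff; /* bits 11:0  -> 61                  */
   uint32_t node = aux >> 12;  /* bits 31:12 -> 1                   */

   /* prints "node 1, cpu 61": a valid core on the second NUMA node */
   printf ("node %u, cpu %u\n", node, cpu);
   return 0;
}
```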
| Comment by Jonathan Wang [ 09/Dec/16 ] |
I believe this should solve the issue:
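The patch itself is elided from this export; the following is a sketch of the masking fix described in the surrounding comments, and the change merged in pull request #411 may differ in detail:

```c
#include <bson.h>   /* BSON_INLINE */
#include <stdint.h>

/* Sketch of the fix: keep only bits 11:0 of IA32_TSC_AUX, which hold
 * the CPU number on Linux, and discard the NUMA node in bits 31:12. */
static BSON_INLINE unsigned
_mongoc_sched_getcpu (void)
{
   uint32_t rax, rdx, aux;

   __asm__ volatile ("rdtscp" : "=a" (rax), "=d" (rdx), "=c" (aux) : :);

   return aux & 0xfff;
}
```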
| Comment by Jonathan Wang [ 09/Dec/16 ] |
I added a printout for the functions that you requested. _mongoc_get_cpu_count is called from the mongoc_counters_calc_size function, and it is also called directly from the _mongoc_counters_init function. In both cases it returns 80, which is the correct core count for the machine.
| Comment by A. Jesse Jiryu Davis [ 09/Dec/16 ] |
Thanks for this excellent debugging. Could you please confirm what _mongoc_get_cpu_count returns when it's called from mongoc_counters_calc_size, which is called from _mongoc_counters_init? A possible fix is to clamp the slot index, as sketched below. In real life we'd want to cache the CPU count, but you get the idea. However, that doesn't answer why the _mongoc_sched_getcpu function isn't working as expected in the first place.
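The before/after snippets are elided from this export; the following is a reconstruction of the idea. The helper name is hypothetical; only _mongoc_sched_getcpu and _mongoc_get_cpu_count are real driver functions named in this thread:

```c
/* Hypothetical illustration; counter_slot_index is not a real driver
 * function. Before the change, the raw TSC_AUX-derived value indexed
 * the per-CPU counter slots directly and could run past the end of
 * the allocation. */
extern unsigned _mongoc_sched_getcpu (void);
extern unsigned _mongoc_get_cpu_count (void);

static unsigned
counter_slot_index (void)
{
   /* Clamp into the allocated range; in production the CPU count
    * would be cached rather than re-queried on every increment. */
   return _mongoc_sched_getcpu () % _mongoc_get_cpu_count ();
}
```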
| Comment by Jonathan Wang [ 08/Dec/16 ] |
After looking at the documentation for the RDTSCP instruction, it does seem that the current implementation of _mongoc_sched_getcpu is the correct way to use the RDTSCP instruction to get the processor ID. I have no idea why it doesn't work on my 80-core machine, though.
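For anyone trying to reproduce this, a minimal standalone program (a sketch of my own, using the compiler's __rdtscp intrinsic rather than the driver's inline assembly) prints the raw TSC_AUX value; on a multi-node Linux machine, running it on a node other than 0 shows values of 4096 or more:

```c
#include <stdio.h>
#include <x86intrin.h>

int
main (void)
{
   unsigned int aux;

   (void) __rdtscp (&aux); /* TSC in return value, IA32_TSC_AUX in aux */

   /* On Linux, bits 11:0 are the CPU and bits 31:12 the NUMA node. */
   printf ("raw TSC_AUX = %u (cpu %u, node %u)\n",
           aux, aux & 0xfff, aux >> 12);
   return 0;
}
```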
| Comment by Jonathan Wang [ 08/Dec/16 ] |
Also, I ran the program (the Mongo C driver configured with ENABLE_RDTSCP) with the environment variable MONGOC_DISABLE_SHM=1, but it still segfaults.
| Comment by Jonathan Wang [ 08/Dec/16 ] |
Here is the full configure command I used when I got the segfault:
And here is the full configure command I used when I removed ENABLE_RDTSCP and no longer got the segfault:
| Comment by Jonathan Wang [ 08/Dec/16 ] |
I believe this is caused by my setting the --enable-rdtscp flag when configuring the Mongo C driver. The segfault occurs in the mongoc_counter_clients_active_inc() call in the _mongoc_client_new_from_uri function inside mongoc-client.c at line 778. The mongoc_counter_clients_active_inc() call is generated by a macro. That code segfaults when the --enable-rdtscp flag is set because in src/mongoc/mongoc-counters-private.h a define sets _mongoc_sched_getcpu to a block of assembly code when ENABLE_RDTSCP is set (otherwise, it is set to sched_getcpu or 0); see the sketch below.
I checked the value returned by this function, and when I get a segfault, it returns an unreasonably large number such as 4157 (the machine I'm running on has 80 cores). It seems reasonable to say that this function is incorrect. When running the code against a Mongo C driver built with --disable-rdtscp, the segfault no longer occurs and the value returned by _mongoc_sched_getcpu is always less than the number of cores on the machine.
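The header excerpts are elided from this export; the following paraphrases the pre-fix definition as described in this comment (simplified, and the fallback macro names are assumptions):

```c
#include <bson.h>   /* BSON_INLINE */
#include <stdint.h>

/* Paraphrase of src/mongoc/mongoc-counters-private.h before the fix
 * (simplified; fallback macro names are assumptions). Each generated
 * mongoc_counter_*_inc() adds 1 to a per-CPU slot indexed by
 * _mongoc_sched_getcpu(), so an out-of-range return value indexes
 * past the mapped counter memory and segfaults. */
#ifdef ENABLE_RDTSCP
static BSON_INLINE unsigned
_mongoc_sched_getcpu (void)
{
   uint32_t rax, rdx, aux;

   /* RDTSCP leaves IA32_TSC_AUX in ECX; pre-fix, this was returned
    * unmasked, so the NUMA node in bits 31:12 leaked into the index. */
   __asm__ volatile ("rdtscp" : "=a" (rax), "=d" (rdx), "=c" (aux) : :);

   return aux;
}
#elif defined(HAVE_SCHED_GETCPU)
#define _mongoc_sched_getcpu sched_getcpu
#else
#define _mongoc_sched_getcpu() (0)
#endif
```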
| Comment by Hannes Magnusson [ 08/Dec/16 ] |
Can you try running your program after setting the MONGOC_DISABLE_SHM environment variable to 1?
| Comment by A. Jesse Jiryu Davis [ 08/Dec/16 ] |
Can I see your ./configure command line and the entire configure output, please? I've tried configuring and installing the driver with clang on Ubuntu 16 (I don't have easy access to clang on CentOS right now) and could not reproduce the crash with your example program. Thanks.
| Comment by Jonathan Wang [ 08/Dec/16 ] |
I built the Mongo C driver in debug mode and ran it through LLDB. The output is in the mongo_c_segfault_debug.txt file.