[CSHARP-4797] Segmentation Fault when Attempting to Connect to MongoDB Created: 18/Sep/23  Updated: 01/Feb/24  Resolved: 01/Feb/24

Status: Closed
Project: C# Driver
Component/s: Security
Affects Version/s: None
Fix Version/s: 2.24.0

Type: Bug Priority: Unknown
Reporter: Gina DiCarlo Assignee: James Kovacs
Resolution: Fixed Votes: 0
Labels: gssapi, kerberos
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Related
Case:
Backwards Compatibility: Fully Compatible
Documentation Changes: Not Needed
Documentation Changes Summary:

1. What would you like to communicate to the user about this feature?
2. Would you like the user to see examples of the syntax and/or executable code and its output?
3. Which versions of the driver/connector does this apply to?


 Description   

Summary

When using the MongoClient on Kubernetes to access a MongoDB database, I get a segmentation fault; something in the library appears to be attempting to access or overwrite protected memory.

Please provide the version of the driver. If applicable, please provide the MongoDB server version and topology (standalone, replica set, or sharded cluster).

I've tested against driver versions 2.21.0 and 2.19.0.

The MongoDB topology is a replica set with two nodes (one primary and one secondary).

How to Reproduce

Create a .NET console application and deploy it to Kubernetes with 4Gi of ephemeral storage.

In Program.cs add the following code:

 

using Microsoft.Extensions.Logging;
using MongoDB.Bson;
using MongoDB.Driver;
using MongoDB.Driver.Core.Configuration;

var mongoUrl = new MongoUrlBuilder
{
    Username = <USERNAME>,
    UseTls = true,
    ReplicaSetName = <RS>,
    AuthenticationMechanism = "GSSAPI",
    Servers = new List<MongoServerAddress> { new("S1", PORT), new("S2", PORT) }
};

using var loggerFactory = LoggerFactory.Create(b =>
{
    b.ClearProviders();
    b.AddSimpleConsole();
    b.SetMinimumLevel(Microsoft.Extensions.Logging.LogLevel.Trace);
});

var settings = MongoClientSettings.FromConnectionString(mongoUrl.ToMongoUrl().ToString());
Console.WriteLine($"mongourl {mongoUrl.ToMongoUrl()}");
settings.LoggingSettings = new LoggingSettings(loggerFactory);

var dbClient = new MongoClient(settings);
Console.WriteLine("made it to db client creation");

try
{
    Console.WriteLine("starting session");
    dbClient.StartSession();
    Console.WriteLine("getting database");
    var database = dbClient.GetDatabase("DBNAME");
    Console.WriteLine("running command");
    var test = database.RunCommand<BsonDocument>(new BsonDocument("ping", 1));
    Console.WriteLine($"end - running command {test}");
}
catch (Exception ex)
{
    Console.WriteLine(ex.ToString());
}

Console.WriteLine("Pinged your deployment. You successfully connected to MongoDB!");
 

 

 

Additional Background

Via logging statements I have confirmed that the connection string is formatted as expected by the URL builder. I was also able to capture the MongoDB client trace logs using the logging configuration above; hopefully they will be helpful for debugging:

mongourl mongodb://username@Secondary_Server:Port,Primary_Server:Port/?authMechanism=GSSAPI;tls=true;replicaSet=RS
dbug: MongoDB.SDAM[0]
      1 Initialized
dbug: MongoDB.SDAM[0]
      1 Cluster opening
dbug: MongoDB.SDAM[0]
      1 Secondary_Server Port Adding server
dbug: MongoDB.SDAM[0]
      1 Secondary_Server Port Added server
dbug: MongoDB.SDAM[0]
      1 Primary_Server Port Adding server
dbug: MongoDB.SDAM[0]
      1 Primary_Server Port Added server
dbug: MongoDB.SDAM[0]
      1 Description changed
dbug: MongoDB.SDAM[0]
      1 Secondary_Server Port Server opening
dbug: MongoDB.Connection[0]
      1 Secondary_Server Port Connection pool opening 600000 0 100 2 120000 500
dbug: MongoDB.Connection[0]
      1 Secondary_Server Port Connection pool created 600000 0 100 2 120000 500
dbug: MongoDB.Internal.IServerMonitor[0]
      1 Secondary_Server Port Initializing
dbug: MongoDB.Internal.IServerMonitor[0]
      1 Secondary_Server Port Initialized
dbug: MongoDB.SDAM[0]
      1 Secondary_Server Port Server opened
dbug: MongoDB.SDAM[0]
      1 Primary_Server Port Server opening
dbug: MongoDB.Connection[0]
      1 Primary_Server Port Connection pool opening 600000 0 100 2 120000 500
dbug: MongoDB.Connection[0]
      1 Primary_Server Port Connection pool created 600000 0 100 2 120000 500
dbug: MongoDB.Internal.IServerMonitor[0]
      1 Primary_Server Port Initializing
dbug: MongoDB.Internal.IServerMonitor[0]
      1 Primary_Server Port Initialized
dbug: MongoDB.SDAM[0]
      1 Primary_Server Port Server opened
dbug: MongoDB.SDAM[0]
      1 Cluster opened
dbug: MongoDB.Internal.RoundTripTimeMonitor[0]
      1 Primary_Server Port Monitoring started
dbug: MongoDB.Internal.RoundTripTimeMonitor[0]
      1 Secondary_Server Port Monitoring started
made it to db client creation
starting session
dbug: MongoDB.ServerSelection[0]
      1 CompositeServerSelector{ Selectors = MongoDB.Driver.MongoClient+AreSessionsSupportedServerSelector, LatencyLimitingServerSelector{ AllowedLatencyRange = 00:00:00.0150000 }, OperationsCountServerSelector } (null) (null) { ClusterId : "1", ConnectionMode : "ReplicaSet", Type : "ReplicaSet", State : "Disconnected", Servers : [{ ServerId: "{ ClusterId : 1, EndPoint : "Unspecified/Secondary_Server:Port" }", EndPoint: "Unspecified/Secondary_Server:Port", ReasonChanged: "ServerInitialDescription", State: "Disconnected", ServerVersion: , TopologyVersion: , Type: "Unknown", LastHeartbeatTimestamp: null, LastUpdateTimestamp: "2023-09-18T20:42:12.6463312Z" }, { ServerId: "{ ClusterId : 1, EndPoint : "Unspecified/Primary_Server:Port" }", EndPoint: "Unspecified/Primary_Server:Port", ReasonChanged: "ServerInitialDescription", State: "Disconnected", ServerVersion: , TopologyVersion: , Type: "Unknown", LastHeartbeatTimestamp: null, LastUpdateTimestamp: "2023-09-18T20:42:12.6488517Z" }] } Server selection started
info: MongoDB.ServerSelection[0]
      1 CompositeServerSelector{ Selectors = MongoDB.Driver.MongoClient+AreSessionsSupportedServerSelector, LatencyLimitingServerSelector{ AllowedLatencyRange = 00:00:00.0150000 }, OperationsCountServerSelector } (null) (null) { ClusterId : "1", ConnectionMode : "ReplicaSet", Type : "ReplicaSet", State : "Disconnected", Servers : [{ ServerId: "{ ClusterId : 1, EndPoint : "Unspecified/Secondary_Server:Port" }", EndPoint: "Unspecified/Secondary_Server:Port", ReasonChanged: "ServerInitialDescription", State: "Disconnected", ServerVersion: , TopologyVersion: , Type: "Unknown", LastHeartbeatTimestamp: null, LastUpdateTimestamp: "2023-09-18T20:42:12.6463312Z" }, { ServerId: "{ ClusterId : 1, EndPoint : "Unspecified/Primary_Server:Port" }", EndPoint: "Unspecified/Primary_Server:Port", ReasonChanged: "ServerInitialDescription", State: "Disconnected", ServerVersion: , TopologyVersion: , Type: "Unknown", LastHeartbeatTimestamp: null, LastUpdateTimestamp: "2023-09-18T20:42:12.6488517Z" }] } Waiting for suitable server to become available 29917
dbug: MongoDB.Connection[0]
      1 Secondary_Server Port Connection pool ready
dbug: MongoDB.SDAM[0]
      1 Secondary_Server Port Description changed { ServerId: "{ ClusterId : 1, EndPoint : "Unspecified/Secondary_Server:Port" }", EndPoint: "Unspecified/Secondary_Server:Port", ReasonChanged: "Heartbeat", State: "Connected", ServerVersion: 5.0.0, TopologyVersion: { "processId" : ObjectId("64ff952b1c549aaa3945f506"), "counter" : NumberLong(4) }, Type: "ReplicaSetSecondary", WireVersionRange: "[0, 13]", LastHeartbeatTimestamp: "2023-09-18T20:42:12.9204932Z", LastUpdateTimestamp: "2023-09-18T20:42:12.9204940Z" }
dbug: MongoDB.SDAM[0]
      1 Description changed
dbug: MongoDB.SDAM[0]
      1 3 Secondary_Server Port Heartbeat started
dbug: MongoDB.ServerSelection[0]
      1 CompositeServerSelector{ Selectors = MongoDB.Driver.MongoClient+AreSessionsSupportedServerSelector, LatencyLimitingServerSelector{ AllowedLatencyRange = 00:00:00.0150000 }, OperationsCountServerSelector } (null) (null) { ClusterId : "1", ConnectionMode : "ReplicaSet", Type : "ReplicaSet", State : "Connected", Servers : [{ ServerId: "{ ClusterId : 1, EndPoint : "Unspecified/Secondary_Server:Port" }", EndPoint: "Unspecified/Secondary_Server:Port", ReasonChanged: "Heartbeat", State: "Connected", ServerVersion: 5.0.0, TopologyVersion: { "processId" : ObjectId("64ff952b1c549aaa3945f506"), "counter" : NumberLong(4) }, Type: "ReplicaSetSecondary", WireVersionRange: "[0, 13]", LastHeartbeatTimestamp: "2023-09-18T20:42:12.9204932Z", LastUpdateTimestamp: "2023-09-18T20:42:12.9204940Z" }, { ServerId: "{ ClusterId : 1, EndPoint : "Unspecified/Primary_Server:Port" }", EndPoint: "Unspecified/Primary_Server:Port", ReasonChanged: "ServerInitialDescription", State: "Disconnected", ServerVersion: , TopologyVersion: , Type: "Unknown", LastHeartbeatTimestamp: null, LastUpdateTimestamp: "2023-09-18T20:42:12.6488517Z" }] } Server selection succeeded Secondary_Server Port
dbug: MongoDB.Connection[0]
      1 Primary_Server Port Connection pool ready
dbug: MongoDB.SDAM[0]
      1 Primary_Server Port Description changed { ServerId: "{ ClusterId : 1, EndPoint : "Unspecified/Primary_Server:Port" }", EndPoint: "Unspecified/Primary_Server:Port", ReasonChanged: "Heartbeat", State: "Connected", ServerVersion: 5.0.0, TopologyVersion: { "processId" : ObjectId("64ff93a049fd067ff988e8ff"), "counter" : NumberLong(7) }, Type: "ReplicaSetPrimary", WireVersionRange: "[0, 13]", ElectionId: "7fffffff0000000000000008", LastHeartbeatTimestamp: "2023-09-18T20:42:12.9386605Z", LastUpdateTimestamp: "2023-09-18T20:42:12.9386609Z" }
getting database
running command
dbug: MongoDB.SDAM[0]
      Initializing (maxSetVersion, maxElectionId): Saving tuple (setVersion, electionId) of (6, 7fffffff0000000000000008) as (maxSetVersion, maxElectionId) for replica set "dev-112823-eieio-rs0" because replica set primary Unspecified/Primary_Server:Port sent (6, 7fffffff0000000000000008), the first (setVersion, electionId) tuple ever seen for replica set "dev-112823-eieio-rs0".
dbug: MongoDB.SDAM[0]
      1 Description changed
dbug: MongoDB.SDAM[0]
      1 4 Primary_Server Port Heartbeat started
dbug: MongoDB.ServerSelection[0]
      1 CompositeServerSelector{ Selectors = WritableServerSelector, LatencyLimitingServerSelector{ AllowedLatencyRange = 00:00:00.0150000 }, OperationsCountServerSelector } (null) create { ClusterId : "1", ConnectionMode : "ReplicaSet", Type : "ReplicaSet", State : "Connected", Servers : [{ ServerId: "{ ClusterId : 1, EndPoint : "Unspecified/Secondary_Server:Port" }", EndPoint: "Unspecified/Secondary_Server:Port", ReasonChanged: "Heartbeat", State: "Connected", ServerVersion: 5.0.0, TopologyVersion: { "processId" : ObjectId("64ff952b1c549aaa3945f506"), "counter" : NumberLong(4) }, Type: "ReplicaSetSecondary", WireVersionRange: "[0, 13]", LastHeartbeatTimestamp: "2023-09-18T20:42:12.9204932Z", LastUpdateTimestamp: "2023-09-18T20:42:12.9204940Z" }, { ServerId: "{ ClusterId : 1, EndPoint : "Unspecified/Primary_Server:Port" }", EndPoint: "Unspecified/Primary_Server:Port", ReasonChanged: "Heartbeat", State: "Connected", ServerVersion: 5.0.0, TopologyVersion: { "processId" : ObjectId("64ff93a049fd067ff988e8ff"), "counter" : NumberLong(7) }, Type: "ReplicaSetPrimary", WireVersionRange: "[0, 13]", ElectionId: "7fffffff0000000000000008", LastHeartbeatTimestamp: "2023-09-18T20:42:12.9386605Z", LastUpdateTimestamp: "2023-09-18T20:42:12.9386609Z" }] } Server selection started
dbug: MongoDB.ServerSelection[0]
      1 CompositeServerSelector{ Selectors = WritableServerSelector, LatencyLimitingServerSelector{ AllowedLatencyRange = 00:00:00.0150000 }, OperationsCountServerSelector } (null) create { ClusterId : "1", ConnectionMode : "ReplicaSet", Type : "ReplicaSet", State : "Connected", Servers : [{ ServerId: "{ ClusterId : 1, EndPoint : "Unspecified/Secondary_Server:Port" }", EndPoint: "Unspecified/Secondary_Server:Port", ReasonChanged: "Heartbeat", State: "Connected", ServerVersion: 5.0.0, TopologyVersion: { "processId" : ObjectId("64ff952b1c549aaa3945f506"), "counter" : NumberLong(4) }, Type: "ReplicaSetSecondary", WireVersionRange: "[0, 13]", LastHeartbeatTimestamp: "2023-09-18T20:42:12.9204932Z", LastUpdateTimestamp: "2023-09-18T20:42:12.9204940Z" }, { ServerId: "{ ClusterId : 1, EndPoint : "Unspecified/Primary_Server:Port" }", EndPoint: "Unspecified/Primary_Server:Port", ReasonChanged: "Heartbeat", State: "Connected", ServerVersion: 5.0.0, TopologyVersion: { "processId" : ObjectId("64ff93a049fd067ff988e8ff"), "counter" : NumberLong(7) }, Type: "ReplicaSetPrimary", WireVersionRange: "[0, 13]", ElectionId: "7fffffff0000000000000008", LastHeartbeatTimestamp: "2023-09-18T20:42:12.9386605Z", LastUpdateTimestamp: "2023-09-18T20:42:12.9386609Z" }] } Server selection succeeded Primary_Server Port
dbug: MongoDB.Connection[0]
      1 Primary_Server Port Connection checkout started
dbug: MongoDB.Connection[0]
      1 Primary_Server Port Connection adding
dbug: MongoDB.Connection[0]
      1 5 Primary_Server Port Connection created
dbug: MongoDB.Connection[0]
      1 5 Primary_Server Port Connection opening
trce: MongoDB.Connection[0]
      1 5 Primary_Server Port Sending
trce: MongoDB.Connection[0]
      1 5 Primary_Server Port Sent
trce: MongoDB.Connection[0]
      1 5 Primary_Server Port Receiving
trce: MongoDB.Connection[0]
      1 5 Primary_Server Port Received
trce: MongoDB.Connection[0]
      1 5 Primary_Server Port Sending
trce: MongoDB.Connection[0]
      1 5 Primary_Server Port Sent
trce: MongoDB.Connection[0]
      1 5 Primary_Server Port Receiving
trce: MongoDB.Connection[0]
      1 5 Primary_Server Port Received
trce: MongoDB.Connection[0]
      1 5 Primary_Server Port Sending
trce: MongoDB.Connection[0]
      1 5 Primary_Server Port Sent
trce: MongoDB.Connection[0]
      1 5 Primary_Server Port Receiving
trce: MongoDB.Connection[0]
      1 5 Primary_Server Port Received
trce: MongoDB.Connection[0]
      1 5 Primary_Server Port Sending
trce: MongoDB.Connection[0]
      1 5 Primary_Server Port Sent
trce: MongoDB.Connection[0]
      1 5 Primary_Server Port Receiving
trce: MongoDB.Connection[0]
      1 5 Primary_Server Port Received
malloc_consolidate(): unaligned fastbin chunk detected
/bin/sh: line 1:     7 Aborted                 (core dumped) dotnet project_name.dll

 

 

Please note:

Due to company policy, servers, ports, and other identifying information cannot appear in logs; hence Primary_Server, Secondary_Server, etc.

 

 

We had similar code working against a previous replica set that has since been torn down. I believe there is a connectivity or authentication issue, but the error message appears to be written somewhere in memory it shouldn't be, which is halting troubleshooting. Any help would be appreciated!



 Comments   
Comment by James Kovacs [ 01/Feb/24 ]

We were able to determine the root cause of the failure. When deleting the GSSAPI security context, the driver passed a pointer to the unmanaged data structure whereas the API was expecting a pointer to a pointer. We are working on a fix now. Please follow this ticket for updates.
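
In P/Invoke terms, this class of bug might look like the following hedged sketch (not the driver's actual internal binding). Per RFC 2744, gss_delete_sec_context takes a gss_ctx_id_t*, and since gss_ctx_id_t is itself a pointer to an opaque context, the managed side must pass the handle by ref:

    using System;
    using System.Runtime.InteropServices;

    internal static class GssapiSketch
    {
        // Buggy shape: the context handle is passed by value, so libgssapi
        // treats the handle itself as the address of a gss_ctx_id_t and
        // writes GSS_C_NO_CONTEXT through it, corrupting heap memory and
        // eventually tripping glibc's "malloc_consolidate(): unaligned
        // fastbin chunk detected" check.
        [DllImport("libgssapi_krb5.so.2", EntryPoint = "gss_delete_sec_context")]
        private static extern uint DeleteSecContextBuggy(
            out uint minorStatus,
            IntPtr contextHandle,   // wrong: marshals as gss_ctx_id_t
            IntPtr outputToken);

        // Fixed shape: ref IntPtr marshals as gss_ctx_id_t*, letting the
        // library free the context and null out the caller's handle.
        [DllImport("libgssapi_krb5.so.2", EntryPoint = "gss_delete_sec_context")]
        internal static extern uint DeleteSecContext(
            out uint minorStatus,
            ref IntPtr contextHandle,
            IntPtr outputToken);
    }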

Comment by PM Bot [ 14/Nov/23 ]

There hasn't been any recent activity on this ticket, so we're resolving it. Thanks for reaching out! Please feel free to reopen this ticket if you're still experiencing the issue, and add a comment if you're able to provide more information.

Comment by James Kovacs [ 12/Oct/23 ]

Hi, dicarlo.g.m@gmail.com and jess.lam948@gmail.com,

We are tracking this investigation through our support portal. I'll leave this ticket open while we work on understanding the root cause.

Sincerely,
James

Comment by PM Bot [ 12/Oct/23 ]

Hi dicarlo.g.m@gmail.com! CSHARP-4797 is awaiting your response.

If this is still an issue for you, please open Jira to review the latest status and provide your feedback. Thanks!

Comment by James Kovacs [ 04/Oct/23 ]

Hi, Gina and Jessica,

This definitely sounds like a problem related to libgssapi.so. Given that a similar pymongo application in the same container does not encounter this issue, this leaves two possibilities:

1. The pymongo application uses a different version of libgssapi.so.
2. The root cause relates to how the .NET/C# Driver (or the .NET Runtime) interacts with libgssapi.so.

I appreciate all the details regarding the container image and application environment. The next step is to reproduce the issue in an environment where we can debug it further. I see that you have a support case open and jorge.imperial@mongodb.com is working with you on it. He is in the best position to work with you to create a minimal container environment, which we can then use to debug this problem further. Please work with him on the support case so we can move this issue towards resolution.

Sincerely,
James

Comment by James Kovacs [ 04/Oct/23 ]

Hi, Gina and Jessica,

Thank you for the additional information. It's understandable that you cannot change to another authentication mechanism given the configuration of your deployment. I'm reviewing the provided information and will follow up with next steps. Thank you for your collaboration in investigating this issue.

Sincerely,
James

Comment by Gina DiCarlo [ 04/Oct/23 ]

Jessica has provided all the information for a response (thank you!). Commenting to see if that moves the ticket out of the 'waiting for reporter' state.

Comment by Jessica Lam [ 03/Oct/23 ]

Hi james.kovacs@mongodb.com, this is Jessica; I work with Gina and will help provide and supplement information about the issue we're running into.

1. If you use SCRAM-SHA as your authentication mechanism rather than Kerberos, do you still see the coredump with "malloc_consolidate(): unaligned fastbin chunk detected"?

Currently, our MongoDB servers are configured to only support authenticating with GSSAPI. As a result, we cannot test authenticating via the SCRAM-SHA authentication mechanism. However, since you mentioned the details may be related to Kerberos, I was able to get some trace logs from Kerberos itself:

[7] 1696355414.557104: Retrieving USER@DOMAIN from FILE:/var/kerberos/krb5/user/99/client.keytab (vno 0, enctype 0) with result: 2/Key table file '/var/kerberos/krb5/user/99/client.keytab' not found
[7] 1696355414.557108: Retrieving USER@DOMAIN from FILE:/var/kerberos/krb5/user/99/client.keytab (vno 0, enctype 0) with result: 2/Key table file '/var/kerberos/krb5/user/99/client.keytab' not found
[7] 1696355414.557112: Getting credentials USER@DOMAIN -> mongodb/PRIMARY_SERVER using ccache FILE:/kcc/USER_krb5cache
[7] 1696355414.557113: Retrieving USER@DOMAIN -> krb5_ccache_conf_data/start_realm@X-CACHECONF: from FILE:/kcc/USER_krb5cache with result: -1765328243/Matching credential not found (filename: /kcc/USER_krb5cache)
[7] 1696355414.557114: Retrieving USER@DOMAIN -> mongodb/PRIMARY_SERVER from FILE:/kcc/USER_krb5cache with result: 0/Success
[7] 1696355414.557116: Creating authenticator for USER@DOMAIN -> mongodb/PRIMARY_SERVER, seqnum SEQNUM, subkey SUBKEY, session key SESSION_KEY
[7] 1696355414.557121: Read AP-REP, time 1696355414.557117, subkey SUBKEY, seqnum SEQNUM
malloc_consolidate(): unaligned fastbin chunk detected
/bin/sh: line 1:     7 Aborted                 (core dumped) dotnet project_name.dll

*Identifying information has been redacted (e.g., USER and PRIMARY_SERVER)

From these logs, it seems to me that we are able to properly connect and authenticate with Kerberos. I was also able to execute `klist` in the container and saw that we did have a valid krb5 ticket cache for the MongoDB server.
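
The trace format above matches MIT Kerberos's KRB5_TRACE output. As a hedged sketch of enabling it from the application itself (assuming the container uses MIT libkrb5 and stderr is the desired destination):

    using System.Runtime.InteropServices;

    // Minimal sketch: enable libkrb5 tracing before the first GSSAPI call.
    // On Linux, Environment.SetEnvironmentVariable only updates the
    // runtime's managed copy of the environment, so native code reading
    // getenv may not observe it; calling libc's setenv directly (or setting
    // KRB5_TRACE in the pod spec) makes it visible to libkrb5.
    [DllImport("libc")]
    static extern int setenv(string name, string value, int overwrite);

    setenv("KRB5_TRACE", "/dev/stderr", 1);

Setting the variable in the container or pod spec works equally well and avoids the interop entirely.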

2. Any stacktraces written to the console associated with the coredump.

All console trace logs from running our code are pasted in the description above.

3. Kubectl commands to build and deploy your application and replset so that we can reproduce the issue locally.

Our container image is based on the RHEL UBI Minimal image. We install the .NET 6.0 runtime as well as the krb5-devel package (needed for the MongoDB C# driver to authenticate with GSSAPI), copy over our published, self-contained program files, and finally set the entrypoint to `dotnet project_name.dll`.

We then deploy our main application in a container with a Kerberos sidecar (which initializes the Kerberos credential cache and makes it available to the main application container). All of these containers run in a pod in a deployment with one replica. Please let me know if you need additional information here.

 

Lastly, it might be helpful to note that we wrote similar code in Python, deployed it in the same environment, and that code was able to connect and query our MongoDB server properly.

Comment by James Kovacs [ 29/Sep/23 ]

Hi, dicarlo.g.m@gmail.com,

Thank you for reporting this issue and providing the trace output. The .NET/C# Driver is implemented in managed code and doesn't directly perform memory allocation through malloc. Instead, we rely on the .NET runtime and garbage collector to manage memory used by the driver.

That said, there are a few places where we call low-level C APIs that might directly call malloc. These include:

  • Zstd wire protocol compression
  • field-level encryption (FLE) and queryable encryption (QE)
  • Kerberos support via SSPI on Windows and libgssapi on Linux

You don't appear to be using the first two, but you do configure GSSAPI for your auth mechanism. I would like to confirm whether the issue is due to libgssapi or a .NET runtime bug. Please provide the following additional information to help us investigate further:

1. If you use SCRAM-SHA as your authentication mechanism rather than Kerberos, do you still see the coredump with "malloc_consolidate(): unaligned fastbin chunk detected"? (A sketch of this variation follows this list.)
2. Any stacktraces written to the console associated with the coredump.
3. Kubectl commands to build and deploy your application and replica set so that we can reproduce the issue locally.
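
As a hedged illustration of (1), a minimal variation of the repro code from the description, swapping GSSAPI for SCRAM (assuming a SCRAM-enabled user exists on the cluster; placeholders are illustrative):

    // Sketch only: same repro, but authenticating with SCRAM-SHA-256
    // instead of GSSAPI, to rule libgssapi in or out. <USERNAME>,
    // <PASSWORD>, <RS>, and PORT are placeholders matching the redacted
    // values in the description.
    var mongoUrl = new MongoUrlBuilder
    {
        Username = <USERNAME>,
        Password = <PASSWORD>,
        UseTls = true,
        ReplicaSetName = <RS>,
        AuthenticationMechanism = "SCRAM-SHA-256",
        Servers = new List<MongoServerAddress> { new("S1", PORT), new("S2", PORT) }
    };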

Sincerely,
James

Comment by PM Bot [ 18/Sep/23 ]

Hi dicarlo.g.m@gmail.com, thank you for reporting this issue! The team will look into it and get back to you soon.
