- Type: New Feature
- Resolution: Won't Fix
- Priority: Major - P3
- Affects Version/s: 2.4.0
- Component/s: Configuration
- Environment: Tested in (but not restricted to) Linux with pyspark (python 3.6.8), mongod (v4.0.6) and spark (v2.4.0)
 
 
It appears that a SparkSession object cannot support more than one concurrent Kerberos principal. Each Spark application can conceptually require up to three principals (see the sketch after this list):
- Input URI
- Output URI
- The default or operating Kerberos context under which the parent application runs
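 
As a rough illustration, here is a minimal pyspark sketch of where each of those three principals can be named. The hosts, realm, principals, keytab path, and database/collection names are hypothetical; the option names follow the 2.x connector (spark.mongodb.input.uri / spark.mongodb.output.uri) and Spark 2.4 YARN submission (spark.yarn.principal / spark.yarn.keytab) conventions.

```python
from pyspark.sql import SparkSession

# Minimal sketch only: hosts, realm, principals, keytab path, and
# database/collection names are hypothetical placeholders.
spark = (
    SparkSession.builder
    .appName("kerberos-principal-sketch")
    # 1) Input URI principal ('@' in the principal is percent-encoded as %40)
    .config("spark.mongodb.input.uri",
            "mongodb://reader%40EXAMPLE.COM@mongo-in.example.com:27017/"
            "source.events?authSource=$external&authMechanism=GSSAPI")
    # 2) Output URI principal (could, in principle, be a different one)
    .config("spark.mongodb.output.uri",
            "mongodb://writer%40EXAMPLE.COM@mongo-out.example.com:27017/"
            "sink.results?authSource=$external&authMechanism=GSSAPI")
    # 3) Default/operating context: the principal the application itself runs
    #    as, e.g. supplied via --principal/--keytab on spark-submit or
    #    established by a prior `kinit`.
    .config("spark.yarn.principal", "app-user@EXAMPLE.COM")
    .config("spark.yarn.keytab", "/etc/security/keytabs/app-user.keytab")
    .getOrCreate()
)
```

In practice, all three of these end up resolving against the single default GSSAPI context described below.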
 
Two failure modes arise from this:
- The input and output URI credentials differ by principal (typically uncommon)
- The default context's principal differs from the principal in either URI
 
In the second case, if a Spark application (pyspark) is configured to authenticate to an endpoint such as Hadoop with a principal that differs from the one in the MongoDB URI credentials, one set of connections will fail to authenticate. Only the connections whose principal matches the active Kerberos security context will succeed.
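 
To make that second failure mode concrete, here is a hedged sketch of the scenario; the principals, hosts, keytab path, and collection names are again hypothetical. It assumes the default Kerberos context was established for a Hadoop-side principal, while the MongoDB URIs name a different one.

```python
# Assume the default Kerberos context was established for the Hadoop-side
# principal before launching the application, e.g.:
#
#     kinit -kt /etc/security/keytabs/hdfs-app.keytab hdfs-app@EXAMPLE.COM
#
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    # The MongoDB URIs name a *different* principal than the default context.
    .config("spark.mongodb.input.uri",
            "mongodb://mongo-reader%40EXAMPLE.COM@mongo.example.com:27017/"
            "source.events?authSource=$external&authMechanism=GSSAPI")
    .config("spark.mongodb.output.uri",
            "mongodb://mongo-reader%40EXAMPLE.COM@mongo.example.com:27017/"
            "sink.results?authSource=$external&authMechanism=GSSAPI")
    .getOrCreate()
)

# Reads against HDFS succeed, since they match the default context's principal.
df = spark.read.parquet("hdfs:///data/events")

# The MongoDB connections, however, authenticate with whatever principal the
# default GSSAPI context holds (hdfs-app@EXAMPLE.COM here), not
# mongo-reader@EXAMPLE.COM, so in this scenario they fail to authenticate.
df.write.format("com.mongodb.spark.sql.DefaultSource").mode("append").save()
```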
I think there are two layers contributing to this:
- The MongoClient cache/pooling makes the security context common across threads
- The inherent behaviour of the Java driver, which is limited to accessing the default GSSAPI security context in the JVM; i.e., it is not designed to select a named context by principal
 
In summary, the Java driver naively relies on the default Kerberos token from which it initialises the GSSAPI security context inside MongoClient. Because all three components share the same default context, none of them can differ by principal.