Four vulnerabilities in BSON

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Minor - P4
    • Affects Version/s: 3.11, 4.15.3
    • Component/s: BSON
    • 🟡 Potential Risk
    • Python Drivers

      1. What would you like to communicate to the user about this feature?
      2. Would you like the user to see examples of the syntax and/or executable code and its output?
      3. Which versions of the driver/connector does this apply to?


      Vulnerability report

      The issue tracker does not allow filing a vulnerability directly, so this is filed as a bug.

      Test Environment

      • Python Version: 3.12.3 (main, Aug 14 2025, 17:47:21) [GCC 13.3.0]
      • OS: Linux 6.8.0-85-generic (Ubuntu/Debian)
      • Library: pymongo (bson package)
      • Tested and verified on PyMongo versions:
        • PyMongo 3.11 (system-installed)
        • PyMongo 4.15.3
      • Date: 2025-11-06

      Executive Summary

      Security testing identified four significant concerns in the Python BSON library (the bson package distributed with pymongo):

      • 2 HIGH severity (security risks)
      • 2 MEDIUM severity (spec violations & data integrity)

      1. Duplicate Keys Accepted (HIGH)

      Description

      The BSON specification explicitly forbids duplicate keys in documents, but the Python implementation silently accepts them, taking the last value without warning.

      Proof of Concept

      import bson
      part1 = b'\x02a\x00\x02\x00\x00\x001\x00'  # a: "1"
      part2 = b'\x02a\x00\x02\x00\x00\x002\x00'  # a: "2"
      malicious_raw = b'\x17\x00\x00\x00' + part1 + part2 + b'\x00'
      decoded = bson.BSON(malicious_raw).decode()
      print(decoded)  # {'a': '2'} - silently accepts, last entry wins

      Impact

      • Severity: HIGH
      • Type: Specification Violation / Security Risk
      • Attack Vectors:
        • Key injection attacks: The attacker can override security-critical keys
        • Validation bypasses: First key passes validation, second key is used
        • Database inconsistencies: Different BSON implementations handle differently
        • Access control bypass: Override permission flags by duplicating keys

      Real-World Scenario

      # Application validates the first 'role' key
      # (craft_bson is a hypothetical helper emitting raw BSON with both keys;
      # a plain Python dict literal would collapse the duplicates before encoding)
      data_from_user = craft_bson({
          'role': 'user',      # Passes validation
          'role': 'admin'      # Actually used by system
      })
      # User gets admin access!
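Since craft_bson above is only a hypothetical helper, the raw bytes for the role example can be assembled by hand with the standard struct module. This is a minimal sketch; the helper names are illustrative:

```python
import struct

def bson_string_element(key, value):
    """One BSON string element (type 0x02): type byte, cstring key, int32-prefixed value."""
    v = value.encode("utf-8") + b"\x00"
    return b"\x02" + key.encode("utf-8") + b"\x00" + struct.pack("<i", len(v)) + v

def craft_duplicate_role_doc():
    """Raw BSON document that contains 'role' twice: 'user' first, 'admin' second."""
    body = bson_string_element("role", "user") + bson_string_element("role", "admin")
    # Document framing: int32 total length + element list + trailing null byte
    return struct.pack("<i", 4 + len(body) + 1) + body + b"\x00"

doc = craft_duplicate_role_doc()
# On the affected pymongo versions, bson.BSON(doc).decode() reports {'role': 'admin'}
```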

      Mitigation Recommendations

      Check for the existence of duplicate keys.

      Validation Layer

      def validate_no_duplicate_keys(bson_bytes):
          """Validate BSON has no duplicate keys by round-tripping."""
          decoded = bson.BSON(bson_bytes).decode()
          # Re-encode and compare sizes - duplicates would make the original larger
          re_encoded = bson.BSON.encode(decoded)
          if len(bson_bytes) != len(re_encoded):
              raise ValueError("Duplicate keys detected in BSON")
          return decoded

      Manual Key Tracking

      def safe_decode_bson(bson_bytes):
          """Decode BSON while rejecting duplicate keys (placeholder)."""
          # Requires walking the raw element list to collect key names before
          # decoding (low-level BSON parsing - complex but thorough)
          raise NotImplementedError
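The low-level parsing hinted at above can be sketched with only the standard struct module: walk the top-level element list, collect key names, and skip each value using the per-type sizes from the BSON specification. A sketch, with illustrative helper names:

```python
import struct

# Fixed-width BSON value sizes, keyed by element type byte (per the BSON spec)
_FIXED_SIZES = {0x01: 8, 0x06: 0, 0x07: 12, 0x08: 1, 0x09: 8, 0x0A: 0,
                0x10: 4, 0x11: 8, 0x12: 8, 0x13: 16, 0x7F: 0, 0xFF: 0}

def top_level_keys(data):
    """Yield the top-level key names of a raw BSON document, in order."""
    end = struct.unpack_from("<i", data, 0)[0] - 1  # position of trailing \x00
    pos = 4
    while pos < end:
        etype = data[pos]
        pos += 1
        key_end = data.index(b"\x00", pos)
        yield data[pos:key_end].decode("utf-8")
        pos = key_end + 1
        if etype in _FIXED_SIZES:
            pos += _FIXED_SIZES[etype]
        elif etype in (0x02, 0x0D, 0x0E):   # string, JS code, symbol
            pos += 4 + struct.unpack_from("<i", data, pos)[0]
        elif etype in (0x03, 0x04, 0x0F):   # embedded doc, array, code w/ scope
            pos += struct.unpack_from("<i", data, pos)[0]
        elif etype == 0x05:                 # binary: int32 length + subtype byte
            pos += 5 + struct.unpack_from("<i", data, pos)[0]
        elif etype == 0x0B:                 # regex: two cstrings
            pos = data.index(b"\x00", pos) + 1
            pos = data.index(b"\x00", pos) + 1
        elif etype == 0x0C:                 # DBPointer: string + 12-byte id
            pos += 4 + struct.unpack_from("<i", data, pos)[0] + 12
        else:
            raise ValueError(f"Unknown BSON element type: {etype:#04x}")

def reject_duplicate_keys(data):
    """Raise if the document's top level contains duplicate keys."""
    keys = list(top_level_keys(data))
    if len(keys) != len(set(keys)):
        raise ValueError("Duplicate keys detected in BSON document")
    return keys
```

Run against the proof-of-concept bytes above, this reports both occurrences of the key 'a' (only the top level is checked; nested documents would need recursion).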

      2. Malformed BSON Structure Accepted (HIGH)

      Description

      The BSON decoder accepts invalid byte structures that should be rejected according to the BSON specification. This is a critical security concern.

      Proof of Concept

      import bson
      raw = b'\x05\x00\x00\x00\x00'  # Claims 5 bytes, but structure is invalid
      decoded = bson.BSON(raw).decode()  # Should fail, but succeeds
      print(decoded)  # Output: {}

      Impact

      • Severity: HIGH
      • Type: Security Risk / Parser Confusion
      • Attack Vectors:
        • Parser confusion attacks: Different parsers interpret malformed length differently
        • Security boundary bypass: Validation on one system, execution on another
        • Buffer overflow potential: Incorrect length fields can cause memory issues

      Real-World Scenario

      1. Client sends malformed BSON with incorrect length field to web API
      2. Python validator "accepts" it (silently parses as empty document)
      3. Passes to MongoDB/C++ backend which may interpret differently
      4. Parser disagreement leads to unexpected behavior
      5. Potential for data corruption or security bypass

      Mitigation Recommendations

      Perform strict validation.

      def strict_bson_decode(bson_bytes):
          """Decode BSON with strict validation."""
          if len(bson_bytes) < 5:
              raise ValueError("BSON document too short")
          # Check declared length matches actual length
          declared_length = int.from_bytes(bson_bytes[:4], 'little')
          if declared_length != len(bson_bytes):
              raise ValueError(f"BSON length mismatch: declared {declared_length}, actual {len(bson_bytes)}")
          # Check null terminator
          if bson_bytes[-1] != 0:
              raise ValueError("BSON document not null-terminated")
          return bson.BSON(bson_bytes).decode()

      3. NaN and Infinity Accepted (MEDIUM)

      Description

      The BSON specification does not officially support NaN (Not a Number) or Infinity values, but the Python implementation accepts them without error.

      Proof of Concept

      # All of these succeed when they should fail
      import bson
      bson.BSON.encode({"value": float("nan")})      # NaN
      bson.BSON.encode({"value": float("inf")})      # Positive Infinity
      bson.BSON.encode({"value": float("-inf")})     # Negative Infinity

      Impact

      • Severity: MEDIUM
      • Type: Specification Violation / Interoperability Issue
      • Problems:
        • Interoperability issues: other BSON implementations may reject or mishandle these values
        • MongoDB query issues: NaN behaves unexpectedly in comparisons
        • Data validation bypasses: NaN != NaN breaks equality checks
        • Sorting anomalies: NaN and Infinity break ordering

      Real-World Issues

      Query Problems

      # Store NaN in database (Python accepts it)
      db.collection.insert_one({"score": float("nan")})

      # Query fails or behaves unexpectedly
      db.collection.find({"score": {"$gt": 0}})  # NaN is not > 0, not < 0, not == 0

      Validation Bypass

      def validate_score(score):
          if score < 0 or score > 100:
              raise ValueError("Score must be 0-100")
          return score

      # NaN bypasses validation (NaN < 0 is False, NaN > 100 is False)
      validate_score(float("nan"))  # Passes!
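The bypass works because every ordered comparison with NaN is False; checking finiteness explicitly closes it. A sketch (validate_score_strict is an illustrative name):

```python
import math

def validate_score_strict(score):
    """Reject NaN/Infinity before the range check."""
    if not isinstance(score, (int, float)) or isinstance(score, bool):
        raise ValueError("Score must be a number")
    if not math.isfinite(score):  # False for NaN, +inf, and -inf
        raise ValueError("Score must be a finite number")
    if score < 0 or score > 100:
        raise ValueError("Score must be 0-100")
    return score
```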

      Mitigation Recommendations

      Pre-Encoding Validation

      import math

      def validate_no_special_floats(data):
          if isinstance(data, float):
              if math.isnan(data) or math.isinf(data):
                  raise ValueError(f"Invalid float value: {data}")
          elif isinstance(data, dict):
              for value in data.values():
                  validate_no_special_floats(value)
          elif isinstance(data, (list, tuple)):
              for item in data:
                  validate_no_special_floats(item)
          return data

      data = {"score": user_input}
      validate_no_special_floats(data)
      bson.BSON.encode(data)

      MongoDB Schema Validation

      db.createCollection("scores", {
         validator: {
            $jsonSchema: {
               bsonType: "object",
               properties: {
                  score: {
                     bsonType: "double",
                     minimum: 0,
                     maximum: 100
                  }
               }
            }
         }
      })

      4. Timezone Information Loss (MEDIUM)

      Description

      When encoding datetime objects to BSON, timezone information is lost during the decode process, resulting in naive datetime objects.

      Proof of Concept

      import bson
      import datetime

      now = datetime.datetime(2023, 1, 1, 12, 0, 0, tzinfo=datetime.timezone.utc)
      print(now.tzinfo)  # datetime.timezone.utc
      encoded = bson.BSON.encode({"ts": now})
      decoded = bson.BSON(encoded).decode()
      print(decoded["ts"].tzinfo)  # None (lost!)

      Impact

      • Severity: MEDIUM
      • Type: Data Integrity / Data Loss
      • Problems:
        • Incorrect time calculations across timezones
        • Silent bugs that are hard to detect
        • Data corruption in timezone-sensitive applications
        • Compliance issues for applications requiring timezone tracking

      Real-World Scenario

      # User schedules event at 3 PM EST
      event_time = datetime.datetime(2023, 6, 1, 15, 0, tzinfo=EST)# Store in MongoDB via BSON
      db.events.insert_one({"time": event_time})# Retrieve in PST timezone application
      retrieved = db.events.find_one()
      event_time_back = retrieved["time"]  # Naive datetime!# Application assumes local timezone (PST)
      # Event now appears at with a wrong time!

      Mitigation Recommendations

      Always Store as UTC

      import datetime

      def encode_with_utc(data):
          """Convert all datetimes to UTC before encoding."""
          if isinstance(data, datetime.datetime):
              if data.tzinfo is None:
                  raise ValueError("Naive datetime not allowed")
              return data.astimezone(datetime.timezone.utc)
          elif isinstance(data, dict):
              return {k: encode_with_utc(v) for k, v in data.items()}
          elif isinstance(data, list):
              return [encode_with_utc(item) for item in data]
          return data

      Re-apply Timezone on Decode

      def decode_with_utc(data):
          """Re-apply UTC timezone to decoded datetimes."""
          if isinstance(data, datetime.datetime):
              if data.tzinfo is None:
                  return data.replace(tzinfo=datetime.timezone.utc)
              return data
          elif isinstance(data, dict):
              return {k: decode_with_utc(v) for k, v in data.items()}
          elif isinstance(data, list):
              return [decode_with_utc(item) for item in data]
          return data

      # Use consistently
      decoded = bson.BSON(encoded).decode()
      decoded = decode_with_utc(decoded)

      Document Behavior

      Document in code that BSON datetimes are stored as UTC and come back naive after decoding (PyMongo can instead return timezone-aware UTC datetimes when decoding with CodecOptions(tz_aware=True)). For example:

      Please note that all datetime fields in MongoDB are stored as UTC milliseconds.
      After decoding, they are naive datetime objects and should be treated as UTC.

            Assignee:
            Noah Stapp
            Reporter:
            Constantinos Patsakis
            Votes:
            0
            Watchers:
            4