Vulnerability report
The issue tracker does not allow filing this as a vulnerability, so I chose bug.
Test Environment
- Python Version: 3.12.3 (main, Aug 14 2025, 17:47:21) [GCC 13.3.0]
- OS: Linux 6.8.0-85-generic (Ubuntu/Debian)
- Library: pymongo (bson package)
- Tested and verified on PyMongo versions:
  - PyMongo 3.11 (system-installed)
  - PyMongo 4.15.3
- Date: 2025-11-06
Executive Summary
Security testing of the Python BSON library (the bson package shipped with pymongo) identified four significant concerns:
- 2 HIGH severity (security risks)
- 2 MEDIUM severity (spec violations & data integrity)
1. Duplicate Keys Accepted (HIGH)
Description
The BSON specification explicitly forbids duplicate keys in documents, but the Python implementation silently accepts them, taking the last value without warning.
Proof of Concept
import bson
part1 = b'\x02a\x00\x02\x00\x00\x001\x00' # a: "1"
part2 = b'\x02a\x00\x02\x00\x00\x002\x00' # a: "2"
malicious_raw = b'\x17\x00\x00\x00' + part1 + part2 + b'\x00'
decoded = bson.BSON(malicious_raw).decode()
print(decoded) # {'a': '2'} - silently accepts, last entry wins
Impact
- Severity: HIGH
- Type: Specification Violation / Security Risk
- Attack Vectors:
  - Key injection attacks: an attacker can override security-critical keys
  - Validation bypasses: the first key passes validation, the second key is used
  - Database inconsistencies: different BSON implementations handle duplicates differently
  - Access control bypass: override permission flags by duplicating keys
Real-World Scenario
# Application validates the first 'role' key, but the decoder keeps the last.
# craft_bson is a hypothetical helper that emits raw BSON containing both
# keys; a Python dict literal would collapse duplicates, so pass key/value pairs.
data_from_user = craft_bson([
    ('role', 'user'),   # Passes validation
    ('role', 'admin'),  # Actually used by the system
])
# User gets admin access!
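For concreteness, here is one way such a helper could be implemented for a single duplicated key. craft_duplicate_key_bson is a hypothetical sketch (not part of pymongo) that splices two single-key documents produced by the library's own encoder:

import bson

def craft_duplicate_key_bson(key, first, second):
    """Hypothetical helper: build a raw BSON document with a duplicated key."""
    # Encode each value as a single-key document, then strip the 4-byte
    # length header and the trailing null to recover the bare element bytes.
    elem1 = bytes(bson.BSON.encode({key: first}))[4:-1]
    elem2 = bytes(bson.BSON.encode({key: second}))[4:-1]
    body = elem1 + elem2
    length = (4 + len(body) + 1).to_bytes(4, 'little')
    return length + body + b'\x00'

doc = craft_duplicate_key_bson('role', 'user', 'admin')
print(bson.BSON(doc).decode())  # {'role': 'admin'} - the second value wins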
Mitigation Recommendations
Since the decoder accepts duplicates silently, applications must check for them explicitly.
Validation Layer
def validate_no_duplicate_keys(bson_bytes):
    """Reject BSON whose re-encoded size differs (duplicate-key heuristic)."""
    decoded = bson.BSON(bson_bytes).decode()
    # Re-encode and compare sizes: decoding collapses duplicates, so the
    # re-encoded document would be smaller than the original.
    re_encoded = bson.BSON.encode(decoded)
    if len(bson_bytes) != len(re_encoded):
        raise ValueError("Duplicate keys detected in BSON")
    return decoded
Manual Key Tracking
def safe_decode_bson(bson_bytes):
    """Decode BSON, rejecting documents with duplicate top-level keys."""
    keys = top_level_keys(bson_bytes)  # raw key scanner, sketched below
    duplicates = {k for k in keys if keys.count(k) > 1}
    if duplicates:
        raise ValueError(f"Duplicate keys detected: {duplicates}")
    return bson.BSON(bson_bytes).decode()
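A minimal sketch of top_level_keys, assuming only top-level keys need checking; it covers the common element types and rejects anything it does not recognize rather than guessing:

import struct

# Fixed-size value types: double, ObjectId, boolean, UTC datetime, null,
# int32, timestamp, int64, Decimal128 (type byte -> payload size in bytes).
_FIXED_SIZES = {0x01: 8, 0x07: 12, 0x08: 1, 0x09: 8, 0x0A: 0,
                0x10: 4, 0x11: 8, 0x12: 8, 0x13: 16}

def top_level_keys(bson_bytes):
    """Return the top-level key names of a raw BSON document, in order."""
    pos, end = 4, len(bson_bytes) - 1  # skip int32 length; stop at final 0x00
    keys = []
    while pos < end:
        elem_type = bson_bytes[pos]
        pos += 1
        key_end = bson_bytes.index(b'\x00', pos)  # key is a null-terminated cstring
        keys.append(bson_bytes[pos:key_end].decode('utf-8'))
        pos = key_end + 1
        if elem_type in _FIXED_SIZES:
            pos += _FIXED_SIZES[elem_type]
        elif elem_type in (0x02, 0x0D, 0x0E):  # string-likes: int32 length + bytes
            pos += 4 + struct.unpack_from('<i', bson_bytes, pos)[0]
        elif elem_type in (0x03, 0x04):  # embedded document / array: length includes itself
            pos += struct.unpack_from('<i', bson_bytes, pos)[0]
        elif elem_type == 0x05:  # binary: int32 length + subtype byte + data
            pos += 5 + struct.unpack_from('<i', bson_bytes, pos)[0]
        else:
            raise ValueError(f"Unhandled BSON element type 0x{elem_type:02x}")
    return keys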
2. Malformed BSON Structure Accepted (HIGH)
Description
The BSON decoder accepts invalid byte structures that should be rejected according to the BSON specification. This is a critical security concern.
Proof of Concept
import bson
raw = b'\x05\x00\x00\x00\x00' # Claims 5 bytes, but structure is invalid
decoded = bson.BSON(raw).decode() # Should fail, but succeeds
print(decoded) # Output: {}
Impact
- Severity: HIGH
- Type: Security Risk / Parser Confusion
- Attack Vectors:
  - Parser confusion attacks: different parsers interpret malformed length fields differently
  - Security boundary bypass: validation on one system, execution on another
  - Buffer overflow potential: incorrect length fields can cause memory issues
Real-World Scenario
- Client sends malformed BSON with incorrect length field to web API
- Python validator "accepts" it (silently parses as empty document)
- Passes to MongoDB/C++ backend which may interpret differently
- Parser disagreement leads to unexpected behavior
- Potential for data corruption or security bypass
Mitigation Recommendations
Perform strict validation.
import bson

def strict_bson_decode(bson_bytes):
    """Decode BSON with strict structural validation."""
    if len(bson_bytes) < 5:
        raise ValueError("BSON document too short")
    # Check declared length matches actual length
    declared_length = int.from_bytes(bson_bytes[:4], 'little')
    if declared_length != len(bson_bytes):
        raise ValueError(
            f"BSON length mismatch: declared {declared_length}, actual {len(bson_bytes)}"
        )
    # Check null terminator
    if bson_bytes[-1] != 0:
        raise ValueError("BSON document not null-terminated")
    return bson.BSON(bson_bytes).decode()
3. NaN and Infinity Accepted (MEDIUM)
Description
The BSON specification does not officially support NaN (Not a Number) or Infinity values, but the Python implementation accepts them without error.
Proof of Concept
# All of these succeed when they should fail
import bson
bson.BSON.encode({"value": float("nan")}) # NaN
bson.BSON.encode({"value": float("inf")}) # Positive Infinity
bson.BSON.encode({"value": float("-inf")}) # Negative Infinity
Impact
- Severity: MEDIUM
- Type: Specification Violation / Interoperability Issue
- Problems:
  - Interoperability issues: other BSON implementations may reject or mishandle these values
  - MongoDB query issues: NaN behaves unexpectedly in comparisons
  - Data validation bypasses: NaN != NaN breaks equality checks (demonstrated below)
  - Sorting anomalies: NaN and Infinity break ordering
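The comparison semantics behind these problems are easy to demonstrate:

nan = float("nan")
print(nan == nan)        # False - NaN never equals itself
print(nan > 0, nan < 0)  # False False - NaN fails every range check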
Real-World Issues
Query Problems
# Store NaN in database (Python accepts it)
db.collection.insert_one({"score": float("nan")})
# Query fails or behaves unexpectedly
db.collection.find({"score": {"$gt": 0}})  # NaN is not > 0, not < 0, not == 0
Validation Bypass
def validate_score(score):
    if score < 0 or score > 100:
        raise ValueError("Score must be 0-100")
    return score

# NaN bypasses validation (NaN < 0 is False, NaN > 100 is False)
validate_score(float("nan"))  # Passes!
Mitigation Recommendations
Pre-Encoding Validation
import math

def validate_no_special_floats(data):
    if isinstance(data, float):
        if math.isnan(data) or math.isinf(data):
            raise ValueError(f"Invalid float value: {data}")
    elif isinstance(data, dict):
        for value in data.values():
            validate_no_special_floats(value)
    elif isinstance(data, (list, tuple)):
        for item in data:
            validate_no_special_floats(item)
    return data

data = {"score": user_input}
validate_no_special_floats(data)
bson.BSON.encode(data)
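With this in place, a NaN payload is rejected before it ever reaches the encoder:

try:
    validate_no_special_floats({"score": float("nan")})
except ValueError as exc:
    print(exc)  # Invalid float value: nan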
MongoDB Schema Validation
db.createCollection("scores", {
validator: {
$jsonSchema: {
bsonType: "object",
properties: {
score: {
bsonType: "double",
minimum: 0,
maximum: 100
}}
}
}}
}
})
4. Timezone Information Loss (MEDIUM)
Description
When timezone-aware datetime objects are encoded to BSON, the timezone information is lost on decode, and naive datetime objects are returned.
Proof of Concept
import bson
import datetime

now = datetime.datetime(2023, 1, 1, 12, 0, 0, tzinfo=datetime.timezone.utc)
print(now.tzinfo) # datetime.timezone.utc
encoded = bson.BSON.encode({"ts": now})
decoded = bson.BSON(encoded).decode()
print(decoded["ts"].tzinfo) # None (lost!)
Impact
- Severity: MEDIUM
- Type: Data Integrity / Data Loss
- Problems:
  - Incorrect time calculations across timezones
  - Silent bugs that are hard to detect
  - Data corruption in timezone-sensitive applications
  - Compliance issues for applications requiring timezone tracking
Real-World Scenario
# User schedules event at 3 PM EST (EST is an illustrative tzinfo object,
# e.g. zoneinfo.ZoneInfo("America/New_York"))
event_time = datetime.datetime(2023, 6, 1, 15, 0, tzinfo=EST)
# Store in MongoDB via BSON
db.events.insert_one({"time": event_time})
# Retrieve in a PST-timezone application
retrieved = db.events.find_one()
event_time_back = retrieved["time"]  # Naive datetime!
# Application assumes local timezone (PST)
# Event now appears at the wrong time!
Mitigation Recommendations
Always Store as UTC
import datetime

def encode_with_utc(data):
"""Convert all datetimes to UTC before encoding."""
if isinstance(data, datetime.datetime):
if data.tzinfo is None:
raise ValueError("Naive datetime not allowed")
return data.astimezone(datetime.timezone.utc)
elif isinstance(data, dict):
return {k: encode_with_utc(v) for k, v in data.items()}
elif isinstance(data, list):
return [encode_with_utc(item) for item in data]
return data
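Used on the write path, this normalizes everything to UTC before it reaches the encoder:

import bson

event = {"time": datetime.datetime(2023, 6, 1, 15, 0, tzinfo=datetime.timezone.utc)}
bson.BSON.encode(encode_with_utc(event))  # every datetime normalized to UTC first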
Re-apply Timezone on Decode
def decode_with_utc(data):
"""Re-apply UTC timezone to decoded datetimes."""
if isinstance(data, datetime.datetime):
if data.tzinfo is None:
return data.replace(tzinfo=datetime.timezone.utc)
return data
elif isinstance(data, dict):
return {k: decode_with_utc(v) for k, v in data.items()}
elif isinstance(data, list):
return [decode_with_utc(item) for item in data]
    return data

# Use consistently
decoded = bson.BSON(encoded).decode()
decoded = decode_with_utc(decoded)
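Alternatively, PyMongo can return timezone-aware datetimes directly: passing CodecOptions(tz_aware=True) at decode time makes the manual re-application step unnecessary.

import bson
from bson.codec_options import CodecOptions

decoded = bson.BSON(encoded).decode(codec_options=CodecOptions(tz_aware=True))
print(decoded["ts"].tzinfo)  # UTC - timezone info is attached on decode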
Document Behavior
Document in code that BSON datetimes are always UTC and are naive after decoding, for example:
Please note that all datetime fields in MongoDB are stored as UTC milliseconds.
After decoding, they are naive datetime objects and should be treated as UTC.
Related to
- SERVER-6439: Duplicate fields at the same level should not be allowed