Four vulnerabilities in BSON

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Minor - P4
    • Affects Version/s: 3.11, 4.15.3
    • Component/s: BSON
    • 🟡 Potential Risk
    • Python Drivers

      1. What would you like to communicate to the user about this feature?
      2. Would you like the user to see examples of the syntax and/or executable code and its output?
      3. Which versions of the driver/connector does this apply to?


      Vulnerability report

      The issue tracker does not allow filing a vulnerability directly, so this is filed as a bug.

      Test Environment

      • Python Version: 3.12.3 (main, Aug 14 2025, 17:47:21) [GCC 13.3.0]
      • OS: Linux 6.8.0-85-generic (Ubuntu/Debian)
      • Library: pymongo (bson package)
      • Tested and verified on PyMongo versions:
        • PyMongo 3.11 (system-installed)
        • PyMongo 4.15.3
      • Date: 2025-11-06

      Executive Summary

      Security testing identified four significant concerns in the Python BSON library (the bson package distributed with pymongo):

      • 2 HIGH severity (security risks)
      • 2 MEDIUM severity (spec violations & data integrity)

      1. Duplicate Keys Accepted (HIGH)

      Description

      The BSON specification explicitly forbids duplicate keys in documents, but the Python implementation silently accepts them, taking the last value without warning.

      Proof of Concept

      import bson
      part1 = b'\x02a\x00\x02\x00\x00\x001\x00'  # a: "1"
      part2 = b'\x02a\x00\x02\x00\x00\x002\x00'  # a: "2"
      malicious_raw = b'\x17\x00\x00\x00' + part1 + part2 + b'\x00'
      decoded = bson.BSON(malicious_raw).decode()
      print(decoded)  # {'a': '2'} - silently accepts, last entry wins

      Impact

      • Severity: HIGH
      • Type: Specification Violation / Security Risk
      • Attack Vectors:
        • Key injection attacks: The attacker can override security-critical keys
        • Validation bypasses: First key passes validation, second key is used
        • Database inconsistencies: Different BSON implementations handle differently
        • Access control bypass: Override permission flags by duplicating keys

      Real-World Scenario

      # Application validates the first 'role' key
      # (craft_bson is a hypothetical helper emitting raw BSON with both keys;
      # a plain Python dict literal would collapse the duplicates before encoding)
      data_from_user = craft_bson({
          'role': 'user',      # Passes validation
          'role': 'admin'      # Actually used by system
      })
      # User gets admin access!
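Since craft_bson above is only a hypothetical helper, the raw bytes for the role example can be assembled by hand with the standard struct module. This is a minimal sketch; the helper names are illustrative:

```python
import struct

def bson_string_element(key, value):
    """One BSON string element (type 0x02): type byte, cstring key, int32-prefixed value."""
    v = value.encode("utf-8") + b"\x00"
    return b"\x02" + key.encode("utf-8") + b"\x00" + struct.pack("<i", len(v)) + v

def craft_duplicate_role_doc():
    """Raw BSON document that contains 'role' twice: 'user' first, 'admin' second."""
    body = bson_string_element("role", "user") + bson_string_element("role", "admin")
    # Document framing: int32 total length + element list + trailing null byte
    return struct.pack("<i", 4 + len(body) + 1) + body + b"\x00"

doc = craft_duplicate_role_doc()
# On the affected pymongo versions, bson.BSON(doc).decode() reports {'role': 'admin'}
```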

      Mitigation Recommendations

      Check for the existence of duplicate keys.

      Validation Layer

      def validate_no_duplicate_keys(bson_bytes):
          """Validate BSON has no duplicate keys by round-tripping."""
          decoded = bson.BSON(bson_bytes).decode()
          # Re-encode and compare sizes - duplicates would make the original larger
          re_encoded = bson.BSON.encode(decoded)
          if len(bson_bytes) != len(re_encoded):
              raise ValueError("Duplicate keys detected in BSON")
          return decoded

      Manual Key Tracking

      def safe_decode_bson(bson_bytes):
          """Decode BSON while rejecting duplicate keys (placeholder)."""
          # Requires walking the raw element list to collect key names before
          # decoding (low-level BSON parsing - complex but thorough)
          raise NotImplementedError
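The low-level parsing hinted at above can be sketched with only the standard struct module: walk the top-level element list, collect key names, and skip each value using the per-type sizes from the BSON specification. A sketch, with illustrative helper names:

```python
import struct

# Fixed-width BSON value sizes, keyed by element type byte (per the BSON spec)
_FIXED_SIZES = {0x01: 8, 0x06: 0, 0x07: 12, 0x08: 1, 0x09: 8, 0x0A: 0,
                0x10: 4, 0x11: 8, 0x12: 8, 0x13: 16, 0x7F: 0, 0xFF: 0}

def top_level_keys(data):
    """Yield the top-level key names of a raw BSON document, in order."""
    end = struct.unpack_from("<i", data, 0)[0] - 1  # position of trailing \x00
    pos = 4
    while pos < end:
        etype = data[pos]
        pos += 1
        key_end = data.index(b"\x00", pos)
        yield data[pos:key_end].decode("utf-8")
        pos = key_end + 1
        if etype in _FIXED_SIZES:
            pos += _FIXED_SIZES[etype]
        elif etype in (0x02, 0x0D, 0x0E):   # string, JS code, symbol
            pos += 4 + struct.unpack_from("<i", data, pos)[0]
        elif etype in (0x03, 0x04, 0x0F):   # embedded doc, array, code w/ scope
            pos += struct.unpack_from("<i", data, pos)[0]
        elif etype == 0x05:                 # binary: int32 length + subtype byte
            pos += 5 + struct.unpack_from("<i", data, pos)[0]
        elif etype == 0x0B:                 # regex: two cstrings
            pos = data.index(b"\x00", pos) + 1
            pos = data.index(b"\x00", pos) + 1
        elif etype == 0x0C:                 # DBPointer: string + 12-byte id
            pos += 4 + struct.unpack_from("<i", data, pos)[0] + 12
        else:
            raise ValueError(f"Unknown BSON element type: {etype:#04x}")

def reject_duplicate_keys(data):
    """Raise if the document's top level contains duplicate keys."""
    keys = list(top_level_keys(data))
    if len(keys) != len(set(keys)):
        raise ValueError("Duplicate keys detected in BSON document")
    return keys
```

Run against the proof-of-concept bytes above, this reports both occurrences of the key 'a' (only the top level is checked; nested documents would need recursion).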

      2. Malformed BSON Structure Accepted (HIGH)

      Description

      The BSON decoder accepts invalid byte structures that should be rejected according to the BSON specification. This is a critical security concern.

      Proof of Concept

      import bson
      raw = b'\x05\x00\x00\x00\x00'  # Claims 5 bytes, but structure is invalid
      decoded = bson.BSON(raw).decode()  # Should fail, but succeeds
      print(decoded)  # Output: {}

      Impact

      • Severity: HIGH
      • Type: Security Risk / Parser Confusion
      • Attack Vectors:
        • Parser confusion attacks: Different parsers interpret malformed length differently
        • Security boundary bypass: Validation on one system, execution on another
        • Buffer overflow potential: Incorrect length fields can cause memory issues

      Real-World Scenario

      1. Client sends malformed BSON with incorrect length field to web API
      2. Python validator "accepts" it (silently parses as empty document)
      3. Passes to MongoDB/C++ backend which may interpret differently
      4. Parser disagreement leads to unexpected behavior
      5. Potential for data corruption or security bypass

      Mitigation Recommendations

      Perform strict validation.

      def strict_bson_decode(bson_bytes):
          """Decode BSON with strict validation."""
          if len(bson_bytes) < 5:
              raise ValueError("BSON document too short")
          # Check declared length matches actual length
          declared_length = int.from_bytes(bson_bytes[:4], 'little')
          if declared_length != len(bson_bytes):
              raise ValueError(f"BSON length mismatch: declared {declared_length}, actual {len(bson_bytes)}")
          # Check null terminator
          if bson_bytes[-1] != 0:
              raise ValueError("BSON document not null-terminated")
          return bson.BSON(bson_bytes).decode()

      3. NaN and Infinity Accepted (MEDIUM)

      Description

      The BSON specification does not officially support NaN (Not a Number) or Infinity values, but the Python implementation accepts them without error.

      Proof of Concept

      # All of these succeed when they should fail
      import bson
      bson.BSON.encode({"value": float("nan")})      # NaN
      bson.BSON.encode({"value": float("inf")})      # Positive Infinity
      bson.BSON.encode({"value": float("-inf")})     # Negative Infinity

      Impact

      • Severity: MEDIUM
      • Type: Specification Violation / Interoperability Issue
      • Problems:
        • Interoperability issues: other BSON implementations may reject or mishandle these values
        • MongoDB query issues: NaN behaves unexpectedly in comparisons
        • Data validation bypasses: NaN != NaN breaks equality checks
        • Sorting anomalies: NaN and Infinity break ordering

      Real-World Issues

      Query Problems

      # Store NaN in database (Python accepts it)
      db.collection.insert_one({"score": float("nan")})

      # Query fails or behaves unexpectedly
      db.collection.find({"score": {"$gt": 0}})  # NaN is not > 0, not < 0, not == 0

      Validation Bypass

      def validate_score(score):
          if score < 0 or score > 100:
              raise ValueError("Score must be 0-100")
          return score

      # NaN bypasses validation (NaN < 0 is False, NaN > 100 is False)
      validate_score(float("nan"))  # Passes!
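The bypass works because every ordered comparison with NaN is False; checking finiteness explicitly closes it. A sketch (validate_score_strict is an illustrative name):

```python
import math

def validate_score_strict(score):
    """Reject NaN/Infinity before the range check."""
    if not isinstance(score, (int, float)) or isinstance(score, bool):
        raise ValueError("Score must be a number")
    if not math.isfinite(score):  # False for NaN, +inf, and -inf
        raise ValueError("Score must be a finite number")
    if score < 0 or score > 100:
        raise ValueError("Score must be 0-100")
    return score
```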

      Mitigation Recommendations

      Pre-Encoding Validation

      import math

      def validate_no_special_floats(data):
          if isinstance(data, float):
              if math.isnan(data) or math.isinf(data):
                  raise ValueError(f"Invalid float value: {data}")
          elif isinstance(data, dict):
              for value in data.values():
                  validate_no_special_floats(value)
          elif isinstance(data, (list, tuple)):
              for item in data:
                  validate_no_special_floats(item)
          return data

      data = {"score": user_input}
      validate_no_special_floats(data)
      bson.BSON.encode(data)

      MongoDB Schema Validation

      db.createCollection("scores", {
         validator: {
            $jsonSchema: {
               bsonType: "object",
               properties: {
                  score: {
                     bsonType: "double",
                     minimum: 0,
                     maximum: 100
                  }
               }
            }
         }
      })

      4. Timezone Information Loss (MEDIUM)

      Description

      When encoding datetime objects to BSON, timezone information is lost during the decode process, resulting in naive datetime objects.

      Proof of Concept

      import bson
      import datetime

      now = datetime.datetime(2023, 1, 1, 12, 0, 0, tzinfo=datetime.timezone.utc)
      print(now.tzinfo)  # datetime.timezone.utc
      encoded = bson.BSON.encode({"ts": now})
      decoded = bson.BSON(encoded).decode()
      print(decoded["ts"].tzinfo)  # None (lost!)

      Impact

      • Severity: MEDIUM
      • Type: Data Integrity / Data Loss
      • Problems:
        • Incorrect time calculations across timezones
        • Silent bugs that are hard to detect
        • Data corruption in timezone-sensitive applications
        • Compliance issues for applications requiring timezone tracking

      Real-World Scenario

      # User schedules event at 3 PM EST
      event_time = datetime.datetime(2023, 6, 1, 15, 0, tzinfo=EST)# Store in MongoDB via BSON
      db.events.insert_one({"time": event_time})# Retrieve in PST timezone application
      retrieved = db.events.find_one()
      event_time_back = retrieved["time"]  # Naive datetime!# Application assumes local timezone (PST)
      # Event now appears at with a wrong time!

      Mitigation Recommendations

      Always Store as UTC

      import datetime

      def encode_with_utc(data):
          """Convert all datetimes to UTC before encoding."""
          if isinstance(data, datetime.datetime):
              if data.tzinfo is None:
                  raise ValueError("Naive datetime not allowed")
              return data.astimezone(datetime.timezone.utc)
          elif isinstance(data, dict):
              return {k: encode_with_utc(v) for k, v in data.items()}
          elif isinstance(data, list):
              return [encode_with_utc(item) for item in data]
          return data

      Re-apply Timezone on Decode

      def decode_with_utc(data):
          """Re-apply UTC timezone to decoded datetimes."""
          if isinstance(data, datetime.datetime):
              if data.tzinfo is None:
                  return data.replace(tzinfo=datetime.timezone.utc)
              return data
          elif isinstance(data, dict):
              return {k: decode_with_utc(v) for k, v in data.items()}
          elif isinstance(data, list):
              return [decode_with_utc(item) for item in data]
          return data

      # Use consistently
      decoded = bson.BSON(encoded).decode()
      decoded = decode_with_utc(decoded)

      Document Behavior

      Document in code that BSON datetimes are stored as UTC and come back naive after decoding (PyMongo can instead return timezone-aware UTC datetimes when decoding with CodecOptions(tz_aware=True)). For example:

      Please note that all datetime fields in MongoDB are stored as UTC milliseconds.
      After decoding, they are naive datetime objects and should be treated as UTC.

            Assignee:
            Noah Stapp
            Reporter:
            Constantinos Patsakis
            Votes:
            0
            Watchers:
            4