Python Driver / PYTHON-192

document behaviour wrt Unicode

    • Type: Task
    • Resolution: Done
    • Priority: Major - P3
    • 2.0
    • Affects Version/s: None
    • Component/s: None
    • Labels: None
    • Environment:
      Python (platform-independent)

      In July 2009, Michael Dirolf wrote about how the Python driver handles strings and Unicode strings:

      "This issue is a bit tricky because of Python's (at least Python < 3.0) string vs unicode vs binary data handling. The reason the driver converts everything to unicode is because the database stores utf-8. So when you save a regular string it is assumed to be utf-8 data and saved as such. When you save a unicode instance it is encoded to utf-8 and saved. On the decoding end we then just convert everything back to unicode because that is sort of the lowest common denominator for the different datatypes that could have been saved originally."

      As far as I can tell, this is not explicitly documented. In the PyMongo v1.9+ documentation there is no mention of this behaviour - or am I overlooking something?
      Proposal: document this clearly in the tutorial. The examples already show that you enter normal Python text strings and get Unicode strings back from the database,
      but a sentence or two stating this explicitly would be helpful.
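
      For illustration, the round trip described in the quote could be sketched roughly as follows. This is not PyMongo's actual code: the helper names encode_for_storage and decode_from_storage are hypothetical, and Python 3's bytes and str stand in for Python 2's str and unicode.

```python
# Illustrative sketch only, NOT PyMongo's implementation.
# Assumption: Python 3 `bytes` plays the role of Python 2 `str`,
# and Python 3 `str` plays the role of Python 2 `unicode`.
# Both helper names below are hypothetical.

def encode_for_storage(value):
    """Return the UTF-8 bytes that would be stored for a string value."""
    if isinstance(value, bytes):
        # A regular (byte) string is assumed to be UTF-8 and saved as such.
        return value
    if isinstance(value, str):
        # A unicode (text) string is encoded to UTF-8 before saving.
        return value.encode("utf-8")
    raise TypeError("expected bytes or str, got %r" % type(value))


def decode_from_storage(raw):
    """Always return text, whichever type was saved originally."""
    return raw.decode("utf-8")


# The same word saved once as UTF-8 bytes and once as text.
saved_from_bytes = encode_for_storage(b"caf\xc3\xa9")
saved_from_text = encode_for_storage("caf\u00e9")

# Both come back as text, so the original type is not recoverable.
round_trip_bytes = decode_from_storage(saved_from_bytes)
round_trip_text = decode_from_storage(saved_from_text)
```

      Under these assumptions both inputs are stored as identical bytes, which is why reading always yields a Unicode string regardless of what was saved.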

            Assignee:
            bernie@mongodb.com Bernie Hackett
            Reporter:
            npoppeli Nico Poppelier