Investigate ways to improve performance of small read operations


    • Python Drivers

      Context

      Running a simple command like database.command('ping') shows that about 35% of the time is spent selecting a server and checking out a connection. This overhead limits how useful the driver is for workloads such as throughput tests.

      I created this gist to demonstrate the overhead.

      Before patching client._select_server, I see the following times for 10000 pings:

      command: 1.12
      _command: 0.77

      After patching client._select_server, I see:

      command: 0.86
      _command: 0.77
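      The before/after comparison can be reproduced in outline with a self-contained harness. This sketch does not use PyMongo: `ToyClient`, its `_select_server` and `_run` methods, and their simulated costs are hypothetical stand-ins. It only shows how patching the selection step with `unittest.mock.patch.object` isolates that step's cost from the rest of the call.

```python
import timeit
from unittest import mock


class ToyClient:
    """Hypothetical stand-in for a driver client; not PyMongo's MongoClient."""

    def _select_server(self):
        # Simulated server-selection work (hypothetical cost).
        return sum(range(200))

    def _run(self, server):
        # Simulated command execution (hypothetical cost).
        return sum(range(500)) + server

    def command(self):
        # Full path: select a server on every call, then run the command.
        return self._run(self._select_server())


def time_pings(client, n=10_000):
    """Return wall-clock seconds for n calls to client.command()."""
    return timeit.timeit(client.command, number=n)


client = ToyClient()
before = time_pings(client)

# Replace selection with a function returning a precomputed result,
# mirroring the idea of patching client._select_server.
cached = client._select_server()
with mock.patch.object(ToyClient, "_select_server", lambda self: cached):
    after = time_pings(client)

print(f"before patch: {before:.3f}s  after patch: {after:.3f}s")
```

A plain lambda is used as the replacement rather than a `MagicMock`, because mock objects record every call and would add their own overhead to the "after" timing.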

      I then ran cProfile against 100,000 iterations of the non-patched command call to verify where the time was spent. _select_server took 15% of the runtime, checking out a connection took about 20%, and db._command took the remaining 65%.
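      A similar breakdown can be produced with the standard-library `cProfile` and `pstats` modules. In this sketch the three helper functions are hypothetical stand-ins for server selection, connection checkout, and command execution; the point is the measurement pattern (each helper's cumulative time divided by the caller's cumulative time), not the numbers it prints.

```python
import cProfile
import pstats


def select_server():
    # Hypothetical stand-in for server selection.
    return sum(range(150))


def checkout_connection():
    # Hypothetical stand-in for checking out a pooled connection.
    return sum(range(200))


def run_command():
    # Hypothetical stand-in for sending the command and reading the reply.
    return sum(range(650))


def command():
    select_server()
    checkout_connection()
    return run_command()


def profile_breakdown(n=10_000):
    """Profile n calls to command() and return each helper's share of its runtime."""
    profiler = cProfile.Profile()
    profiler.enable()
    for _ in range(n):
        command()
    profiler.disable()

    stats = pstats.Stats(profiler)
    # stats.stats maps (file, line, funcname) -> (cc, nc, tottime, cumtime, callers).
    cumtime = {key[2]: value[3] for key, value in stats.stats.items()}
    total = cumtime["command"]
    return {
        name: cumtime[name] / total
        for name in ("select_server", "checkout_connection", "run_command")
    }


for name, share in profile_breakdown().items():
    print(f"{name}: {share:.0%}")
```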

      For single-threaded driver operations, only about 25% of the runtime is spent on actual network I/O. Most of the rest goes to the overhead of managing connections, monitoring, context managers, and other background tasks. Because every operation passes through many layers of function calls before reaching the socket I/O, this overhead also limits any gain from concurrent execution. This isn't necessarily a problem by itself, but it suggests that reducing these layers could improve performance. The async API in particular, with its many await statements, could see a significant speedup if the unnecessary ones are removed.
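      The per-layer cost of nested awaits can be illustrated without any driver code. This sketch times a trivial coroutine reached through a configurable number of pass-through layers (the hypothetical `layered` helper). Awaiting a coroutine that never suspends does not return control to the event loop, but each layer still creates a coroutine object and a frame, which is the kind of overhead that collapsing layers would remove.

```python
import asyncio
import time


async def leaf():
    # Innermost work; stands in for the actual socket I/O (hypothetical).
    return 1


async def layered(depth):
    """Reach leaf() through `depth` extra pass-through coroutine layers."""
    if depth == 0:
        return await leaf()
    return await layered(depth - 1)


async def time_calls(depth, n=20_000):
    """Return seconds taken by n awaits of leaf() wrapped in `depth` layers."""
    start = time.perf_counter()
    for _ in range(n):
        await layered(depth)
    return time.perf_counter() - start


async def main():
    for depth in (0, 5, 10):
        elapsed = await time_calls(depth)
        print(f"depth={depth:2d}: {elapsed:.3f}s")


asyncio.run(main())
```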

      Definition of done

      A one-week timeboxed proof-of-concept to see how much performance improvement we can gain from removing as many intermediate layers and context managers as possible from network operations. Collapsing our call stack to minimize the number of calls and contexts each operation has to manage should be the primary objective.
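      One pattern a collapsed call path might use is replacing a generator-based context manager with an inline try/finally. This keeps the release-on-error guarantee while skipping the generator and context-manager protocol calls. `ToyPool` and its methods are hypothetical and are not PyMongo's connection pool; the lock is there so that flattening the call path does not give up the thread-safety requirement listed under Pitfalls.

```python
import contextlib
import threading
import timeit


class ToyPool:
    """Hypothetical lock-protected pool; not PyMongo's connection pool."""

    def __init__(self, size=4):
        self._lock = threading.Lock()
        self._idle = list(range(size))  # integers stand in for sockets

    def checkout(self):
        with self._lock:
            return self._idle.pop()

    def checkin(self, conn):
        with self._lock:
            self._idle.append(conn)

    @contextlib.contextmanager
    def connection(self):
        # Layered style: a generator-based context manager per operation.
        conn = self.checkout()
        try:
            yield conn
        finally:
            self.checkin(conn)


def op_layered(pool):
    with pool.connection() as conn:
        return conn + 1


def op_flat(pool):
    # Collapsed style: the same release guarantee with an inline try/finally.
    conn = pool.checkout()
    try:
        return conn + 1
    finally:
        pool.checkin(conn)


pool = ToyPool()
for op in (op_layered, op_flat):
    seconds = timeit.timeit(lambda: op(pool), number=100_000)
    print(f"{op.__name__}: {seconds:.3f}s")
```

The pool here never blocks or grows, so it only works when no more threads than `size` hold a connection at once; a real pool would also need waiting and spec-defined limits.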

      Pitfalls

      We want to ensure that the driver remains spec-compliant and thread-safe.

        1. ping_test_v2.py
          1 kB
        2. profile.svg
          41 kB
        3. Screenshot 2025-06-09 at 10.43.06 AM.png
          8 kB
        4. Screenshot 2025-06-09 at 10.43.37 AM.png
          9 kB
        5. Screenshot 2025-06-09 at 3.41.12 PM.png
          98 kB
        6. Screenshot 2025-06-09 at 3.41.45 PM.png
          123 kB
        7. single-profiled.svg
          168 kB
        8. threaded-profiled.svg
          88 kB

              Assignee:
              Noah Stapp
              Reporter:
              Steve Silvester
              Votes:
              0
              Watchers:
              7
