Uploaded image for project: 'PHP Legacy Driver'
  1. PHP Legacy Driver
  2. PHP-1468

Segfault on node with high network latency

    XMLWordPrintable

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: 1.6.10
    • Fix Version/s: 1.6.14
    • Component/s: pecl-mongo
    • Labels:
    • Environment:
      OS Ubuntu 14.04.2
      mongodb 3.0.5 deployed as sharded cluster
    • # Replies:
      5
    • Last comment by Customer:
      true
    • Sprint:
      PHP Sprint 23, PHP Sprint 24, PHP Sprint 25

      Description

      We have 3 nodes in sharded cluster one of which got high network latency about 5 - 200 ms, other two less then 1ms. Code used socket timeout = 20 ms which couse constant node blacklisting and connection recreation. Sometimes this leads to:

      #0  __GI_____strtol_l_internal (nptr=0x1 <error: Cannot access memory at address 0x1>, endptr=endptr@entry=0x0, base=base@entry=10, group=group@entry=0, loc=0x7f9392c0a060 <_nl_global_locale>)
          at ../stdlib/strtol_l.c:298
      #1  0x00007f93928884a2 in __GI_strtol (nptr=<optimized out>, endptr=endptr@entry=0x0, base=base@entry=10) at ../stdlib/strtol.c:108
      #2  0x00007f938cd3d49e in atoi (__nptr=<optimized out>) at /usr/include/stdlib.h:280
      #3  mongo_server_hash_to_pid (hash=<optimized out>) at /tmp/pear/temp/mongo/mcon/utils.c:174
      #4  0x00007f938cd3bd63 in filter_connections (manager=manager@entry=0x1a44e50, types=types@entry=31, rp=0x7fff4b4c3c80) at /tmp/pear/temp/mongo/mcon/read_preference.c:94
      #5  0x00007f938cd3c804 in mongo_rp_collect_any (rp=0x7fff4b4c3c80, manager=0x1a44e50) at /tmp/pear/temp/mongo/mcon/read_preference.c:141
      #6  mongo_find_all_candidate_servers (rp=0x7fff4b4c3c80, manager=0x1a44e50) at /tmp/pear/temp/mongo/mcon/read_preference.c:163
      #7  mongo_find_candidate_servers (manager=manager@entry=0x1a44e50, rp=rp@entry=0x7fff4b4c3c80, servers=servers@entry=0x1b9fff0) at /tmp/pear/temp/mongo/mcon/read_preference.c:337
      #8  0x00007f938cd37ef1 in mongo_get_connection_multiple (manager=manager@entry=0x1a44e50, servers=servers@entry=0x1b9fff0, connection_flags=connection_flags@entry=9,
          error_message=error_message@entry=0x7fff4b4c3df8) at /tmp/pear/temp/mongo/mcon/manager.c:478
      #9  0x00007f938cd38bde in mongo_get_read_write_connection (manager=0x1a44e50, servers=0x1b9fff0, connection_flags=9, error_message=error_message@entry=0x7fff4b4c3df8)
          at /tmp/pear/temp/mongo/mcon/manager.c:553
      #10 0x00007f938cd0f8d2 in php_mongo_connect (link=<optimized out>, flags=<optimized out>) at /tmp/pear/temp/mongo/mongoclient.c:357
      #11 0x00007f938cd100ae in php_mongo_ctor (ht=<optimized out>, return_value=<optimized out>, return_value_ptr=<optimized out>, this_ptr=<optimized out>, return_value_used=<optimized out>,
          bc=<optimized out>) at /tmp/pear/temp/mongo/mongoclient.c:571
      

      I trying to debug and think this caused by some race condition in mongo_connection recycling.

      In filter_connections i added mongo_manager_log before mongo_server_hash_to_pid and found that on every segfault we got "broken" mongo_connection memory for our node with high network latency:

      PHP Notice:  REPLSET WARN: filter_connections: con:ping_ms 117835526, con:connection_type 117835526, con:connected 117835526, con::hash (null) ; current pid (29226)
       
      PHP Notice:  REPLSET WARN: filter_connections: con:ping_ms 4, con:connection_type 16, con:connected 1, con::hash (null) ; current pid (22912) 
       
      PHP Notice:  REPLSET WARN: filter_connections: con:ping_ms 12, con:connection_type 30379568, con:connected 0, con::hash (null) ; current pid (23825)
      

      Disabling this node, fixed this issue.

        Attachments

          Activity

            People

            • Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:
                Days since reply:
                3 years, 50 weeks, 1 day ago
                Date of 1st Reply: