Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-10631

MongoDB periodically consumes excessive system cpu and refuses new connections

    • Type: Icon: Bug Bug
    • Resolution: Done
    • Priority: Icon: Major - P3 Major - P3
    • None
    • Affects Version/s: 2.2.6
    • Component/s: None
    • Labels:
      None
    • Environment:
      Distributor ID: Ubuntu
      Description: Ubuntu 12.04.2 LTS
      Release: 12.04
      Codename: precise

      5 member replica set (1 member is an arbiter) running on m1.large instances in AWS
    • ALL

      During the course of the day one of our replica-sets will appear to "lock" up on us. These locks occur every few minutes during our higher load times, but not necessarily at any regular interval. From what I've been able to determine they don't appear to be correlated with bursts in traffic or background syncs. Each of these events occur every few minutes and last anywhere between 2 and 5 seconds. During this time we see spikes in system cpu usage and the server does not accept any additional connections.

      I've provided some statistics from one of the most recent events below. We've run tcpdump and determined that while our clients are attempting to connect the server does not accept the connection until after the event has passed. This causes a build up in connection attempts over the course of the event many of which will be closed again upon the event's termination. We have another replica-set that is also running 2.2.6, but doesn't show the same issue.

      mpstat:
      Linux 3.2.0-38-virtual (db-event4.songza.com) 08/27/2013 x86_64 (2 CPU)

      02:52:19 PM CPU %usr %nice %sys %iowait %irq %soft %steal %guest %idle

      03:00:16 PM all 22.45 0.00 7.65 1.53 0.00 3.57 0.51 0.00 64.29
      03:00:17 PM all 21.72 0.00 10.10 1.52 0.00 2.53 0.51 0.00 63.64
      03:00:18 PM all 18.56 0.00 56.19 0.00 0.00 2.06 0.00 0.00 23.20
      03:00:19 PM all 15.58 0.00 83.92 0.00 0.00 0.50 0.00 0.00 0.00
      03:00:20 PM all 11.50 0.00 87.50 0.00 0.00 1.00 0.00 0.00 0.00
      03:00:21 PM all 9.95 0.00 89.05 0.00 0.00 1.00 0.00 0.00 0.00
      03:00:22 PM all 6.53 0.00 92.96 0.00 0.00 0.50 0.00 0.00 0.00
      03:00:23 PM all 8.00 0.00 92.00 0.00 0.00 0.00 0.00 0.00 0.00
      03:00:24 PM all 28.06 0.00 42.35 13.27 0.00 3.06 0.51 0.00 12.76
      03:00:25 PM all 53.47 0.00 27.72 0.50 0.00 5.94 0.50 0.00 11.88
      03:00:26 PM all 49.48 0.00 23.20 2.06 0.00 6.70 0.00 0.00 18.56
      03:00:27 PM all 47.55 0.00 25.00 1.96 0.00 2.45 0.49 0.00 22.55

      mongostat:
      269 60 724 0 1091 776 0 882g 1766g 5.86g 0 play_history:29.7% 0 0|0 1|1 299k 361k 2401 event PRI 15:00:16
      279 44 797 0 1186 800 0 882g 1766g 5.88g 0 play_history:28.9% 0 0|0 0|1 320k 366k 2401 event PRI 15:00:17
      — discontinuity —
      1132 205 2261 0 894 2183 0 882g 1766g 5.86g 0 play_history:32.4% 0 1|0 0|1 732k 507k 2402 event PRI 15:00:24
      372 111 1935 0 766 1474 0 882g 1766g 5.98g 0 local:10.2% 0 1498|0 0|2 554k 1m 2429 event PRI 15:00:25
      791 47 1425 0 953 2026 0 882g 1766g 6.02g 0 play_history:81.5% 0 846|0 0|2 646k 790k 2410 event PRI 15:00:26
      503 46 1478 0 1075 1997 0 882g 1766g 5.94g 0 play_history:86.3% 0 262|0 0|2 660k 716k 2376 event PRI 15:00:27

      sample connection attempt:
      15:00:21.011703 IP ip-10-38-246-165.ec2.internal.50594 > ip-10-139-28-208.ec2.internal.27017: Flags [S], seq 648754701, win 14600, options [mss 1460,sackOK,TS val 648330510 ecr 0,nop,wscale 7], length 0
      15:00:22.009600 IP ip-10-38-246-165.ec2.internal.50594 > ip-10-139-28-208.ec2.internal.27017: Flags [S], seq 648754701, win 14600, options [mss 1460,sackOK,TS val 648330760 ecr 0,nop,wscale 7], length 0
      15:00:24.013566 IP ip-10-38-246-165.ec2.internal.50594 > ip-10-139-28-208.ec2.internal.27017: Flags [S], seq 648754701, win 14600, options [mss 1460,sackOK,TS val 648331261 ecr 0,nop,wscale 7], length 0
      15:00:24.013581 IP ip-10-139-28-208.ec2.internal.27017 > ip-10-38-246-165.ec2.internal.50594: Flags [S.], seq 597737666, ack 648754702, win 14480, options [mss 1460,sackOK,TS val 953554509 ecr 648331261,nop,wscale 7], length 0
      15:00:24.024828 IP ip-10-38-246-165.ec2.internal.50594 > ip-10-139-28-208.ec2.internal.27017: Flags [F.], seq 1, ack 1, win 115, options [nop,nop,TS val 648331263 ecr 953554509], length 0
      15:00:24.024996 IP ip-10-139-28-208.ec2.internal.27017 > ip-10-38-246-165.ec2.internal.50594: Flags [F.], seq 1, ack 2, win 114, options [nop,nop,TS val 953554512 ecr 648331263], length 0
      15:00:25.561343 IP ip-10-195-130-15.ec2.internal.50594 > ip-10-139-28-208.ec2.internal.27017: Flags [S], seq 1016537720, win 14600, options [mss 1460,sackOK,TS val 649068642 ecr 0,nop,wscale 7], length 0
      15:00:25.561361 IP ip-10-139-28-208.ec2.internal.27017 > ip-10-195-130-15.ec2.internal.50594: Flags [S.], seq 2809357988, ack 1016537721, win 14480, options [mss 1460,sackOK,TS val 953554896 ecr 649068642,nop,wscale 7], length 0

            Assignee:
            Unassigned Unassigned
            Reporter:
            michael@songza.com Michael Henson
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

              Created:
              Updated:
              Resolved: