Core Server / SERVER-2383

Mongod crashes when killing its PID on kernel 2.6.32-5-xen-amd64

    • Type: Bug
    • Resolution: Cannot Reproduce
    • Priority: Critical - P2
    • None
    • Affects Version/s: 1.6.5
    • Component/s: Replication
    • Labels:
      None
    • Environment:
      Xen DomU Guest running Debian Squeeze (kernel 2.6.32-5-xen-amd64)
      MongoDB 1.6.5 downloaded from the website as binary
      Configured mongo with replSet
    • Linux

      Mongo runs fine until I do
      kill $(cat /mongo/db/mongod.lock)

      About 1 time in 4, this seems to cause a kernel panic.
      In my testing it only seems to occur when the mongod has been added to a replica set with the replSet option.
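
      The kill step above could be wrapped in a small shell helper, sketched here for repeated test runs. The function names read_pid and stop_by_lockfile are hypothetical, and the sketch assumes, as in the report, that the first line of mongod.lock holds the mongod PID. It sends the same default SIGTERM as the plain kill and does nothing to avoid the panic.

```shell
#!/bin/sh
# Hypothetical helpers (not part of MongoDB): stop mongod through the PID
# stored in its lock file, as the report does with kill $(cat ...).

read_pid() {
    # Print the first line of the lock file with all whitespace removed.
    head -n 1 "$1" | tr -d '[:space:]'
}

stop_by_lockfile() {
    lockfile=$1
    # Refuse to continue if the lock file is missing or unreadable.
    [ -r "$lockfile" ] || { echo "cannot read $lockfile" >&2; return 1; }
    pid=$(read_pid "$lockfile")
    # Accept only a non-empty, all-digit PID so kill never gets junk.
    case $pid in
        ''|*[!0-9]*) echo "no valid PID in $lockfile" >&2; return 1 ;;
    esac
    # SIGTERM is what a plain kill sends, so this matches the report.
    kill -s TERM "$pid"
}
```

      With the path from the report, the call would be: stop_by_lockfile /mongo/db/mongod.lock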

      This is the stack trace:

      [58625.873310] alignment check: 0000 1 SMP
      [58625.873317] last sysfs file: /sys/devices/virtual/net/lo/operstate
      [58625.873320] CPU 0
      [58625.873323] Modules linked in: snd_pcm snd_timer snd soundcore snd_page_alloc pcspkr evdev xfs exportfs xen_netfront xen_blkfront
      [58625.873336] Pid: 8539, comm: mongod Not tainted 2.6.32-5-xen-amd64 #1
      [58625.873339] RIP: e030:[<ffffffff81270c0b>] [<ffffffff81270c0b>] eth_type_trans+0x3d/0xae
      [58625.873347] RSP: e02b:ffff880001c93988 EFLAGS: 00050246
      [58625.873350] RAX: ffff88002efd20fc RBX: ffff88002e3b12e8 RCX: ffff88002efd20ee
      [58625.873354] RDX: 0000000000000042 RSI: 000000000000000e RDI: ffff88002e3b12e8
      [58625.873357] RBP: ffff88002fc3e800 R08: 0000000000000000 R09: 0000000000000000
      [58625.873361] R10: 000000000000000e R11: ffffffff8125fbaf R12: ffff88002e3a2080
      [58625.873364] R13: ffff88002fc3e800 R14: ffff88002fdea980 R15: ffffffff81350270
      [58625.873371] FS: 00007ff239953710(0000) GS:ffff8800031ac000(0000) knlGS:0000000000000000
      [58625.873375] CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
      [58625.873378] CR2: 000000000080a45c CR3: 0000000001001000 CR4: 0000000000002660
      [58625.873382] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [58625.873385] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      [58625.873389] Process mongod (pid: 8539, threadinfo ffff880001c92000, task ffff88002eab2350)
      [58625.873392] Stack:
      [58625.873394] 0000000000000000 ffff88002fc3e800 ffff88002e3b12e8 ffffffff812398d0
      [58625.873399] <0> 0000000000000000 ffff88002e3b12e8 ffff88002e3a2080 ffffffff8125f9e4
      [58625.873407] <0> ffffffff8100ecdf 0000000000000000 ffff88002fdea980 ffff88002e3a2080
      [58625.873414] Call Trace:
      [58625.873418] [<ffffffff812398d0>] ? loopback_xmit+0x36/0x7a
      [58625.873422] [<ffffffff8125f9e4>] ? dev_hard_start_xmit+0x211/0x2db
      [58625.873428] [<ffffffff8100ecdf>] ? xen_restore_fl_direct_end+0x0/0x1
      [58625.873432] [<ffffffff8125fe8c>] ? dev_queue_xmit+0x2dd/0x38d
      [58625.873437] [<ffffffff81287483>] ? ip_queue_xmit+0x311/0x386
      [58625.873487] [<ffffffffa004744d>] ? xfs_log_release_iclog+0x10/0x38 [xfs]
      [58625.873498] [<ffffffffa00515f5>] ? _xfs_trans_commit+0x25f/0x2d1 [xfs]
      [58625.873502] [<ffffffff8100e63d>] ? xen_force_evtchn_callback+0x9/0xa
      [58625.873507] [<ffffffff81297e33>] ? tcp_transmit_skb+0x648/0x687
      [58625.873511] [<ffffffff8100ecf2>] ? check_events+0x12/0x20
      [58625.873515] [<ffffffff8129a2b5>] ? tcp_write_xmit+0x874/0x96c
      [58625.873518] [<ffffffff8129a3fa>] ? __tcp_push_pending_frames+0x22/0x53
      [58625.873523] [<ffffffff8128d7fd>] ? tcp_close+0x176/0x3d0
      [58625.873528] [<ffffffff812aa2f8>] ? inet_release+0x4e/0x54
      [58625.873533] [<ffffffff81251121>] ? sock_release+0x19/0x66
      [58625.873536] [<ffffffff81251190>] ? sock_close+0x22/0x26
      [58625.873541] [<ffffffff810f09c9>] ? __fput+0x100/0x1af
      [58625.873545] [<ffffffff810ede06>] ? filp_close+0x5b/0x62
      [58625.873549] [<ffffffff810508a0>] ? put_files_struct+0x64/0xc1
      [58625.873553] [<ffffffff8105215d>] ? do_exit+0x22e/0x6c6
      [58625.873557] [<ffffffff81052165>] ? do_exit+0x236/0x6c6
      [58625.873560] [<ffffffff8105266b>] ? do_group_exit+0x76/0x9d
      [58625.873565] [<ffffffff8105eef7>] ? get_signal_to_deliver+0x310/0x339
      [58625.873570] [<ffffffff8101104f>] ? do_notify_resume+0x87/0x73f
      [58625.873573] [<ffffffff8100b444>] ? xen_write_msr_safe+0x76/0xb1
      [58625.873577] [<ffffffff810106c4>] ? __switch_to+0x1ad/0x297
      [58625.873582] [<ffffffff81049045>] ? finish_task_switch+0x44/0xaf
      [58625.873586] [<ffffffff81011e0e>] ? int_signal+0x12/0x17
      [58625.873588] Code: 87 d8 00 00 00 2b 87 d0 00 00 00 be 0e 00 00 00 89 87 c4 00 00 00 e8 68 48 fe ff 8b 8b c4 00 00 00 48 03 8b d0 00 00 00 f6 01 01 <48> 8b 11 74 20 48 33 95 40 02 00 00 8a 43 7d 48 c1 e2 10 75 08
      [58625.873630] RIP [<ffffffff81270c0b>] eth_type_trans+0x3d/0xae
      [58625.873634] RSP <ffff880001c93988>
      [58625.873639] --[ end trace f73fe61a27c51fab ]--
      [58625.873641] Kernel panic - not syncing: Fatal exception in interrupt
      [58625.873645] Pid: 8539, comm: mongod Tainted: G D 2.6.32-5-xen-amd64 #1
      [58625.873648] Call Trace:
      [58625.873652] [<ffffffff8130ac81>] ? panic+0x86/0x143
      [58625.873657] [<ffffffff8130cb3a>] ? _spin_unlock_irqrestore+0xd/0xe
      [58625.873661] [<ffffffff8100ecdf>] ? xen_restore_fl_direct_end+0x0/0x1
      [58625.873664] [<ffffffff8130cb3a>] ? _spin_unlock_irqrestore+0xd/0xe
      [58625.873668] [<ffffffff8104f3af>] ? release_console_sem+0x17e/0x1af
      [58625.873672] [<ffffffff8130d9d5>] ? oops_end+0xa7/0xb4
      [58625.873676] [<ffffffff81013416>] ? do_alignment_check+0x88/0x92
      [58625.873680] [<ffffffff8125fbaf>] ? dev_queue_xmit+0x0/0x38d
      [58625.873685] [<ffffffff811f1976>] ? HYPERVISOR_event_channel_op+0x11/0x50
      [58625.873695] [<ffffffffa004d6f9>] ? xfs_icsb_modify_counters+0x7b/0x1a0 [xfs]
      [58625.873699] [<ffffffff81012a75>] ? alignment_check+0x25/0x30
      [58625.873703] [<ffffffff8125fbaf>] ? dev_queue_xmit+0x0/0x38d
      [58625.873706] [<ffffffff81270c0b>] ? eth_type_trans+0x3d/0xae
      [58625.873710] [<ffffffff81270bfb>] ? eth_type_trans+0x2d/0xae
      [58625.873713] [<ffffffff812398d0>] ? loopback_xmit+0x36/0x7a
      [58625.873717] [<ffffffff8125f9e4>] ? dev_hard_start_xmit+0x211/0x2db
      [58625.873721] [<ffffffff8100ecdf>] ? xen_restore_fl_direct_end+0x0/0x1
      [58625.873724] [<ffffffff8125fe8c>] ? dev_queue_xmit+0x2dd/0x38d
      [58625.873728] [<ffffffff81287483>] ? ip_queue_xmit+0x311/0x386
      [58625.873738] [<ffffffffa004744d>] ? xfs_log_release_iclog+0x10/0x38 [xfs]
      [58625.873747] [<ffffffffa00515f5>] ? _xfs_trans_commit+0x25f/0x2d1 [xfs]
      [58625.873752] [<ffffffff8100e63d>] ? xen_force_evtchn_callback+0x9/0xa
      [58625.873755] [<ffffffff81297e33>] ? tcp_transmit_skb+0x648/0x687
      [58625.873759] [<ffffffff8100ecf2>] ? check_events+0x12/0x20
      [58625.873762] [<ffffffff8129a2b5>] ? tcp_write_xmit+0x874/0x96c
      [58625.873766] [<ffffffff8129a3fa>] ? __tcp_push_pending_frames+0x22/0x53
      [58625.873770] [<ffffffff8128d7fd>] ? tcp_close+0x176/0x3d0
      [58625.873773] [<ffffffff812aa2f8>] ? inet_release+0x4e/0x54
      [58625.873777] [<ffffffff81251121>] ? sock_release+0x19/0x66
      [58625.873780] [<ffffffff81251190>] ? sock_close+0x22/0x26
      [58625.873784] [<ffffffff810f09c9>] ? __fput+0x100/0x1af
      [58625.873787] [<ffffffff810ede06>] ? filp_close+0x5b/0x62
      [58625.873791] [<ffffffff810508a0>] ? put_files_struct+0x64/0xc1
      [58625.873794] [<ffffffff8105215d>] ? do_exit+0x22e/0x6c6
      [58625.873797] [<ffffffff81052165>] ? do_exit+0x236/0x6c6
      [58625.873801] [<ffffffff8105266b>] ? do_group_exit+0x76/0x9d
      [58625.873804] [<ffffffff8105eef7>] ? get_signal_to_deliver+0x310/0x339
      [58625.873808] [<ffffffff8101104f>] ? do_notify_resume+0x87/0x73f
      [58625.873812] [<ffffffff8100b444>] ? xen_write_msr_safe+0x76/0xb1
      [58625.873815] [<ffffffff810106c4>] ? __switch_to+0x1ad/0x297
      [58625.873819] [<ffffffff81049045>] ? finish_task_switch+0x44/0xaf
      [58625.873822] [<ffffffff81011e0e>] ? int_signal+0x12/0x17

            Assignee:
            eliot Eliot Horowitz (Inactive)
            Reporter:
            netdata Wouter D'Haeseleer
            Votes:
            1
            Watchers:
            4