Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-11647

segfault during rs.stepdown() while background index build in progress on 2.5.3, in a mixed version cluster with 2.4

    XMLWordPrintableJSON

Details

    • ALL

    Description

      i'm not sure exactly how this happened, as it only surfaced after a while of aggressively playing with replset stepdown and background index builds with different index types (hashed, text, etc)
      in this case i tried triggering multiple hashed index builds at once, in the background.
      The 2.5.3 node is the one that crashed, the other nodes were all running 2.4.4.

      log output below. sorry for the dodgy formatting, i had to grab it quickly from my tmux session before it went off the scrollback buffer.

      m31000| Fri Nov  8 17:10:20.343 [conn597] end connection 127.0.0.1:52894 (5 connections now open)
      m31000| Fri Nov  8 17:10:20.343 [initandlisten] connection accepted from 127.0.0.1:52946 #600 (6 connections now open)
      rsrs1.getPrimary().getDB("poo").test.dropIndexes()
      m31001| Fri Nov  8 17:10:20.466 [conn1] CMD: dropIndexes poo.test
      {
      	    "nIndexesWas": 1,
      		  "msg": "non-_id indexes dropped for collection",
      		    "ok": 1
      }
      >  m31002| Fri Nov  8 17:10:20.472 [repl writer worker 1] CMD: dropIndexes poo.test
      m31000| Fri Nov  8 17:10:20.473 [repl writer worker 1] CMD: dropIndexes poo.test
      m27000| 2013-11-08T17:10:20.476-0500 [repl writer worker 1] CMD: dropIndexes poo.test
      kk m31000| Fri Nov  8 17:10:22.419 [conn598] end connection 127.0.0.1:52895 (5 connections now open)
      m31000| Fri Nov  8 17:10:22.419 [initandlisten] connection accepted from 127.0.0.1:52947 #601 (6 connections now open)
      m31002| Fri Nov  8 17:10:25.110 [conn553] end connection 127.0.0.1:52897 (3 connections now open)
      m31002| Fri Nov  8 17:10:25.110 [initandlisten] connection accepted from 127.0.0.1:52949 #556 (4 connections now open)
      m31002| Fri Nov  8 17:10:25.499 [conn554] end connection 127.0.0.1:52900 (3 connections now open)
      m31002| Fri Nov  8 17:10:25.500 [initandlisten] connection accepted from 127.0.0.1:52950 #557 (4 connections now open)
      rsrs1.getPrimary().getDB("poo").test.ensureIndex({x:"text"},{background:true})
      m31001| Fri Nov  8 17:10:26.787 [conn1] build index poo.test { _fts: "text", _ftsx: 1 } background
      m31001| Fri Nov  8 17:10:29.000 [conn1]                Background Index Build Progress: 751300/1010000 74%
      m31001| Fri Nov  8 17:10:29.773 [conn1] build index done.  scanned 1010000 total records. 2.986 secs
      m31001| Fri Nov  8 17:10:29.773 [conn1] insert poo.system.indexes ninserted:1 keyUpdates:0 numYields: 29 locks(micros) w:5928168 2986ms
      m31000| Fri Nov  8 17:10:29.774 [repl writer worker 1] info: indexing in foreground on this replica; was a background index build on the primary
      m31000| Fri Nov  8 17:10:29.774 [repl writer worker 1] build index poo.test { _fts: "text", _ftsx: 1 }
      >  m31002| Fri Nov  8 17:10:29.774 [repl writer worker 1] info: indexing in foreground on this replica; was a background index build on the primary
      m31002| Fri Nov  8 17:10:29.774 [repl writer worker 1] build index poo.test { _fts: "text", _ftsx: 1 }
      m31002| Fri Nov  8 17:10:31.249 [repl writer worker 1] build index done.  scanned 1010000 total records. 1.475 secs
      m31000| Fri Nov  8 17:10:31.253 [repl writer worker 1] build index done.  scanned 1010000 total records. 1.479 secs
      m31000| Fri Nov  8 17:10:31.254 [conn580] getmore local.oplog.rs query: { ts: { $gte: Timestamp 1383948437000|1 } } cursorid:22090358664672348 ntoreturn:0 keyUpdates:0 numYields: 1 locks(micros) r:141 nreturned:1 reslen:186 5759ms
      m27000| 2013-11-08T17:10:31.254-0500 [repl index builder 20] build index on: poo.test properties: { "v" : 1, "key" : { "_fts" : "text", "_ftsx" : 1 }, "ns" : "poo.test", "name" : "x_text", "background" : true, "weights" : { "x" : 1 }, "default_language" : "english", "language_override" : "language", "textIndexVersion" : 2 }
      m31001| Fri Nov  8 17:10:33.688 [conn553] end connection 127.0.0.1:52922 (8 connections now open)
      m31001| Fri Nov  8 17:10:33.689 [initandlisten] connection accepted from 127.0.0.1:52952 #556 (9 connections now open)
      m27000| 2013-11-08T17:10:34.000-0500 [repl index builder 20]           Background Index Build Progress: 407300/101000040%
      m31001| Fri Nov  8 17:10:35.859 [conn554] end connection 127.0.0.1:52927 (8 connections now open)
      m31001| Fri Nov  8 17:10:35.859 [initandlisten] connection accepted from 127.0.0.1:52953 #557 (9 connections now open)
      m27000| 2013-11-08T17:10:37.000-0500 [repl index builder 20]           Background Index Build Progress: 852500/101000084%
      rsrs1.getSecondaries().forEach(function(x){printjson(x.getDB("poo").test.getIndexes())})
      [
      {
      "v": 1,
      "key": {
      "_id": 1
      },
      "ns": "poo.test",
      "name": "_id_"
      },
      {
      "v": 1,
      "key": {
      "_fts": "text",
      "_ftsx": 1
      },
      "ns": "poo.test",
      "name": "x_text",
      "background": true,
      "weights": {
      "x": 1
      },
      "default_language": "english",
      "language_override": "language",
      "textIndexVersion": 1
      }
      ]
      [
      {
      "v": 1,
      "key": {
      "_id": 1
      },
      "ns": "poo.test",
      "name": "_id_"
      },
      {
      "v": 1,
      "key": {
      "_fts": "text",
      "_ftsx": 1
      },
      "ns": "poo.test",
      "name": "x_text",
      "background": true,
      "weights": {
      "x": 1
      },
      "default_language": "english",
      "language_override": "language",
      "textIndexVersion": 1
      }
      ]
      [
      {
      "v": 1,
      "key": {
      "_id": 1
      },
      "ns": "poo.test",
      "name": "_id_"
      },
      {
      "v": 1,
      "key": {
      "_fts": "text",
      "_ftsx": 1
      },
      "ns": "poo.test",
      "name": "x_text",
      "background": true,
      "weights": {
      "x": 1
      },
      "default_language": "english",
      "language_override": "language",
      "textIndexVersion": 2
      }
      ]
      >  m27000| 2013-11-08T17:10:38.061-0500 [repl index builder 20] build index done.  scanned 1010000 total records. 6.806 secs
      m31001| Fri Nov  8 17:10:39.360 [conn555] end connection 127.0.0.1:52933 (8 connections now open)
      m27000| 2013-11-08T17:10:39.360-0500 [conn234] end connection 127.0.0.1:52932 (3 connections now open)
      m27000| 2013-11-08T17:10:39.360-0500 [conn233] end connection 127.0.0.1:52934 (3 connections now open)
      m27000| 2013-11-08T17:10:39.360-0500 [conn235] end connection 127.0.0.1:52935 (3 connections now open)
      m31001| Fri Nov  8 17:10:39.361 [initandlisten] connection accepted from 127.0.0.1:52957 #558 (9 connections now open)
      m27000| 2013-11-08T17:10:39.361-0500 [initandlisten] connection accepted from 127.0.0.1:52955 #236 (2 connections now open)
      m27000| 2013-11-08T17:10:39.361-0500 [initandlisten] connection accepted from 127.0.0.1:52954 #237 (3 connections now open)
      m27000| 2013-11-08T17:10:39.362-0500 [initandlisten] connection accepted from 127.0.0.1:52956 #238 (4 connections now open)
      m31000| Fri Nov  8 17:10:41.506 [conn599] end connection 127.0.0.1:52943 (5 connections now open)
      m31000| Fri Nov  8 17:10:41.506 [initandlisten] connection accepted from 127.0.0.1:52958 #602 (6 connections now open)
      >  m31002| Fri Nov  8 17:10:47.691 [conn555] end connection 127.0.0.1:52945 (3 connections now open)
      m31002| Fri Nov  8 17:10:47.692 [initandlisten] connection accepted from 127.0.0.1:52959 #558 (4 connections now open)
      m31000| Fri Nov  8 17:10:50.359 [conn600] end connection 127.0.0.1:52946 (5 connections now open)
      m31000| Fri Nov  8 17:10:50.360 [initandlisten] connection accepted from 127.0.0.1:52960 #603 (6 connections now open)
      m31000| Fri Nov  8 17:10:52.432 [conn601] end connection 127.0.0.1:52947 (5 connections now open)
      m31000| Fri Nov  8 17:10:52.432 [initandlisten] connection accepted from 127.0.0.1:52961 #604 (6 connections now open)
      m31002| Fri Nov  8 17:10:55.125 [conn556] end connection 127.0.0.1:52949 (3 connections now open)
      m31002| Fri Nov  8 17:10:55.126 [initandlisten] connection accepted from 127.0.0.1:52964 #559 (4 connections now open)
      m31002| Fri Nov  8 17:10:55.511 [conn557] end connection 127.0.0.1:52950 (3 connections now open)
      m31002| Fri Nov  8 17:10:55.511 [initandlisten] connection accepted from 127.0.0.1:52965 #560 (4 connections now open)
      m31001| Fri Nov  8 17:11:03.700 [conn556] end connection 127.0.0.1:52952 (8 connections now open)
      m31001| Fri Nov  8 17:11:03.700 [initandlisten] connection accepted from 127.0.0.1:53022 #559 (9 connections now open)
      m31001| Fri Nov  8 17:11:05.875 [conn557] end connection 127.0.0.1:52953 (8 connections now open)
      m31001| Fri Nov  8 17:11:05.876 [initandlisten] connection accepted from 127.0.0.1:53023 #560 (9 connections now open)
      m31001| Fri Nov  8 17:11:09.376 [conn558] end connection 127.0.0.1:52957 (8 connections now open)
      m27000| 2013-11-08T17:11:09.376-0500 [conn238] end connection 127.0.0.1:52956 (3 connections now open)
      m27000| 2013-11-08T17:11:09.376-0500 [conn236] end connection 127.0.0.1:52955 (3 connections now open)
      m27000| 2013-11-08T17:11:09.376-0500 [conn237] end connection 127.0.0.1:52954 (3 connections now open)
      m31001| Fri Nov  8 17:11:09.376 [initandlisten] connection accepted from 127.0.0.1:53027 #561 (9 connections now open)
      m27000| 2013-11-08T17:11:09.376-0500 [initandlisten] connection accepted from 127.0.0.1:53024 #239 (2 connections now open)
      m27000| 2013-11-08T17:11:09.377-0500 [initandlisten] connection accepted from 127.0.0.1:53025 #240 (3 connections now open)
      m27000| 2013-11-08T17:11:09.377-0500 [initandlisten] connection accepted from 127.0.0.1:53026 #241 (4 connections now open)
      m31000| Fri Nov  8 17:11:11.517 [conn602] end connection 127.0.0.1:52958 (5 connections now open)
      m31000| Fri Nov  8 17:11:11.517 [initandlisten] connection accepted from 127.0.0.1:53033 #605 (6 connections now open)
      m31002| Fri Nov  8 17:11:17.706 [conn558] end connection 127.0.0.1:52959 (3 connections now open)
      m31002| Fri Nov  8 17:11:17.706 [initandlisten] connection accepted from 127.0.0.1:53035 #561 (4 connections now open)
      m31000| Fri Nov  8 17:11:20.374 [conn603] end connection 127.0.0.1:52960 (5 connections now open)
      m31000| Fri Nov  8 17:11:20.374 [initandlisten] connection accepted from 127.0.0.1:53036 #606 (6 connections now open)
      m31000| Fri Nov  8 17:11:22.447 [conn604] end connection 127.0.0.1:52961 (5 connections now open)
      m31000| Fri Nov  8 17:11:22.447 [initandlisten] connection accepted from 127.0.0.1:53037 #607 (6 connections now open)
      m31002| Fri Nov  8 17:11:25.140 [conn559] end connection 127.0.0.1:52964 (3 connections now open)
      m31002| Fri Nov  8 17:11:25.140 [initandlisten] connection accepted from 127.0.0.1:53039 #562 (4 connections now open)
      rsrs1.getPrimary().getDB("poo").test.createIndex({x:"hashed"},{background:true})
      m31001| Fri Nov  8 17:11:25.352 [conn1] build index poo.test { x: "hashed" } background
      >  m31002| Fri Nov  8 17:11:25.525 [conn560] end connection 127.0.0.1:52965 (3 connections now open)
      m31002| Fri Nov  8 17:11:25.526 [initandlisten] connection accepted from 127.0.0.1:53041 #563 (4 connections now open)
      m31001| Fri Nov  8 17:11:26.554 [journal] old journal file will be removed: /data/db/test-1/journal/j._2
      m31001| Fri Nov  8 17:11:26.555 [journal] old journal file will be removed: /data/db/test-1/journal/j._3
      m31001| Fri Nov  8 17:11:28.000 [conn1]                Background Index Build Progress: 283500/1010000 28%
      >  m31001| Fri Nov  8 17:11:31.000 [conn1]              Background Index Build Progress: 500000/1010000 49%
      m31001| Fri Nov  8 17:11:33.713 [conn559] end connection 127.0.0.1:53022 (8 connections now open)
      m31001| Fri Nov  8 17:11:33.714 [initandlisten] connection accepted from 127.0.0.1:53044 #562 (9 connections now open)
      m31001| Fri Nov  8 17:11:34.001 [conn1]                Background Index Build Progress: 681500/1010000 67%
      m31001| Fri Nov  8 17:11:35.889 [conn560] end connection 127.0.0.1:53023 (8 connections now open)
      m31001| Fri Nov  8 17:11:35.889 [initandlisten] connection accepted from 127.0.0.1:53046 #563 (9 connections now open)
      m31001| Fri Nov  8 17:11:37.000 [conn1]                Background Index Build Progress: 847000/1010000 83%
      rsrs1.getPrimary().getDB("admin").runCommand({replSetStepDown:null})
      m31001| Fri Nov  8 17:11:39.390 [conn561] end connection 127.0.0.1:53027 (8 connections now open)
      m27000| 2013-11-08T17:11:39.390-0500 [conn239] end connection 127.0.0.1:53024 (3 connections now open)
      m27000| 2013-11-08T17:11:39.390-0500 [conn241] end connection 127.0.0.1:53026 (3 connections now open)
      m27000| 2013-11-08T17:11:39.390-0500 [conn240] end connection 127.0.0.1:53025 (3 connections now open)
      m27000| 2013-11-08T17:11:39.390-0500 [initandlisten] connection accepted from 127.0.0.1:53048 #242 (2 connections now open)
      m31001| Fri Nov  8 17:11:39.390 [initandlisten] connection accepted from 127.0.0.1:53050 #564 (9 connections now open)
      m27000| 2013-11-08T17:11:39.390-0500 [initandlisten] connection accepted from 127.0.0.1:53047 #243 (3 connections now open)
      m27000| 2013-11-08T17:11:39.390-0500 [initandlisten] connection accepted from 127.0.0.1:53049 #244 (4 connections now open)
      m31001| Fri Nov  8 17:11:40.042 [conn1]                Background Index Build Progress: 990500/1010000 98%
      m31001| Fri Nov  8 17:11:40.425 [conn1] build index done.  scanned 1010000 total records. 15.072 secs
      m31001| Fri Nov  8 17:11:40.426 [conn1] insert poo.system.indexes ninserted:1 keyUpdates:0 numYields: 111 locks(micros) w:24721819 15074ms
      m31000| Fri Nov  8 17:11:40.428 [repl writer worker 1] info: indexing in foreground on this replica; was a background index build on the primary
      m31002| Fri Nov  8 17:11:40.428 [repl writer worker 1] info: indexing in foreground on this replica; was a background index build on the primary
      m31000| Fri Nov  8 17:11:40.431 [repl writer worker 1] build index poo.test { x: "hashed" }
      m31002| Fri Nov  8 17:11:40.431 [repl writer worker 1] build index poo.test { x: "hashed" }
      {
      "closest": NumberLong("1383948629"),
      "difference": NumberLong("71"),
      "ok": 0,
      "errmsg": "no secondaries within 10 seconds of my optime"
      }
      >  m31000| Fri Nov  8 17:11:41.533 [conn605] end connection 127.0.0.1:53033 (5 connections now open)
      m31000| Fri Nov  8 17:11:41.533 [initandlisten] connection accepted from 127.0.0.1:53051 #608 (6 connections now open)
      m31000| Fri Nov  8 17:11:46.905 [repl writer worker 1] old journal file will be removed: /data/db/test-0/journal/j._7
      m31000| Fri Nov  8 17:11:46.910 [repl writer worker 1] old journal file will be removed: /data/db/test-0/journal/j._8
      m31002| Fri Nov  8 17:11:46.921 [repl writer worker 1] build index done.  scanned 1010000 total records. 6.49 secs
      m31000| Fri Nov  8 17:11:46.965 [repl writer worker 1] build index done.  scanned 1010000 total records. 6.534 secs
      m27000| 2013-11-08T17:11:47.024-0500 [repl index builder 21] build index on: poo.test properties: { "v" : 1, "key" : { "x" : "hashed" }, "ns" : "poo.test", "name" : "x_hashed", "background" : true }
      m27000| 2013-11-08T17:11:47.026-0500 [rsBackgroundSync] replSet syncing to: mikes-MacBook-Pro.local:31001
      m31000| Fri Nov  8 17:11:47.026 [conn580] end connection 127.0.0.1:52813 (5 connections now open)
      m31001| Fri Nov  8 17:11:47.026 [initandlisten] connection accepted from 127.0.0.1:53052 #565 (10 connections now open)
      m31001| Fri Nov  8 17:11:47.027 [initandlisten] connection accepted from 127.0.0.1:53053 #566 (11 connections now open)
      m27000| Invalid access at address: 0x50 from thread: rsSyncNotifier
      m27000| Got signal: 11 (Segmentation fault: 11).
       

      Attachments

        Activity

          People

            matt.dannenberg Matt Dannenberg
            mikeo@mongodb.com Michael O'Brien
            Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: