Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-30939

Multiple index builds on same collection can livelock on WriteConflictExceptions

    • Type: Icon: Bug Bug
    • Resolution: Duplicate
    • Priority: Icon: Major - P3 Major - P3
    • None
    • Affects Version/s: 3.4.7, 3.5.10
    • Component/s: Index Maintenance
    • Labels:
      None
    • ALL
    • v4.0, v3.6
    • Hide

      Setup:

      mlaunch init --replicaset --nodes 2 --arbiter --port 12345 --binarypath /sw/mongodb-unstable/3.5.10/bin --bind_ip 0.0.0.0
      

      Test (intermittent, may require several/many attempts):

      mongo --port 12345 --eval 'db.test.drop(); db.test.insert({})' ; for i in {1..63}; do mongo --port 12345 --eval "db.test.ensureIndex({a$i:1},{background:true})" & done
      
      Show
      Setup: mlaunch init --replicaset --nodes 2 --arbiter --port 12345 --binarypath /sw/mongodb-unstable/3.5.10/bin --bind_ip 0.0.0.0 Test (intermittent, may require several/many attempts): mongo --port 12345 --eval 'db.test.drop(); db.test.insert({})' ; for i in {1..63}; do mongo --port 12345 --eval "db.test.ensureIndex({a$i:1},{background: true })" & done
    • Storage NYC 2019-04-22, Storage NYC 2019-05-06, Storage NYC 2019-05-20
    • 15

      When building indexes in the background, the IndexBuilder will restart the entire index build process on catching a WriteConflictException. Depending on where the WriteConflictException is thrown from (MultiIndexBlock::insertAllDocuments(), MultiIndexBlock::commit(), we could restart the part of the index build process that failed.

      ======================= OLD DESCRIPTION BELOW ===================

      Sometimes, when building numerous background indexes (in the same collection) on a secondary, the mongod gets into a state where 1 or more of the "repl index builder" threads loop indefinitely, rebuilding their bg index over and over again.

      This does not happen every time, sometimes I need to run the test repeatedly (up to a dozen times) to trigger the problem.

      2017-09-04T07:09:04.815+0000 I COMMAND  [repl writer worker 1] CMD: drop test.test
      2017-09-04T07:09:04.815+0000 I STORAGE  [repl writer worker 1] dropCollection: test.test - renaming to drop-pending collection: test.system.drop.1504508944i1t1.test with drop optime { ts: Timestamp 1504508944000|1, t: 1 }
      2017-09-04T07:09:04.950+0000 I REPL     [replication-1] Completing collection drop for test.system.drop.1504508944i1t1.test with drop optime { ts: Timestamp 1504508944000|1, t: 1 } (notification optime: { ts: Timestamp 1504508944000|3, t: 1 })
      2017-09-04T07:09:05.151+0000 I INDEX    [repl index builder 1] build index on: test.test properties: { v: 2, key: { a1: 1.0 }, name: "a1_1", ns: "test.test", background: true }
      2017-09-04T07:09:05.152+0000 I INDEX    [repl index builder 1] build index done.  scanned 1 total records. 0 secs
      2017-09-04T07:09:05.160+0000 I INDEX    [repl index builder 2] build index on: test.test properties: { v: 2, key: { a2: 1.0 }, name: "a2_1", ns: "test.test", background: true }
      2017-09-04T07:09:05.160+0000 I INDEX    [repl index builder 2] build index done.  scanned 1 total records. 0 secs
      2017-09-04T07:09:05.178+0000 I INDEX    [repl index builder 3] build index on: test.test properties: { v: 2, key: { a3: 1.0 }, name: "a3_1", ns: "test.test", background: true }
      2017-09-04T07:09:05.179+0000 I INDEX    [repl index builder 3] build index done.  scanned 1 total records. 0 secs
      2017-09-04T07:09:05.187+0000 I INDEX    [repl index builder 4] build index on: test.test properties: { v: 2, key: { a4: 1.0 }, name: "a4_1", ns: "test.test", background: true }
      2017-09-04T07:09:05.188+0000 I INDEX    [repl index builder 4] build index done.  scanned 1 total records. 0 secs
      2017-09-04T07:09:05.196+0000 I INDEX    [repl index builder 3] build index on: test.test properties: { v: 2, key: { a3: 1.0 }, name: "a3_1", ns: "test.test", background: true }
      2017-09-04T07:09:05.196+0000 I INDEX    [repl index builder 3] build index done.  scanned 1 total records. 0 secs
      2017-09-04T07:09:05.204+0000 I INDEX    [repl index builder 4] build index on: test.test properties: { v: 2, key: { a4: 1.0 }, name: "a4_1", ns: "test.test", background: true }
      2017-09-04T07:09:05.204+0000 I INDEX    [repl index builder 4] build index done.  scanned 1 total records. 0 secs
      2017-09-04T07:09:05.217+0000 I INDEX    [repl index builder 5] build index on: test.test properties: { v: 2, key: { a5: 1.0 }, name: "a5_1", ns: "test.test", background: true }
      2017-09-04T07:09:05.227+0000 I INDEX    [repl index builder 3] build index on: test.test properties: { v: 2, key: { a3: 1.0 }, name: "a3_1", ns: "test.test", background: true }
      2017-09-04T07:09:05.228+0000 I INDEX    [repl index builder 5] build index done.  scanned 1 total records. 0 secs
      2017-09-04T07:09:05.228+0000 I INDEX    [repl index builder 3] build index done.  scanned 1 total records. 0 secs
      2017-09-04T07:09:05.242+0000 I INDEX    [repl index builder 4] build index on: test.test properties: { v: 2, key: { a4: 1.0 }, name: "a4_1", ns: "test.test", background: true }
      2017-09-04T07:09:05.250+0000 I INDEX    [repl index builder 5] build index on: test.test properties: { v: 2, key: { a5: 1.0 }, name: "a5_1", ns: "test.test", background: true }
      2017-09-04T07:09:05.257+0000 I INDEX    [repl index builder 3] build index on: test.test properties: { v: 2, key: { a3: 1.0 }, name: "a3_1", ns: "test.test", background: true }
      2017-09-04T07:09:05.257+0000 I INDEX    [repl index builder 4] build index done.  scanned 1 total records. 0 secs
      2017-09-04T07:09:05.265+0000 I INDEX    [repl index builder 6] build index on: test.test properties: { v: 2, key: { a6: 1.0 }, name: "a6_1", ns: "test.test", background: true }
      2017-09-04T07:09:05.265+0000 I INDEX    [repl index builder 3] build index done.  scanned 1 total records. 0 secs
      2017-09-04T07:09:05.265+0000 I INDEX    [repl index builder 5] build index done.  scanned 1 total records. 0 secs
      2017-09-04T07:09:05.266+0000 I INDEX    [repl index builder 6] build index done.  scanned 1 total records. 0 secs
      2017-09-04T07:09:05.283+0000 I INDEX    [repl index builder 4] build index on: test.test properties: { v: 2, key: { a4: 1.0 }, name: "a4_1", ns: "test.test", background: true }
      2017-09-04T07:09:05.293+0000 I INDEX    [repl index builder 5] build index on: test.test properties: { v: 2, key: { a5: 1.0 }, name: "a5_1", ns: "test.test", background: true }
      2017-09-04T07:09:05.293+0000 I INDEX    [repl index builder 4] build index done.  scanned 1 total records. 0 secs
      2017-09-04T07:09:05.300+0000 I INDEX    [repl index builder 6] build index on: test.test properties: { v: 2, key: { a6: 1.0 }, name: "a6_1", ns: "test.test", background: true }
      2017-09-04T07:09:05.300+0000 I INDEX    [repl index builder 5] build index done.  scanned 1 total records. 0 secs
      2017-09-04T07:09:05.308+0000 I INDEX    [repl index builder 7] build index on: test.test properties: { v: 2, key: { a7: 1.0 }, name: "a7_1", ns: "test.test", background: true }
      2017-09-04T07:09:05.308+0000 I INDEX    [repl index builder 6] build index done.  scanned 1 total records. 0 secs
      2017-09-04T07:09:05.316+0000 I INDEX    [repl index builder 4] build index on: test.test properties: { v: 2, key: { a4: 1.0 }, name: "a4_1", ns: "test.test", background: true }
      2017-09-04T07:09:05.316+0000 I INDEX    [repl index builder 7] build index done.  scanned 1 total records. 0 secs
      2017-09-04T07:09:05.329+0000 I INDEX    [repl index builder 5] build index on: test.test properties: { v: 2, key: { a5: 1.0 }, name: "a5_1", ns: "test.test", background: true }
      2017-09-04T07:09:05.329+0000 I INDEX    [repl index builder 4] build index done.  scanned 1 total records. 0 secs
      2017-09-04T07:09:05.345+0000 I INDEX    [repl index builder 6] build index on: test.test properties: { v: 2, key: { a6: 1.0 }, name: "a6_1", ns: "test.test", background: true }
      2017-09-04T07:09:05.345+0000 I INDEX    [repl index builder 5] build index done.  scanned 1 total records. 0 secs
      2017-09-04T07:09:05.352+0000 I INDEX    [repl index builder 7] build index on: test.test properties: { v: 2, key: { a7: 1.0 }, name: "a7_1", ns: "test.test", background: true }
      2017-09-04T07:09:05.352+0000 I INDEX    [repl index builder 6] build index done.  scanned 1 total records. 0 secs
      2017-09-04T07:09:05.361+0000 I INDEX    [repl index builder 4] build index on: test.test properties: { v: 2, key: { a4: 1.0 }, name: "a4_1", ns: "test.test", background: true }
      2017-09-04T07:09:05.361+0000 I INDEX    [repl index builder 7] build index done.  scanned 1 total records. 0 secs
      2017-09-04T07:09:05.368+0000 I INDEX    [repl index builder 8] build index on: test.test properties: { v: 2, key: { a8: 1.0 }, name: "a8_1", ns: "test.test", background: true }
      2017-09-04T07:09:05.376+0000 I INDEX    [repl index builder 5] build index on: test.test properties: { v: 2, key: { a5: 1.0 }, name: "a5_1", ns: "test.test", background: true }
      2017-09-04T07:09:05.376+0000 I INDEX    [repl index builder 4] build index done.  scanned 1 total records. 0 secs
      2017-09-04T07:09:05.377+0000 I INDEX    [repl index builder 8] build index done.  scanned 1 total records. 0 secs
      2017-09-04T07:09:05.392+0000 I INDEX    [repl index builder 6] build index on: test.test properties: { v: 2, key: { a6: 1.0 }, name: "a6_1", ns: "test.test", background: true }
      2017-09-04T07:09:05.392+0000 I INDEX    [repl index builder 5] build index done.  scanned 1 total records. 0 secs
      2017-09-04T07:09:05.402+0000 I INDEX    [repl index builder 7] build index on: test.test properties: { v: 2, key: { a7: 1.0 }, name: "a7_1", ns: "test.test", background: true }
      ...
      

      Full logs are attached, along with a backtrace from the secondary immediately before building the indexes, and while affected by this issue (for these logs). These are from the Ubuntu 14.04 SSL build (community) of 3.5.10. I've also seen it happen in the Ubuntu 16.04 SSL build of 3.4.7.

      Usually once this starts happening, it continues ad infinitum (I've waited up to ~15 mins). But I've also seen it spontaneously resolve itself, and the index builds stop (unfortunately I didn't keep the logs from one of these runs).

      Usually shutdown will interrupt these index builds:

      2017-09-04T07:09:15.662+0000 I CONTROL  [signalProcessingThread] got signal 15 (Terminated), will terminate after current cmd ends
      2017-09-04T07:09:15.663+0000 I NETWORK  [signalProcessingThread] shutdown: going to close listening sockets...
      2017-09-04T07:09:15.663+0000 I NETWORK  [signalProcessingThread] shutdown: going to flush diaglog...
      2017-09-04T07:09:15.663+0000 I REPL     [signalProcessingThread] shutting down replication subsystems
      2017-09-04T07:09:15.663+0000 I REPL     [signalProcessingThread] Stopping replication reporter thread
      2017-09-04T07:09:15.663+0000 I INDEX    [repl index builder 8] build index done.  scanned 1 total records. 0 secs
      2017-09-04T07:09:15.663+0000 I INDEX    [repl index builder 12] build index done.  scanned 1 total records. 0 secs
      2017-09-04T07:09:15.663+0000 I REPL     [SyncSourceFeedback] SyncSourceFeedback error sending update to devkev-1:12345: CallbackCanceled: Reporter no longer valid
      2017-09-04T07:09:15.663+0000 I REPL     [signalProcessingThread] Stopping replication fetcher thread
      2017-09-04T07:09:15.663+0000 I REPL     [signalProcessingThread] Stopping replication applier thread
      2017-09-04T07:09:15.663+0000 I ASIO     [NetworkInterfaceASIO-RS-0] Ending connection to host devkev-1:12345 due to bad connection status; 1 connections to that host remain open
      2017-09-04T07:09:15.663+0000 I REPL     [rsBackgroundSync] Replication producer stopped after oplog fetcher finished returning a batch from our sync source.  Abandoning this batch of oplog entries and re-evaluating our sync source.
      2017-09-04T07:09:15.663+0000 I INDEX    [repl index builder 24] build index done.  scanned 1 total records. 0 secs
      2017-09-04T07:09:15.676+0000 I INDEX    [repl index builder 25] build index on: test.test properties: { v: 2, key: { a25: 1.0 }, name: "a25_1", ns: "test.test", background: true }
      2017-09-04T07:09:15.685+0000 I INDEX    [repl index builder 35] build index on: test.test properties: { v: 2, key: { a35: 1.0 }, name: "a35_1", ns: "test.test", background: true }
      2017-09-04T07:09:15.693+0000 I INDEX    [repl index builder 13] build index on: test.test properties: { v: 2, key: { a13: 1.0 }, name: "a13_1", ns: "test.test", background: true }
      ...
      2017-09-04T07:09:17.031+0000 I INDEX    [repl index builder 11] build index done.  scanned 1 total records. 0 secs
      2017-09-04T07:09:17.031+0000 I INDEX    [repl index builder 31] build index done.  scanned 1 total records. 0 secs
      2017-09-04T07:09:17.031+0000 I INDEX    [repl index builder 18] build index done.  scanned 1 total records. 0 secs
      2017-09-04T07:09:17.031+0000 I INDEX    [repl index builder 23] build index done.  scanned 1 total records. 0 secs
      2017-09-04T07:09:17.039+0000 I INDEX    [repl index builder 32] build index on: test.test properties: { v: 2, key: { a32: 1.0 }, name: "a32_1", ns: "test.test", background: true }
      2017-09-04T07:09:17.039+0000 I INDEX    [repl index builder 46] build index done.  scanned 1 total records. 0 secs
      2017-09-04T07:09:17.042+0000 I REPL     [signalProcessingThread] Stopping replication storage threads
      2017-09-04T07:09:17.043+0000 I FTDC     [signalProcessingThread] Shutting down full-time diagnostic data capture
      2017-09-04T07:09:17.048+0000 I INDEX    [repl index builder 19] build index on: test.test properties: { v: 2, key: { a19: 1.0 }, name: "a19_1", ns: "test.test", background: true }
      2017-09-04T07:09:17.056+0000 I INDEX    [repl index builder 34] build index on: test.test properties: { v: 2, key: { a33: 1.0 }, name: "a33_1", ns: "test.test", background: true }
      2017-09-04T07:09:17.063+0000 I INDEX    [repl index builder 8] build index on: test.test properties: { v: 2, key: { a8: 1.0 }, name: "a8_1", ns: "test.test", background: true }
      ...
      2017-09-04T07:09:17.171+0000 I INDEX    [repl index builder 10] build index on: test.test properties: { v: 2, key: { a10: 1.0 }, name: "a10_1", ns: "test.test", background: true }
      2017-09-04T07:09:17.180+0000 I INDEX    [repl index builder 25] build index on: test.test properties: { v: 2, key: { a25: 1.0 }, name: "a25_1", ns: "test.test", background: true }
      2017-09-04T07:09:17.187+0000 I INDEX    [repl index builder 35] build index on: test.test properties: { v: 2, key: { a35: 1.0 }, name: "a35_1", ns: "test.test", background: true }
      2017-09-04T07:09:17.187+0000 E INDEX    [repl index builder 14] IndexBuilder could not build index: InterruptedAtShutdown: interrupted at shutdown
      2017-09-04T07:09:17.187+0000 E INDEX    [repl index builder 7] IndexBuilder could not build index: InterruptedAtShutdown: interrupted at shutdown
      2017-09-04T07:09:17.187+0000 E INDEX    [repl index builder 36] IndexBuilder could not build index: InterruptedAtShutdown: interrupted at shutdown
      2017-09-04T07:09:17.196+0000 I INDEX    [repl index builder 13] build index on: test.test properties: { v: 2, key: { a13: 1.0 }, name: "a13_1", ns: "test.test", background: true }
      2017-09-04T07:09:17.196+0000 E INDEX    [repl index builder 26] IndexBuilder could not build index: InterruptedAtShutdown: interrupted at shutdown
      2017-09-04T07:09:17.196+0000 E INDEX    [repl index builder 21] IndexBuilder could not build index: InterruptedAtShutdown: interrupted at shutdown
      2017-09-04T07:09:17.205+0000 I INDEX    [repl index builder 20] build index on: test.test properties: { v: 2, key: { a20: 1.0 }, name: "a20_1", ns: "test.test", background: true }
      2017-09-04T07:09:17.212+0000 I INDEX    [repl index builder 38] build index on: test.test properties: { v: 2, key: { a38: 1.0 }, name: "a38_1", ns: "test.test", background: true }
      2017-09-04T07:09:17.212+0000 E INDEX    [repl index builder 5] IndexBuilder could not build index: InterruptedAtShutdown: interrupted at shutdown
      2017-09-04T07:09:17.212+0000 E INDEX    [repl index builder 39] IndexBuilder could not build index: InterruptedAtShutdown: interrupted at shutdown
      2017-09-04T07:09:17.220+0000 I INDEX    [repl index builder 28] build index on: test.test properties: { v: 2, key: { a28: 1.0 }, name: "a28_1", ns: "test.test", background: true }
      2017-09-04T07:09:17.220+0000 E INDEX    [repl index builder 9] IndexBuilder could not build index: InterruptedAtShutdown: interrupted at shutdown
      2017-09-04T07:09:17.234+0000 I INDEX    [repl index builder 6] build index on: test.test properties: { v: 2, key: { a6: 1.0 }, name: "a6_1", ns: "test.test", background: true }
      2017-09-04T07:09:17.234+0000 E INDEX    [repl index builder 4] IndexBuilder could not build index: InterruptedAtShutdown: interrupted at shutdown
      2017-09-04T07:09:17.242+0000 I INDEX    [repl index builder 41] build index on: test.test properties: { v: 2, key: { a41: 1.0 }, name: "a41_1", ns: "test.test", background: true }
      2017-09-04T07:09:17.242+0000 E INDEX    [repl index builder 29] IndexBuilder could not build index: InterruptedAtShutdown: interrupted at shutdown
      2017-09-04T07:09:17.242+0000 E INDEX    [repl index builder 42] IndexBuilder could not build index: InterruptedAtShutdown: interrupted at shutdown
      2017-09-04T07:09:17.242+0000 E INDEX    [repl index builder 16] IndexBuilder could not build index: InterruptedAtShutdown: interrupted at shutdown
      2017-09-04T07:09:17.243+0000 E INDEX    [repl index builder 44] IndexBuilder could not build index: InterruptedAtShutdown: interrupted at shutdown
      2017-09-04T07:09:17.251+0000 I INDEX    [repl index builder 11] build index on: test.test properties: { v: 2, key: { a11: 1.0 }, name: "a11_1", ns: "test.test", background: true }
      2017-09-04T07:09:17.259+0000 I INDEX    [repl index builder 31] build index on: test.test properties: { v: 2, key: { a31: 1.0 }, name: "a31_1", ns: "test.test", background: true }
      2017-09-04T07:09:17.268+0000 I INDEX    [repl index builder 18] build index on: test.test properties: { v: 2, key: { a18: 1.0 }, name: "a18_1", ns: "test.test", background: true }
      2017-09-04T07:09:17.276+0000 I INDEX    [repl index builder 23] build index on: test.test properties: { v: 2, key: { a22: 1.0 }, name: "a22_1", ns: "test.test", background: true }
      2017-09-04T07:09:17.276+0000 E INDEX    [repl index builder 32] IndexBuilder could not build index: InterruptedAtShutdown: interrupted at shutdown
      2017-09-04T07:09:17.285+0000 I INDEX    [repl index builder 46] build index on: test.test properties: { v: 2, key: { a46: 1.0 }, name: "a46_1", ns: "test.test", background: true }
      2017-09-04T07:09:17.285+0000 E INDEX    [repl index builder 34] IndexBuilder could not build index: InterruptedAtShutdown: interrupted at shutdown
      2017-09-04T07:09:17.285+0000 E INDEX    [repl index builder 19] IndexBuilder could not build index: InterruptedAtShutdown: interrupted at shutdown
      2017-09-04T07:09:17.285+0000 E INDEX    [repl index builder 8] IndexBuilder could not build index: InterruptedAtShutdown: interrupted at shutdown
      2017-09-04T07:09:17.285+0000 E INDEX    [repl index builder 12] IndexBuilder could not build index: InterruptedAtShutdown: interrupted at shutdown
      2017-09-04T07:09:17.285+0000 E INDEX    [repl index builder 24] IndexBuilder could not build index: InterruptedAtShutdown: interrupted at shutdown
      2017-09-04T07:09:17.285+0000 E INDEX    [repl index builder 37] IndexBuilder could not build index: InterruptedAtShutdown: interrupted at shutdown
      2017-09-04T07:09:17.286+0000 E INDEX    [repl index builder 27] IndexBuilder could not build index: InterruptedAtShutdown: interrupted at shutdown
      2017-09-04T07:09:17.286+0000 E INDEX    [repl index builder 40] IndexBuilder could not build index: InterruptedAtShutdown: interrupted at shutdown
      2017-09-04T07:09:17.286+0000 E INDEX    [repl index builder 17] IndexBuilder could not build index: InterruptedAtShutdown: interrupted at shutdown
      2017-09-04T07:09:17.286+0000 E INDEX    [repl index builder 15] IndexBuilder could not build index: InterruptedAtShutdown: interrupted at shutdown
      2017-09-04T07:09:17.287+0000 E INDEX    [repl index builder 30] IndexBuilder could not build index: InterruptedAtShutdown: interrupted at shutdown
      2017-09-04T07:09:17.287+0000 E INDEX    [repl index builder 43] IndexBuilder could not build index: InterruptedAtShutdown: interrupted at shutdown
      2017-09-04T07:09:17.287+0000 E INDEX    [repl index builder 22] IndexBuilder could not build index: InterruptedAtShutdown: interrupted at shutdown
      2017-09-04T07:09:17.287+0000 E INDEX    [repl index builder 45] IndexBuilder could not build index: InterruptedAtShutdown: interrupted at shutdown
      2017-09-04T07:09:17.288+0000 E INDEX    [repl index builder 33] IndexBuilder could not build index: InterruptedAtShutdown: interrupted at shutdown
      2017-09-04T07:09:17.288+0000 E INDEX    [repl index builder 10] IndexBuilder could not build index: InterruptedAtShutdown: interrupted at shutdown
      2017-09-04T07:09:17.288+0000 E INDEX    [repl index builder 25] IndexBuilder could not build index: InterruptedAtShutdown: interrupted at shutdown
      2017-09-04T07:09:17.288+0000 E INDEX    [repl index builder 35] IndexBuilder could not build index: InterruptedAtShutdown: interrupted at shutdown
      2017-09-04T07:09:17.288+0000 E INDEX    [repl index builder 13] IndexBuilder could not build index: InterruptedAtShutdown: interrupted at shutdown
      2017-09-04T07:09:17.288+0000 E INDEX    [repl index builder 38] IndexBuilder could not build index: InterruptedAtShutdown: interrupted at shutdown
      2017-09-04T07:09:17.288+0000 E INDEX    [repl index builder 20] IndexBuilder could not build index: InterruptedAtShutdown: interrupted at shutdown
      2017-09-04T07:09:17.288+0000 E INDEX    [repl index builder 28] IndexBuilder could not build index: InterruptedAtShutdown: interrupted at shutdown
      2017-09-04T07:09:17.288+0000 E INDEX    [repl index builder 41] IndexBuilder could not build index: InterruptedAtShutdown: interrupted at shutdown
      2017-09-04T07:09:17.288+0000 E INDEX    [repl index builder 6] IndexBuilder could not build index: InterruptedAtShutdown: interrupted at shutdown
      2017-09-04T07:09:17.288+0000 E INDEX    [repl index builder 11] IndexBuilder could not build index: InterruptedAtShutdown: interrupted at shutdown
      2017-09-04T07:09:17.288+0000 E INDEX    [repl index builder 23] IndexBuilder could not build index: InterruptedAtShutdown: interrupted at shutdown
      2017-09-04T07:09:17.288+0000 E INDEX    [repl index builder 31] IndexBuilder could not build index: InterruptedAtShutdown: interrupted at shutdown
      2017-09-04T07:09:17.288+0000 E INDEX    [repl index builder 46] IndexBuilder could not build index: InterruptedAtShutdown: interrupted at shutdown
      2017-09-04T07:09:17.289+0000 E INDEX    [repl index builder 18] IndexBuilder could not build index: InterruptedAtShutdown: interrupted at shutdown
      2017-09-04T07:09:17.289+0000 I STORAGE  [signalProcessingThread] WiredTigerKVEngine shutting down
      2017-09-04T07:09:21.298+0000 I STORAGE  [signalProcessingThread] shutdown: removing fs lock...
      2017-09-04T07:09:21.298+0000 I CONTROL  [signalProcessingThread] now exiting
      2017-09-04T07:09:21.298+0000 I CONTROL  [signalProcessingThread] shutting down with code:0
      

      But I have also seen it interfere with shutdown and prevent the repl writer threads from being stopped (the mongod had already stopped listening on its TCP port):

      2017-09-04T00:44:12.690+0000 I CONTROL  [signalProcessingThread] got signal 15 (Terminated), will terminate after current cmd ends
      2017-09-04T00:44:12.690+0000 I NETWORK  [signalProcessingThread] shutdown: going to close listening sockets...
      2017-09-04T00:44:12.690+0000 I NETWORK  [signalProcessingThread] closing listening socket: 7
      2017-09-04T00:44:12.692+0000 I INDEX    [repl index builder 57] build index on: test.test properties: { v: 2, key: { a57: 1.0 }, name: "a57_1", ns: "test.test", background: true }
      2017-09-04T00:44:12.692+0000 I NETWORK  [signalProcessingThread] closing listening socket: 8
      2017-09-04T00:44:12.692+0000 I NETWORK  [signalProcessingThread] removing socket file: /tmp/mongodb-12346.sock
      2017-09-04T00:44:12.692+0000 I NETWORK  [signalProcessingThread] shutdown: going to flush diaglog...
      2017-09-04T00:44:12.692+0000 I REPL     [signalProcessingThread] shutting down replication subsystems
      2017-09-04T00:44:12.692+0000 I REPL     [signalProcessingThread] Stopping replication reporter thread
      2017-09-04T00:44:12.692+0000 I REPL     [SyncSourceFeedback] SyncSourceFeedback error sending update to devkev-1:12345: CallbackCanceled: Reporter no longer valid
      2017-09-04T00:44:12.692+0000 I REPL     [signalProcessingThread] Stopping replication fetcher thread
      2017-09-04T00:44:12.692+0000 I REPL     [signalProcessingThread] Stopping replication applier thread
      2017-09-04T00:44:12.692+0000 I ASIO     [NetworkInterfaceASIO-RS-0] Ending connection to host devkev-1:12345 due to bad connection status; 1 connections to that host remain open
      2017-09-04T00:44:12.692+0000 I REPL     [rsBackgroundSync] Replication producer stopped after oplog fetcher finished returning a batch from our sync source.  Abandoning this batch of oplog entries and re-evaluating our sync source.
      2017-09-04T00:44:12.765+0000 I -        [conn12] end connection 172.31.0.248:34953 (2 connections now open)
      2017-09-04T00:44:12.846+0000 I INDEX    [repl index builder 48] build index on: test.test properties: { v: 2, key: { a48: 1.0 }, name: "a48_1", ns: "test.test", background: true }
      2017-09-04T00:44:13.177+0000 I INDEX    [repl index builder 27] build index on: test.test properties: { v: 2, key: { a27: 1.0 }, name: "a27_1", ns: "test.test", background: true }
      ...
      2017-09-04T01:00:05.331+0000 I INDEX    [repl index builder 57] build index on: test.test properties: { v: 2, key: { a57: 1.0 }, name: "a57_1", ns: "test.test", background: true }
      2017-09-04T01:00:05.369+0000 I INDEX    [repl index builder 40] build index on: test.test properties: { v: 2, key: { a40: 1.0 }, name: "a40_1", ns: "test.test", background: true }
      2017-09-04T01:00:05.407+0000 I INDEX    [repl index builder 27] build index on: test.test properties: { v: 2, key: { a27: 1.0 }, name: "a27_1", ns: "test.test", background: true }
      2017-09-04T01:00:05.407+0000 I INDEX    [repl index builder 42] build index done.  scanned 1 total records. 0 secs
      2017-09-04T01:00:05.443+0000 I INDEX    [repl index builder 39] build index on: test.test properties: { v: 2, key: { a39: 1.0 }, name: "a39_1", ns: "test.test", background: true }
      2017-09-04T01:00:05.443+0000 I INDEX    [repl index builder 36] build index done.  scanned 1 total records. 0 secs
      2017-09-04T01:00:05.443+0000 I INDEX    [repl index builder 31] build index done.  scanned 1 total records. 0 secs
      <kill -9 of mongod>
      

      In this case, I had also noticed that the secondary was stuck in the middle of applying the batch — replication was not making progress (repllag of the secondary would not increase if I did ops on the primary).

      I've attached logs and a few backtraces from the mongod while in this state (using the 3.4.7 Ubuntu 14.04 SSL build; I also have the whole core if necessary).

      Affected versions:

      3.2.12 unaffected
      3.2.16 unaffected
      3.4.0 AFFECTED
      3.4.7 AFFECTED
      3.4.8-rc1 AFFECTED
      3.5.1 AFFECTED
      3.5.6 AFFECTED
      3.5.9 AFFECTED
      3.5.10 AFFECTED
      3.5.11 unaffected
      3.5.12 unaffected

      Looking at the commits between 3.5.10 and 3.5.11, nothing stands out as being relevant. So this may have been fixed tangentially as part of some other work.

        1. backtrace-before.txt
          49 kB
          Kevin Pulo
        2. backtrace-cannot-shutdown-1.txt
          85 kB
          Kevin Pulo
        3. backtrace-cannot-shutdown-2.txt
          84 kB
          Kevin Pulo
        4. backtrace-cannot-shutdown-3.txt
          84 kB
          Kevin Pulo
        5. backtrace-cannot-shutdown-4.txt
          83 kB
          Kevin Pulo
        6. backtrace-during.txt
          87 kB
          Kevin Pulo
        7. mongod-cannot-shutdown-primary.log
          71 kB
          Kevin Pulo
        8. mongod-cannot-shutdown-secondary.log.xz
          428 kB
          Kevin Pulo
        9. mongod-primary.log
          214 kB
          Kevin Pulo
        10. mongod-secondary.log
          489 kB
          Kevin Pulo

            Assignee:
            benety.goh@mongodb.com Benety Goh
            Reporter:
            kevin.pulo@mongodb.com Kevin Pulo
            Votes:
            0 Vote for this issue
            Watchers:
            15 Start watching this issue

              Created:
              Updated:
              Resolved:

                Error rendering 'slack.nextup.jira:slack-integration-plus'. Please contact your Jira administrators.