Core Server / SERVER-20757

Replica set secondaries stop replicating during reindex concurrency workload

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Major - P3
    • None
    • Affects Version/s: None
    • Component/s: Replication
    • Labels: None
    • Fully Compatible
    • ALL

      I'm able to repro this bug with a DEBUG build of master on OS X (1fd64cee88562e77883db5b75ee666a55b15e748). I haven't tried a non-DEBUG build yet.


      Steps to repro:

      1 - Apply this patch to master:

      diff --git a/jstests/concurrency/fsm_all_replication.js b/jstests/concurrency/fsm_all_replication.js
      index 919d915..12f3c3a 100644
      --- a/jstests/concurrency/fsm_all_replication.js
      +++ b/jstests/concurrency/fsm_all_replication.js
      @@ -14,5 +14,5 @@ var blacklist = [
       ].map(function(file) { return dir + '/' + file; });
       
       runWorkloadsSerially(ls(dir).filter(function(file) {
      -    return !Array.contains(blacklist, file);
      +    return !Array.contains(blacklist, file) && file.indexOf('reindex.js') > -1;
       }), { replication: true });
      diff --git a/jstests/concurrency/fsm_workloads/reindex.js b/jstests/concurrency/fsm_workloads/reindex.js
      index 51aad94..070a117 100644
      --- a/jstests/concurrency/fsm_workloads/reindex.js
      +++ b/jstests/concurrency/fsm_workloads/reindex.js
      @@ -105,7 +105,7 @@ var $config = (function() {
       
           return {
               threadCount: 15,
      -        iterations: 10,
      +        iterations: 100,
               states: states,
               transitions: transitions,
               teardown: teardown,
      

      2 - Run `python buildscripts/resmoke.py --suites=concurrency_replication`


      Log excerpt:

      [js_test:fsm_all_replication] 2015-10-05T12:13:06.259-0400 d20012| 2015-10-05T12:13:06.259-0400 I INDEX    [repl writer worker 12] build index done.  scanned 1000 total records. 0 secs
      [js_test:fsm_all_replication] 2015-10-05T12:13:06.276-0400 d20011| 2015-10-05T12:13:06.275-0400 I INDEX    [repl writer worker 11] build index done.  scanned 1000 total records. 0 secs
      
      ^^^^ These are the last messages from the secondaries even though the primary is still performing work:
      
      ....
      [js_test:fsm_all_replication] 2015-10-05T12:21:33.569-0400 d20010| 2015-10-05T12:21:33.569-0400 I COMMAND  [conn20] CMD: reIndex db0.reindex_14
      [js_test:fsm_all_replication] 2015-10-05T12:21:33.619-0400 d20010| 2015-10-05T12:21:33.618-0400 I INDEX    [conn20] build index on: db0.reindex_14 properties: { v: 1, key: { _id: 1 }, name: "_id_", ns: "db0.reindex_14" }
      [js_test:fsm_all_replication] 2015-10-05T12:21:33.619-0400 d20010| 2015-10-05T12:21:33.618-0400 I INDEX    [conn20] 	 building index using bulk method
      [js_test:fsm_all_replication] 2015-10-05T12:21:33.629-0400 d20010| 2015-10-05T12:21:33.629-0400 I INDEX    [conn20] build index on: db0.reindex_14 properties: { v: 1, key: { _fts: "text", _ftsx: 1 }, name: "text_text", ns: "db0.reindex_14", weights: { text: 1 }, default_language: "english", language_override: "language", textIndexVersion: 3 }
      ....
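
One way to spot this symptom in a long resmoke log is to record the most recent timestamp logged by each node and flag nodes that went quiet well before the newest message. The following is a minimal sketch, assuming every line follows the `[test] <time> <node>| <time> ...` layout seen in the excerpt above. The names `last_seen` and `quiet_nodes` and the 60-second threshold are hypothetical choices for illustration, not part of resmoke.

```python
import re
from datetime import datetime, timedelta

# Matches lines shaped like the excerpt above (layout assumed from that excerpt):
# [js_test:...] <harness time> d20012| <node time> I INDEX ...
LOG_LINE = re.compile(
    r"^\s*\[[^\]]+\]\s+\S+\s+"
    r"(?P<node>[a-z]\d+)\|\s+"
    r"(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}[+-]\d{4})\s"
)


def last_seen(lines):
    """Return {node: datetime of the newest log line emitted by that node}."""
    latest = {}
    for line in lines:
        m = LOG_LINE.match(line)
        if m is None:
            continue  # skip "....", annotations, and anything else that does not match
        ts = datetime.strptime(m.group("ts"), "%Y-%m-%dT%H:%M:%S.%f%z")
        node = m.group("node")
        if node not in latest or ts > latest[node]:
            latest[node] = ts
    return latest


def quiet_nodes(latest, threshold=timedelta(seconds=60)):
    """Return nodes whose newest line is older than the overall newest line by more than threshold."""
    if not latest:
        return []
    newest = max(latest.values())
    return sorted(node for node, ts in latest.items() if newest - ts > threshold)
```

Run over the excerpt above, this would report d20011 and d20012, whose last lines are about eight and a half minutes older than the latest d20010 line. Silence in the log is only a hint: a node that is idle for legitimate reasons would be flagged too, so the result still needs to be checked against replication status.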
      

            Assignee:
            benety.goh@mongodb.com Benety Goh
            Reporter:
            kamran.khan Kamran K.
            Votes:
            0
            Watchers:
            3
