[SERVER-20757] Replica set secondaries stop replicating during reindex concurrency workload Created: 05/Oct/15  Updated: 14/Apr/16  Resolved: 05/Oct/15

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: None

Type: Bug
Priority: Major - P3
Reporter: Kamran K.
Assignee: Benety Goh
Resolution: Duplicate
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Text File secondary_stacks.txt    
Issue Links:
Duplicate
duplicates SERVER-3850 reIndex on primary does not propagate... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL

 Description   

I'm able to repro this bug with a DEBUG build of master on OS X (1fd64cee88562e77883db5b75ee666a55b15e748). I haven't tried a non-DEBUG build yet.


Steps to repro:

1 - Apply this patch to master:

diff --git a/jstests/concurrency/fsm_all_replication.js b/jstests/concurrency/fsm_all_replication.js
index 919d915..12f3c3a 100644
--- a/jstests/concurrency/fsm_all_replication.js
+++ b/jstests/concurrency/fsm_all_replication.js
@@ -14,5 +14,5 @@ var blacklist = [
 ].map(function(file) { return dir + '/' + file; });
 
 runWorkloadsSerially(ls(dir).filter(function(file) {
-    return !Array.contains(blacklist, file);
+    return !Array.contains(blacklist, file) && file.indexOf('reindex.js') > -1;
 }), { replication: true });
diff --git a/jstests/concurrency/fsm_workloads/reindex.js b/jstests/concurrency/fsm_workloads/reindex.js
index 51aad94..070a117 100644
--- a/jstests/concurrency/fsm_workloads/reindex.js
+++ b/jstests/concurrency/fsm_workloads/reindex.js
@@ -105,7 +105,7 @@ var $config = (function() {
 
     return {
         threadCount: 15,
-        iterations: 10,
+        iterations: 100,
         states: states,
         transitions: transitions,
         teardown: teardown,

2 - Run `python buildscripts/resmoke.py --suites=concurrency_replication`
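
The filter change in step 1 can be sketched in plain JavaScript as below. This is a minimal illustration, not the test harness itself: every file name except `reindex.js` is hypothetical, and `indexOf` stands in for the shell's `Array.contains` helper so the snippet runs outside the mongo shell.

```javascript
// Illustration only: file names other than 'reindex.js' are hypothetical.
var dir = 'jstests/concurrency/fsm_workloads';

// Workloads that the suite would skip regardless of the patch.
var blacklist = ['some_blacklisted_workload.js'].map(function(file) {
    return dir + '/' + file;
});

// Stand-in for ls(dir) in the mongo shell.
var files = [
    'some_blacklisted_workload.js',
    'create_index_background.js',
    'reindex_background.js',
    'reindex.js'
].map(function(file) {
    return dir + '/' + file;
});

// Same predicate as the patched filter: not blacklisted AND the path
// contains the substring 'reindex.js'.
var selected = files.filter(function(file) {
    return blacklist.indexOf(file) === -1 && file.indexOf('reindex.js') > -1;
});
```

Because the predicate is a substring match, any path containing `reindex.js` would be selected, not only a file with exactly that name.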


Log excerpt:

[js_test:fsm_all_replication] 2015-10-05T12:13:06.259-0400 d20012| 2015-10-05T12:13:06.259-0400 I INDEX    [repl writer worker 12] build index done.  scanned 1000 total records. 0 secs
[js_test:fsm_all_replication] 2015-10-05T12:13:06.276-0400 d20011| 2015-10-05T12:13:06.275-0400 I INDEX    [repl writer worker 11] build index done.  scanned 1000 total records. 0 secs
 
^^^^ These are the last messages from the secondaries even though the primary is still performing work:
 
....
[js_test:fsm_all_replication] 2015-10-05T12:21:33.569-0400 d20010| 2015-10-05T12:21:33.569-0400 I COMMAND  [conn20] CMD: reIndex db0.reindex_14
[js_test:fsm_all_replication] 2015-10-05T12:21:33.619-0400 d20010| 2015-10-05T12:21:33.618-0400 I INDEX    [conn20] build index on: db0.reindex_14 properties: { v: 1, key: { _id: 1 }, name: "_id_", ns: "db0.reindex_14" }
[js_test:fsm_all_replication] 2015-10-05T12:21:33.619-0400 d20010| 2015-10-05T12:21:33.618-0400 I INDEX    [conn20] 	 building index using bulk method
[js_test:fsm_all_replication] 2015-10-05T12:21:33.629-0400 d20010| 2015-10-05T12:21:33.629-0400 I INDEX    [conn20] build index on: db0.reindex_14 properties: { v: 1, key: { _fts: "text", _ftsx: 1 }, name: "text_text", ns: "db0.reindex_14", weights: { text: 1 }, default_language: "english", language_override: "language", textIndexVersion: 3 }
....



 Comments   
Comment by Kamran K. [ 05/Oct/15 ]

Actually, I think this is "works as designed" because reIndex commands are not propagated to secondaries (SERVER-3850). The secondaries are not producing any log output, but they may just be idle: the primary is performing queries and reIndex commands, neither of which gives the secondaries anything to apply.
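
One way to tell idle secondaries from stalled ones, rather than relying on log output, is to compare each member's last applied optime against the primary's. The sketch below assumes a document shaped like the output of `rs.status()` (members with `name`, `stateStr`, and `optimeDate`). The helper name `secondaryLagSeconds` and the sample values are hypothetical and not part of the server or the test suite.

```javascript
// Hypothetical helper: given a replSetGetStatus-style document, report how
// far each secondary's last applied operation trails the primary's.
function secondaryLagSeconds(status) {
    var primary = status.members.filter(function(m) {
        return m.stateStr === 'PRIMARY';
    })[0];
    if (!primary) {
        throw new Error('no primary found in status document');
    }
    return status.members.filter(function(m) {
        return m.stateStr === 'SECONDARY';
    }).map(function(m) {
        return {
            name: m.name,
            lagSeconds: (primary.optimeDate.getTime() - m.optimeDate.getTime()) / 1000
        };
    });
}

// Hand-written sample document; the hosts and timestamps are made up.
var sampleStatus = {
    members: [
        { name: 'localhost:20010', stateStr: 'PRIMARY',
          optimeDate: new Date('2015-10-05T16:13:06Z') },
        { name: 'localhost:20011', stateStr: 'SECONDARY',
          optimeDate: new Date('2015-10-05T16:13:06Z') },
        { name: 'localhost:20012', stateStr: 'SECONDARY',
          optimeDate: new Date('2015-10-05T16:13:01Z') }
    ]
};

var lagReport = secondaryLagSeconds(sampleStatus);
```

In the mongo shell the same function could be called as `secondaryLagSeconds(rs.status())`. A lag that stays near zero while the primary runs only queries and reIndex commands would be consistent with the secondaries simply having nothing to apply.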
