Core Server / SERVER-21858

A high throughput update workload in a replica set can cause starvation of secondary reads


    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Duplicate
    • Affects Version/s: 3.0.7, 3.2.0
    • Fix Version/s: None
    • Component/s: Replication, WiredTiger
    • Labels:
      None
    • Backwards Compatibility:
      Fully Compatible
    • Operating System:
      ALL
    • Steps To Reproduce:

      Launch two mongod processes:

      ./mongod --replSet foo --storageEngine wiredTiger --dbpath /mnt/work/data/rs0 --oplogSize 1000 --wiredTigerCacheSizeGB 4 --logpath /mnt/work/data/rs0/mongod.log --fork
      ./mongod --replSet foo --storageEngine wiredTiger --dbpath /mnt/work/data/rs1 --oplogSize 1000 --wiredTigerCacheSizeGB 4 --port 27018 --logpath /mnt/work/data/rs1/mongod.log --fork
      

      Set up the replica set:

      rs.initiate()
      rs.add("localhost:27018")
      

      Run the following JavaScript against the primary to seed the documents:

      var INT_MAX = 10000000;
      for (var x = 0; x < INT_MAX / 1000; x++) {
          var bulk = db.test.initializeUnorderedBulkOp();
          for (var i = 0; i < 1000; i++) {
              var num = (x * 1000) + i;
              bulk.insert({_id: num, x: num, y: num});
          }
          bulk.execute();
      }
      

      Execute the attached repro.rb

      Run the following JavaScript on the secondary to monitor performance. Query times should be stable at first, then blow out to seconds and even minutes:

      rs.slaveOk();
      var INT_MAX = 10000000;
      while (true) {
          var start = new Date();
          for (var i = 0; i < 10000; i++) {
              var docNum = Math.floor(Math.random() * INT_MAX);
              db.test.findOne({_id: docNum});
          }
          print("Took: " + (new Date() - start) + "ms");
          sleep(1000);
      }
      


      Description

      Under an update-only workload on the primary, it is possible to starve readers on a secondary and cause large replication delays.

      Workload:
      On Primary:

      1. 1000 small updates (integer sets, increments, and an unset)
      2. 10 massive (500kb) updates

      On Secondary:

      1. 10000 findOne's
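The attached repro.rb drives the primary-side workload; its contents are not reproduced here, but a minimal mongo-shell sketch of the same update mix might look like the following. The field names (x, y, z, big) and the random-_id targeting are assumptions; only the mix itself (1000 small updates plus 10 ~500 KB updates) comes from the ticket.

```javascript
// Hypothetical sketch of the repro.rb update workload (not the actual Ruby script).
// Field names and targeting strategy are assumptions.
var INT_MAX = 10000000;
var payload = new Array(500 * 1024 + 1).join("a"); // ~500 KB string

function runWorkloadOnce() {
    // 1000 small updates: an integer $set, an $inc, and an $unset.
    for (var i = 0; i < 1000; i++) {
        var id = Math.floor(Math.random() * INT_MAX);
        db.test.update({_id: id}, {$set: {x: i}, $inc: {y: 1}, $unset: {z: ""}});
    }
    // 10 massive (~500 KB) updates.
    for (var j = 0; j < 10; j++) {
        var bigId = Math.floor(Math.random() * INT_MAX);
        db.test.update({_id: bigId}, {$set: {big: payload}});
    }
}

// Loop forever when run inside the mongo shell (where `db` is defined).
if (typeof db !== "undefined") {
    while (true) { runWorkloadOnce(); }
}
```

Run this against the primary while the findOne monitoring loop runs on the secondary to observe the read starvation.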

        Attachments

        1. 307.html
          4.90 MB
        2. 320.html
          4.90 MB
        3. lag.png
          251 kB
        4. repro.rb
          0.9 kB
        5. ss-singlethread.html
          5.03 MB


              People

              • Votes: 3
              • Watchers: 21
