SERVER-21858

A high throughput update workload in a replica set can cause starvation of secondary reads

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Major - P3
    • None
    • Affects Version/s: 3.0.7, 3.2.0
    • Component/s: Replication, WiredTiger
    • None
    • Storage Execution
    • Fully Compatible
    • ALL

      Launch the two mongod instances:

      ./mongod --replSet foo --storageEngine wiredTiger --dbpath /mnt/work/data/rs0 --oplogSize 1000 --wiredTigerCacheSizeGB 4 --logpath /mnt/work/data/rs0/mongod.log --fork
      ./mongod --replSet foo --storageEngine wiredTiger --dbpath /mnt/work/data/rs1 --oplogSize 1000 --wiredTigerCacheSizeGB 4 --port 27018 --logpath /mnt/work/data/rs1/mongod.log --fork
      

      Set up the replica set:

      rs.initiate()
      rs.add("localhost:27018")
      

      Run the following JavaScript against the primary to seed documents:

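      // Seed 10,000,000 documents in unordered bulk batches of 1,000.
      var INT_MAX = 10000000;
      for (var x = 0; x < INT_MAX / 1000; x++) {
      	var bulk = db.test.initializeUnorderedBulkOp();
      	for (var i = 0; i < 1000; i++) {
      		var num = (x * 1000) + i;
      		bulk.insert({_id: num, x: num, y: num});
      	}
      	bulk.execute();
      }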
      

      Execute the attached repro.rb

      Run the following JavaScript on the secondary to monitor performance. Query times should be stable at first, then blow out to seconds and eventually minutes:

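      // Time batches of 10,000 random _id lookups on the secondary.
      var INT_MAX = 10000000;
      rs.slaveOk(); // allow reads on this secondary connection
      while (true) {
      	var start = new Date();
      	for (var i = 0; i < 10000; i++) {
      		var docNum = Math.floor(Math.random() * INT_MAX);
      		db.test.findOne({_id: docNum});
      	}
      	print("Took: " + (new Date() - start) + "ms");
      	sleep(1000);
      }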
      

      Under an update-only workload on the primary, it is possible to starve readers on a secondary and cause large replication lag.
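
One way to put a number on the replication delay while the repro runs is to compare optimes in the replica set status. The following is a minimal sketch, not part of the original report: the helper name `replicationLagSeconds` is hypothetical, and it assumes each member of the `rs.status()` output carries `name`, `stateStr`, and `optimeDate` (a Date) fields.

```javascript
// Sketch: per-secondary replication lag, in seconds, from an rs.status() document.
// Assumption: members expose `name`, `stateStr`, and `optimeDate` (a Date).
function replicationLagSeconds(status) {
  var primary = status.members.filter(function (m) {
    return m.stateStr === "PRIMARY";
  })[0];
  if (!primary) {
    return {}; // no primary visible, so lag cannot be computed
  }
  var lag = {};
  status.members.forEach(function (m) {
    if (m.stateStr === "SECONDARY") {
      // Date subtraction yields milliseconds.
      lag[m.name] = (primary.optimeDate - m.optimeDate) / 1000;
    }
  });
  return lag;
}

// In the mongo shell (not executed here):
//   printjson(replicationLagSeconds(rs.status()));
```

Calling this once per second alongside the read loop would show whether lag grows at the same time as the query times do.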

      Workload:
      On Primary:

      1. 1,000 small updates (integer sets, increments and an unset)
      2. 10 massive (500 KB) updates

      On Secondary:

      1. 10,000 findOne calls
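
The attached repro.rb is not reproduced here, so the following is only a rough sketch of what one round of the primary workload described above might look like, written in shell-style JavaScript. The field names `z` and `big`, the helper names, the exact payload size, and the choice to combine a set, an increment and an unset in each small update are all assumptions rather than details taken from the script.

```javascript
// Sketch of one round of the primary workload. NOT the attached repro.rb.
var INT_MAX = 10000000;             // same _id range as the seed script
var BIG_PAYLOAD_BYTES = 500 * 1024; // "500 KB"; the exact size is an assumption

function randomId() {
  return Math.floor(Math.random() * INT_MAX);
}

// Builds 1,000 small update specs followed by 10 large ones.
function buildBatch() {
  var ops = [];
  var i;
  for (i = 0; i < 1000; i++) {
    ops.push({
      filter: { _id: randomId() },
      // Hypothetical small update: integer set, increment, and an unset.
      update: { $set: { x: i }, $inc: { y: 1 }, $unset: { z: "" } }
    });
  }
  // A string of BIG_PAYLOAD_BYTES ASCII characters.
  var payload = new Array(BIG_PAYLOAD_BYTES + 1).join("a");
  for (i = 0; i < 10; i++) {
    ops.push({
      filter: { _id: randomId() },
      update: { $set: { big: payload } } // hypothetical field name
    });
  }
  return ops;
}

// In the mongo shell, against the primary (not executed here):
//   buildBatch().forEach(function (op) { db.test.update(op.filter, op.update); });
```

Separating spec construction from execution keeps the shape of the workload easy to inspect without a running replica set.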

        1. 307.html
          4.90 MB
          David Hows
        2. 320.html
          4.90 MB
          David Hows
        3. lag.png
          251 kB
          Bruce Lucas
        4. repro.rb
          0.9 kB
          David Hows
        5. ss-singlethread.html
          5.03 MB
          David Hows

            Assignee:
            backlog-server-execution [DO NOT USE] Backlog - Storage Execution Team
            Reporter:
            david.hows David Hows
            Votes:
            3
            Watchers:
            21
