Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-18200

Long running queries on secondary causes replication to fall behind

    • Type: Icon: Bug Bug
    • Resolution: Duplicate
    • Priority: Icon: Major - P3 Major - P3
    • None
    • Affects Version/s: 3.0.2
    • Component/s: Concurrency, Replication
    • Labels:
      None
    • ALL
    • Hide
      • Create replicaset with a 2.6.9 PRIMARY and a 3.0.2 SECONDARY
      • Import attached collection which contains ~500k documents
        mongoimport  -d test -c test_collection --jsonArray large-test-collection.json --host $PRIMARY --port $PRIMARY_PORT
        
      • In a mongo shell on the PRIMARY run a loop to insert a bunch of test data
        for (var i = 1; i <= 1000000; i++) {    db.testData.insert( { x : i } ) }
        
      • At the same time run a mongostat on the SECONDARY and you should see inserts flowing in at around 2k inserts/sec replicating from the PRIMARY
        insert  query update delete getmore command flushes mapped  vsize    res faults  locked db idx miss %     qr|qw   ar|aw  netIn netOut  conn      set repl       time 
         *2142     *0     *0     *0       0     2|0       0   800m  4.03g   337m    112 oplog:0.0%          0       0|0     0|0   205b    10k     3 dynatest  SEC   23:11:24 
         *2154     *0     *0     *0       0     1|0       0   800m  4.03g   337m    115 oplog:0.0%          0       0|0     0|0    62b    10k     3 dynatest  SEC   23:11:25 
         *2218     *0     *0     *0       0     2|0       0   800m  4.03g   338m    116 oplog:0.0%          0       0|0     0|0   205b    10k     3 dynatest  SEC   23:11:26 
         *2166     *0     *0     *0       0     1|0       0   800m  4.03g   339m    116 oplog:0.0%          0       0|0     0|0    62b    10k     3 dynatest  SEC   23:11:27 
         *2217     *0     *0     *0       0     2|0       0   800m  4.03g   339m    117 oplog:0.0%          0       0|0     0|1   205b    10k     3 dynatest  SEC   23:11:28 
         *2172     *0     *0     *0       0     1|0       0   800m  4.03g   340m    113 oplog:0.0%          0       0|0     0|0    62b    10k     3 dynatest  SEC   23:11:29
        
      • Kick off a 4-6 second long running query on the SECONDARY
         db.test_collection.find({latitude:  { $gt: 0 }}).limit(200000).sort({_created_at: -1})
        
      • Notice that the mongostat shows replication has halted and the SECONDARY is no longer applying inserts from the PRIMARY until the long running query returns
        insert  query update delete getmore command flushes mapped  vsize    res faults  locked db idx miss %     qr|qw   ar|aw  netIn netOut  conn      set repl       time 
         *2196     *0     *0     *0       0     1|0       0   800m  4.13g   644m    115 oplog:0.0%          0       0|0     0|0    62b    10k     3 dynatest  SEC   23:14:49 
         *2124     *0     *0     *0       0     2|0       0   800m  4.13g   645m    113 oplog:0.0%          0       0|0     0|0   205b    10k     3 dynatest  SEC   23:14:50 
         *2180     *0     *0     *0       0     1|0       0   800m  4.13g   645m    114 oplog:0.0%          0       0|0     0|0    62b    10k     3 dynatest  SEC   23:14:51 
         *1191      1     *0     *0       0     2|0       0   800m  4.13g   648m     63 oplog:0.0%          0       0|0     1|0   205b    10k     3 dynatest  SEC   23:14:52 
            *0     *0     *0     *0       0     1|0       0   800m  4.15g   669m      0 oplog:0.0%          0       0|0     1|0    62b    10k     3 dynatest  SEC   23:14:53 
            *0     *0     *0     *0       0     2|0       0   800m  4.17g   683m      0 oplog:0.0%          0       0|0     1|0   205b    10k     3 dynatest  SEC   23:14:54 
            *0     *0     *0     *0       0     1|0       0   800m  4.17g   683m      0 oplog:0.0%          0       0|0     1|0    62b    10k     3 dynatest  SEC   23:14:55 
            *0     *0     *0     *0       0     2|0       0   800m  4.17g   684m      0 oplog:0.0%          0       0|0     1|0   205b    10k     3 dynatest  SEC   23:14:56 
            *0     *0     *0     *0       0     1|0       0   800m  4.17g   703m      0 oplog:0.0%          0       0|0     1|0    62b    10k     3 dynatest  SEC   23:14:57 
        *10006     *0     *0     *0       0     3|0       0   800m  4.18g   717m    528 oplog:0.0%          0       0|0     1|0   413b     4m     3 dynatest  SEC   23:14:58 
         *5278     *0     *0     *0       0     1|0       0   800m  4.18g   692m    279 oplog:0.0%          0       0|0     0|0    62b    10k     3 dynatest  SEC   23:14:59
        

      Now, you can perform the same setup with a 2.6.9 PRIMARY and 2.6.9 SECONDARY. When running the same long running query on the SECONDARY, mongostat shows inserts do slow down when the long running query is executed, but does not completely block all replication like it did running 3.0.2.

      insert  query update delete getmore command flushes mapped  vsize    res faults  locked db idx miss %     qr|qw   ar|aw  netIn netOut  conn      set repl       time 
       *1982     *0     *0     *0       0     2|0       0   800m  4.08g   223m      0  test:8.4%          0       0|0     0|0   205b     3k     3 dynatest  SEC   23:21:32 
       *1969      1     *0     *0       0     1|0       0   800m  4.08g   224m      0  test:6.5%          0       0|0     1|0    62b     3k     3 dynatest  SEC   23:21:33 
         *83     *0     *0     *0       0     2|0       0   800m  4.09g   244m      0  test:1.3%          0       0|0     1|0   205b     3k     3 dynatest  SEC   23:21:34 
        *640     *0     *0     *0       0     1|0       0   800m  4.11g   269m      0  test:1.6%          0       0|0     1|0    62b     3k     3 dynatest  SEC   23:21:35 
       *1525     *0     *0     *0       0     2|0       0   800m  4.11g   270m      0  test:3.7%          0       0|0     1|0   205b     3k     3 dynatest  SEC   23:21:36 
       *2237     *0     *0     *0       0     1|0       0   800m  4.12g   271m      0  test:5.2%          0       0|0     1|0    62b     3k     3 dynatest  SEC   23:21:37 
       *4406     *0     *0     *0       0     2|0       0   800m  4.12g   288m      0 test:10.3%          0       0|0     1|0   205b     3k     3 dynatest  SEC   23:21:38 
       *4010     *0     *0     *0       0     2|0       0   800m  4.12g   297m      0 test:14.1%          0       0|0     0|0   279b     4m     3 dynatest  SEC   23:21:39 
       *2181     *0     *0     *0       0     2|0       0   800m  4.12g   297m      0  test:8.7%          0       0|0     0|0   205b     3k     3 dynatest  SEC   23:21:40 
       *2146     *0     *0     *0       0     1|0       0   800m  4.12g   277m      0  test:7.9%          0       0|0     0|0    62b     3k     3 dynatest  SEC   23:21:41
      
      Show
      Create replicaset with a 2.6.9 PRIMARY and a 3.0.2 SECONDARY Import attached collection which contains ~500k documents mongoimport -d test -c test_collection --jsonArray large-test-collection.json --host $PRIMARY --port $PRIMARY_PORT In a mongo shell on the PRIMARY run a loop to insert a bunch of test data for ( var i = 1; i <= 1000000; i++) { db.testData.insert( { x : i } ) } At the same time run a mongostat on the SECONDARY and you should see inserts flowing in at around 2k inserts/sec replicating from the PRIMARY insert query update delete getmore command flushes mapped vsize res faults locked db idx miss % qr|qw ar|aw netIn netOut conn set repl time *2142 *0 *0 *0 0 2|0 0 800m 4.03g 337m 112 oplog:0.0% 0 0|0 0|0 205b 10k 3 dynatest SEC 23:11:24 *2154 *0 *0 *0 0 1|0 0 800m 4.03g 337m 115 oplog:0.0% 0 0|0 0|0 62b 10k 3 dynatest SEC 23:11:25 *2218 *0 *0 *0 0 2|0 0 800m 4.03g 338m 116 oplog:0.0% 0 0|0 0|0 205b 10k 3 dynatest SEC 23:11:26 *2166 *0 *0 *0 0 1|0 0 800m 4.03g 339m 116 oplog:0.0% 0 0|0 0|0 62b 10k 3 dynatest SEC 23:11:27 *2217 *0 *0 *0 0 2|0 0 800m 4.03g 339m 117 oplog:0.0% 0 0|0 0|1 205b 10k 3 dynatest SEC 23:11:28 *2172 *0 *0 *0 0 1|0 0 800m 4.03g 340m 113 oplog:0.0% 0 0|0 0|0 62b 10k 3 dynatest SEC 23:11:29 Kick off a 4-6 second long running query on the SECONDARY db.test_collection.find({latitude: { $gt: 0 }}).limit(200000).sort({_created_at: -1}) Notice that the mongostat shows replication has halted and the SECONDARY is no longer applying inserts from the PRIMARY until the long running query returns insert query update delete getmore command flushes mapped vsize res faults locked db idx miss % qr|qw ar|aw netIn netOut conn set repl time *2196 *0 *0 *0 0 1|0 0 800m 4.13g 644m 115 oplog:0.0% 0 0|0 0|0 62b 10k 3 dynatest SEC 23:14:49 *2124 *0 *0 *0 0 2|0 0 800m 4.13g 645m 113 oplog:0.0% 0 0|0 0|0 205b 10k 3 dynatest SEC 23:14:50 *2180 *0 *0 *0 0 1|0 0 800m 4.13g 645m 114 oplog:0.0% 0 0|0 0|0 62b 10k 3 dynatest SEC 23:14:51 *1191 1 *0 *0 0 2|0 0 800m 4.13g 648m 63 oplog:0.0% 0 0|0 1|0 205b 10k 3 dynatest SEC 23:14:52 *0 *0 *0 *0 0 1|0 0 800m 4.15g 669m 0 oplog:0.0% 0 0|0 1|0 62b 10k 3 dynatest SEC 23:14:53 *0 *0 *0 *0 0 2|0 0 800m 4.17g 683m 0 oplog:0.0% 0 0|0 1|0 205b 10k 3 dynatest SEC 23:14:54 *0 *0 *0 *0 0 1|0 0 800m 4.17g 683m 0 oplog:0.0% 0 0|0 1|0 62b 10k 3 dynatest SEC 23:14:55 *0 *0 *0 *0 0 2|0 0 800m 4.17g 684m 0 oplog:0.0% 0 0|0 1|0 205b 10k 3 dynatest SEC 23:14:56 *0 *0 *0 *0 0 1|0 0 800m 4.17g 703m 0 oplog:0.0% 0 0|0 1|0 62b 10k 3 dynatest SEC 23:14:57 *10006 *0 *0 *0 0 3|0 0 800m 4.18g 717m 528 oplog:0.0% 0 0|0 1|0 413b 4m 3 dynatest SEC 23:14:58 *5278 *0 *0 *0 0 1|0 0 800m 4.18g 692m 279 oplog:0.0% 0 0|0 0|0 62b 10k 3 dynatest SEC 23:14:59 Now, you can perform the same setup with a 2.6.9 PRIMARY and 2.6.9 SECONDARY. When running the same long running query on the SECONDARY, mongostat shows inserts do slow down when the long running query is executed, but does not completely block all replication like it did running 3.0.2. insert query update delete getmore command flushes mapped vsize res faults locked db idx miss % qr|qw ar|aw netIn netOut conn set repl time *1982 *0 *0 *0 0 2|0 0 800m 4.08g 223m 0 test:8.4% 0 0|0 0|0 205b 3k 3 dynatest SEC 23:21:32 *1969 1 *0 *0 0 1|0 0 800m 4.08g 224m 0 test:6.5% 0 0|0 1|0 62b 3k 3 dynatest SEC 23:21:33 *83 *0 *0 *0 0 2|0 0 800m 4.09g 244m 0 test:1.3% 0 0|0 1|0 205b 3k 3 dynatest SEC 23:21:34 *640 *0 *0 *0 0 1|0 0 800m 4.11g 269m 0 test:1.6% 0 0|0 1|0 62b 3k 3 dynatest SEC 23:21:35 *1525 *0 *0 *0 0 2|0 0 800m 4.11g 270m 0 test:3.7% 0 0|0 1|0 205b 3k 3 dynatest SEC 23:21:36 *2237 *0 *0 *0 0 1|0 0 800m 4.12g 271m 0 test:5.2% 0 0|0 1|0 62b 3k 3 dynatest SEC 23:21:37 *4406 *0 *0 *0 0 2|0 0 800m 4.12g 288m 0 test:10.3% 0 0|0 1|0 205b 3k 3 dynatest SEC 23:21:38 *4010 *0 *0 *0 0 2|0 0 800m 4.12g 297m 0 test:14.1% 0 0|0 0|0 279b 4m 3 dynatest SEC 23:21:39 *2181 *0 *0 *0 0 2|0 0 800m 4.12g 297m 0 test:8.7% 0 0|0 0|0 205b 3k 3 dynatest SEC 23:21:40 *2146 *0 *0 *0 0 1|0 0 800m 4.12g 277m 0 test:7.9% 0 0|0 0|0 62b 3k 3 dynatest SEC 23:21:41

      After upgrading a secondary node from 2.6.9 to 3.0.2, I noticed that long running queries appear to block replication and causes replication lag to increase. Downgrading the secondary back to 2.6.9 and the same long running query does not block replication.

            Assignee:
            Unassigned Unassigned
            Reporter:
            dynamike Michael Kania
            Votes:
            0 Vote for this issue
            Watchers:
            7 Start watching this issue

              Created:
              Updated:
              Resolved: