On a sharded cluster with two shards, three config servers, and two mongos instances, throughput drops dramatically (to zero) every couple of minutes while I apply load through both mongos nodes.
I noticed this happening in mongostat on both mongos instances, at 11:25:49 on the first and at 11:25:50 on the second:
insert query update delete getmore command vsize res faults netIn netOut conn repl     time
   779   719    763    801       0    2192 2.51g 52m      0  384k   375k  203  RTR 11:25:47
   109   110    106    133       0     356 2.51g 52m      0   59k    59k  203  RTR 11:25:48
     0     0      0      0       0       1 2.51g 52m      0   62b   717b  203  RTR 11:25:49
   562   549    533    540       0    1780 2.51g 52m      0  292k   298k  203  RTR 11:25:50

insert query update delete getmore command vsize res faults netIn netOut conn repl     time
   705   682    697    737       0    1993 2.49g 37m      0  350k   341k  205  RTR 11:25:48
   113   116    113    107       0     333 2.49g 37m      0   57k    58k  205  RTR 11:25:49
     0     0      0      0       0       1 2.49g 37m      0   62b   717b  205  RTR 11:25:50
   613   597    565    548       0    1876 2.49g 37m      0  308k   322k  205  RTR 11:25:51
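
To pick out the stalled seconds from a longer capture, something like the following might help. It is only a rough sketch: it assumes the mongostat output was saved as text with the 14-column layout shown above, and the function name `find_stalls` and the embedded sample are just for illustration.

```python
def find_stalls(lines):
    """Return the timestamps of rows where insert, query, update and delete are all 0."""
    stalls = []
    for line in lines:
        fields = line.split()
        # Skip blank lines, short lines, and the repeated header row.
        if len(fields) < 14 or fields[0] == "insert":
            continue
        try:
            # mongostat can prefix replicated op counts with '*'; strip it before parsing.
            counts = [int(f.lstrip("*")) for f in fields[:4]]
        except ValueError:
            continue  # not a data row in the layout assumed here
        if all(c == 0 for c in counts):
            stalls.append(fields[-1])  # last column is the time
    return stalls


# Rows copied from the first mongos above.
sample = """
insert query update delete getmore command vsize res faults netIn netOut conn repl time
779 719 763 801 0 2192 2.51g 52m 0 384k 375k 203 RTR 11:25:47
109 110 106 133 0 356 2.51g 52m 0 59k 59k 203 RTR 11:25:48
0 0 0 0 0 1 2.51g 52m 0 62b 717b 203 RTR 11:25:49
562 549 533 540 0 1780 2.51g 52m 0 292k 298k 203 RTR 11:25:50
"""

print(find_stalls(sample.splitlines()))
```

Running it over the capture from each mongos separately keeps the two sets of timestamps apart, which makes the one-second offset between them easier to see.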
In the shard logs I see that at that time one shard removed old journal files and the writebacklistener timed out:
2013-12-13T11:25:47.800-0500 [journal] old journal file will be removed: /Users/tbrock/Code/QA/QA-431/cluster/s1/journal/j._35
2013-12-13T11:25:47.819-0500 [journal] old journal file will be removed: /Users/tbrock/Code/QA/QA-431/cluster/s1/journal/j._36
2013-12-13T11:25:48.356-0500 [conn552] command admin.$cmd command: { writebacklisten: ObjectId('52a8eae0f4e43082d8561211') } ntoreturn:1 keyUpdates:0 reslen:44 300098ms
2013-12-13T11:25:50.371-0500 [conn2403] insert db1.udrtest ninserted:1 keyUpdates:0 locks(micros) w:36 1997ms
2013-12-13T11:25:50.415-0500 [conn2448] remove db1.udrtest query: { num: { $lt: 83 } } ndeleted:3 keyUpdates:0 numYields:1 locks(micros) w:3995143 2041ms
2013-12-13T11:25:50.420-0500 [conn801] remove db5.whatever query: { num: { $gt: 28 } } ndeleted:1 keyUpdates:0 numYields:1 locks(micros) w:3977145 2036ms
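
To line the slow operations up against the journal cleanup, it may help to work backwards from each log line to when the operation started. The sketch below does that under two assumptions that I have not verified against the server source: that the timestamp on a log line marks when the operation finished, and that the trailing `NNNms` is its total duration. The name `op_start_times` and the 1000 ms threshold are made up for this example.

```python
import re
from datetime import datetime, timedelta

# timestamp, [connection], anything, then a trailing "<digits>ms"
LINE_RE = re.compile(r"^(\S+) \[(\w+)\] (.*) (\d+)ms$")


def op_start_times(lines, min_ms=1000):
    """Return (connection, estimated start time, duration in ms) for slow operations."""
    results = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # e.g. journal lines, which carry no duration
        stamp, conn, _, ms_text = match.groups()
        ms = int(ms_text)
        if ms < min_ms:
            continue
        # Assumption: the log timestamp is the completion time of the operation.
        end = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S.%f%z")
        start = end - timedelta(milliseconds=ms)
        # Trim microseconds down to milliseconds for display.
        results.append((conn, start.strftime("%H:%M:%S.%f")[:-3], ms))
    return results


# The three slow operations from the shard log above.
sample_log = """
2013-12-13T11:25:50.371-0500 [conn2403] insert db1.udrtest ninserted:1 keyUpdates:0 locks(micros) w:36 1997ms
2013-12-13T11:25:50.415-0500 [conn2448] remove db1.udrtest query: { num: { $lt: 83 } } ndeleted:3 keyUpdates:0 numYields:1 locks(micros) w:3995143 2041ms
2013-12-13T11:25:50.420-0500 [conn801] remove db5.whatever query: { num: { $gt: 28 } } ndeleted:1 keyUpdates:0 numYields:1 locks(micros) w:3977145 2036ms
"""

for conn, start, ms in op_start_times(sample_log.splitlines()):
    print(conn, start, ms)
```

If those assumptions hold, all three operations started at roughly 11:25:48.37 to 11:25:48.38, about half a second after the journal files were removed and immediately after the writebacklisten command returned.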