Type: Bug
Resolution: Done
Priority: Major - P3
Fix Version/s: None
Affects Version/s: 2.5.4
Component/s: Sharding
Labels:
- 26qa

Operating System:
ALL
Steps To Reproduce:
Apply load to a sharded cluster with randomized CRUD ops.
Observe the load with mongostat.
Confidence Status:
None
Work Order:
3

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name(s):
None
Goal Link:
None

On a sharded cluster with two shards, three config servers, and two mongos, throughput drops dramatically (to zero) every couple of minutes while I apply load to the cluster through both mongos nodes.

I noticed this happening at 11:25:49 on the first mongos and at 11:25:50 on the second, as shown by mongostat for each:

insert  query update delete getmore command  vsize    res faults  netIn netOut  conn repl       time 
   779    719    763    801       0    2192  2.51g    52m      0   384k   375k   203  RTR   11:25:47 
   109    110    106    133       0     356  2.51g    52m      0    59k    59k   203  RTR   11:25:48 
     0      0      0      0       0       1  2.51g    52m      0    62b   717b   203  RTR   11:25:49 
   562    549    533    540       0    1780  2.51g    52m      0   292k   298k   203  RTR   11:25:50

insert  query update delete getmore command  vsize    res faults  netIn netOut  conn repl       time
   705    682    697    737       0    1993  2.49g    37m      0   350k   341k   205  RTR   11:25:48 
   113    116    113    107       0     333  2.49g    37m      0    57k    58k   205  RTR   11:25:49  
     0      0      0      0       0       1  2.49g    37m      0    62b   717b   205  RTR   11:25:50 
   613    597    565    548       0    1876  2.49g    37m      0   308k   322k   205  RTR   11:25:51
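The stalls in mongostat output like the above can be spotted mechanically. The following is a minimal sketch, assuming the plain-text column layout shown in this ticket; `parse_mongostat` and `find_stalls` are hypothetical helper names, not part of any MongoDB tool.

```python
# Sketch: flag seconds where CRUD throughput fell to zero in mongostat output.
# Assumes the whitespace-separated column layout shown in this ticket.

SAMPLE = """\
insert  query update delete getmore command  vsize    res faults  netIn netOut  conn repl       time
   779    719    763    801       0    2192  2.51g    52m      0   384k   375k   203  RTR   11:25:47
   109    110    106    133       0     356  2.51g    52m      0    59k    59k   203  RTR   11:25:48
     0      0      0      0       0       1  2.51g    52m      0    62b   717b   203  RTR   11:25:49
   562    549    533    540       0    1780  2.51g    52m      0   292k   298k   203  RTR   11:25:50
"""

OP_COLUMNS = ("insert", "query", "update", "delete")


def parse_mongostat(text):
    """Return one dict per data row, keyed by the most recent header line."""
    rows = []
    header = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "insert":
            # mongostat repeats the header periodically; remember the latest.
            header = fields
            continue
        if header is None or len(fields) != len(header):
            continue  # skip anything that does not line up with the header
        rows.append(dict(zip(header, fields)))
    return rows


def find_stalls(rows, threshold=0):
    """Return the 'time' value of rows whose total CRUD ops are <= threshold."""
    stalls = []
    for row in rows:
        # Defensive: strip a leading '*' in case a count carries that prefix.
        total = sum(int(row[col].lstrip("*")) for col in OP_COLUMNS)
        if total <= threshold:
            stalls.append(row["time"])
    return stalls


print(find_stalls(parse_mongostat(SAMPLE)))  # → ['11:25:49']
```

Running the same functions over the second mongos's output would report 11:25:50, one second later.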

In the shard logs I see that, at that time, one shard removed old journal files and the writebacklistener timed out:

2013-12-13T11:25:47.800-0500 [journal] old journal file will be removed: /Users/tbrock/Code/QA/QA-431/cluster/s1/journal/j._35
2013-12-13T11:25:47.819-0500 [journal] old journal file will be removed: /Users/tbrock/Code/QA/QA-431/cluster/s1/journal/j._36
2013-12-13T11:25:48.356-0500 [conn552] command admin.$cmd command: { writebacklisten: ObjectId('52a8eae0f4e43082d8561211') } ntoreturn:1 keyUpdates:0  reslen:44 300098ms
2013-12-13T11:25:50.371-0500 [conn2403] insert db1.udrtest ninserted:1 keyUpdates:0 locks(micros) w:36 1997ms
2013-12-13T11:25:50.415-0500 [conn2448] remove db1.udrtest query: { num: { $lt: 83 } } ndeleted:3 keyUpdates:0 numYields:1 locks(micros) w:3995143 2041ms
2013-12-13T11:25:50.420-0500 [conn801] remove db5.whatever query: { num: { $gt: 28 } } ndeleted:1 keyUpdates:0 numYields:1 locks(micros) w:3977145 2036ms
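To line these entries up with the mongostat stall, the slow operations can be pulled out of the shard log by their trailing duration. This is a rough sketch that assumes only the line shape visible above (timestamp, bracketed thread name, message, and a final `<N>ms`); `slow_ops` is a hypothetical helper and not a general MongoDB log parser.

```python
import re

# Sketch: extract operations with a trailing "<N>ms" duration from shard log
# lines. The pattern is derived from the lines quoted in this ticket only.

LOG = r"""
2013-12-13T11:25:47.800-0500 [journal] old journal file will be removed: /Users/tbrock/Code/QA/QA-431/cluster/s1/journal/j._35
2013-12-13T11:25:47.819-0500 [journal] old journal file will be removed: /Users/tbrock/Code/QA/QA-431/cluster/s1/journal/j._36
2013-12-13T11:25:48.356-0500 [conn552] command admin.$cmd command: { writebacklisten: ObjectId('52a8eae0f4e43082d8561211') } ntoreturn:1 keyUpdates:0  reslen:44 300098ms
2013-12-13T11:25:50.371-0500 [conn2403] insert db1.udrtest ninserted:1 keyUpdates:0 locks(micros) w:36 1997ms
2013-12-13T11:25:50.415-0500 [conn2448] remove db1.udrtest query: { num: { $lt: 83 } } ndeleted:3 keyUpdates:0 numYields:1 locks(micros) w:3995143 2041ms
2013-12-13T11:25:50.420-0500 [conn801] remove db5.whatever query: { num: { $gt: 28 } } ndeleted:1 keyUpdates:0 numYields:1 locks(micros) w:3977145 2036ms
"""

SLOW_OP = re.compile(
    r"^(?P<ts>\S+) \[(?P<thread>\w+)\] (?P<body>.*) (?P<ms>\d+)ms$"
)


def slow_ops(log_text, min_ms=1000):
    """Return (timestamp, thread, op, duration_ms) for ops taking >= min_ms."""
    ops = []
    for line in log_text.splitlines():
        match = SLOW_OP.match(line.strip())
        if match is None:
            continue  # e.g. the [journal] lines, which carry no duration
        duration = int(match.group("ms"))
        if duration >= min_ms:
            op = match.group("body").split()[0]  # first word: command/insert/remove
            ops.append((match.group("ts"), match.group("thread"), op, duration))
    return ops


for ts, thread, op, ms in slow_ops(LOG):
    print(ts, thread, op, ms)
```

On the excerpt above this lists the writebacklisten command at roughly 300 seconds and the three writes at roughly 2 seconds each, all finishing within the window where mongostat reported zero throughput.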

Assignee: Davide Italiano (Inactive)
Reporter: Tyler Brock (Inactive)
Participants: Davide Italiano, Scott Hernandez, Tyler Brock
Votes: 0
Watchers: 4

Created: Dec 13 2013 04:31:00 PM UTC
Updated: Jan 13 2014 06:46:18 PM UTC
Resolved: Dec 31 2013 03:25:10 PM UTC
