SERVER-16842 (Core Server)

moveChunk with replication is slow with 2.8

    • Type: Bug
    • Resolution: Done
    • Priority: Critical - P2
    • Fix Version/s: None
    • Affects Version/s: 2.8.0-rc4
    • Component/s: Replication
    • Labels:
    • Backwards Compatibility: Minor Change
    • Operating System: ALL

      moveChunk on a replicated collection is slow compared to 2.6.6, possibly due to tailing local.oplog.rs.

      Test

      • sharded cluster with replication
      • time moveChunk for a chunk with around 16913 documents
      • _waitForDelete: true (see the sketch after this list)
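
      As a rough illustration, each migration was driven by a moveChunk command like the one below and timed from the client (a minimal sketch; the namespace "test.coll", the chunk bound, and the shard name "rs1" are assumptions based on the raw data further down):

          // run against mongos; names and values are illustrative
          var start = new Date();
          db.adminCommand({
              moveChunk: "test.coll",   // sharded namespace (assumed)
              find: { "_id": 3 },       // any document inside the chunk to move
              to: "rs1",                // destination shard
              _waitForDelete: true      // block until the source range delete finishes
          });
          var end = new Date();
          print(start + "," + end);     // start/end timestamps, as in the raw data below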

      moveChunk test results

      It takes around 80-90 seconds to move one chunk on 2.8-rc4, versus around 4 seconds on 2.6.6. The results are similar for both mmapv1 and WiredTiger.

      raw data (each line gives the start and end timestamps of one migration)

      rc4:  mmapv1
      
          { "_id" : 3 } -->> { "_id" : 16915 } on : rs1 Timestamp(14, 4)    <-- chunk details
      
          Wed Jan 14 2015 00:27:16 GMT+0000 (UTC),Wed Jan 14 2015 00:28:40 GMT+0000 (UTC)  <-- two timestamps: the first is the start of the moveChunk, the second is when the operation stopped
          Wed Jan 14 2015 00:28:40 GMT+0000 (UTC),Wed Jan 14 2015 00:30:09 GMT+0000 (UTC)
      
      
      2.6.6
          { "_id" : 1141 } -->> { "_id" : 18054 } on : rs1 Timestamp(8, 0)
      
          Wed Jan 14 2015 00:56:49 GMT+0000 (UTC),Wed Jan 14 2015 00:56:53 GMT+0000 (UTC)
          Wed Jan 14 2015 00:56:53 GMT+0000 (UTC),Wed Jan 14 2015 00:56:57 GMT+0000 (UTC)
          Wed Jan 14 2015 00:56:57 GMT+0000 (UTC),Wed Jan 14 2015 00:57:01 GMT+0000 (UTC)
      

      Without _waitForDelete, the time is roughly halved, but still slow:

      # tested with rc5 
      Wed Jan 14 2015 15:13:12 GMT+0000 (UTC),Wed Jan 14 2015 15:13:56 GMT+0000 (UTC)
      Wed Jan 14 2015 15:13:56 GMT+0000 (UTC),Wed Jan 14 2015 15:14:40 GMT+0000 (UTC)
      Wed Jan 14 2015 15:14:40 GMT+0000 (UTC),Wed Jan 14 2015 15:15:24 GMT+0000 (UTC)
      

      getMore for local.oplog.rs

      While debugging, I found that rc4 issues far more getMore commands than 2.6.6. After checking the logs, nearly all of the getMores on local.oplog.rs have nreturned:1, which is not the case for 2.6.6.

      For example:

      # rc4
      2015-01-14T14:17:12.529+0000 I QUERY    [conn10] getmore local.oplog.rs cursorid:23091492740 ntoreturn:0 keyUpdates:0  nreturned:1 reslen:123 0ms
      
      # 2.6.6
      2015-01-14T14:29:00.329+0000 [conn7] getmore local.oplog.rs cursorid:18596876821 ntoreturn:0 keyUpdates:0 numYields:1 locks(micros) r:89 nreturned:108 reslen:11144 0ms
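
      These single-document getMores are presumably from tailing local.oplog.rs, as suspected above. For reference, a tailable awaitData cursor on the oplog, the kind of query that produces these getmore lines, can be opened from the shell roughly like this (a sketch, run against a shard primary; the missing filter and the printing are illustrative, not what the server does internally):

          // open a tailable, awaitData cursor on the oplog
          var oplog = db.getSiblingDB("local").getCollection("oplog.rs");
          var cur = oplog.find()
                         .addOption(DBQuery.Option.tailable)
                         .addOption(DBQuery.Option.awaitData);
          while (!cur.isExhausted()) {
              if (cur.hasNext()) {
                  printjson(cur.next());   // each exhausted batch triggers a new getmore
              }
          }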
      

      count of getMore (with logLevel=1)

      # rc4  -- total getMore
      [SHARDS-2 wt-move-chunk-1]$ cat mongod-wt.log | grep getmore | wc
        16926  203112 2471218
      
      # rc4 -- total getMore with nreturned != 1, and they are all nreturned:0
      [SHARDS-2 wt-move-chunk-1]$ cat mongod-wt.log | grep getmore | grep -v "nreturned:1 " | wc
           11     132    1628
      
      [SHARDS-2 wt-move-chunk-1]$ cat mongod-wt.log | grep getmore | grep -v "nreturned:1 "
      2015-01-14T14:15:45.668+0000 I QUERY    [conn10] getmore local.oplog.rs cursorid:23091492740 ntoreturn:0 keyUpdates:0  nreturned:0 reslen:20 5011ms
      ...
      
      # 2.6.6
      [SHARDS-2 wt-move-chunk-1]$ cat mongod-266.log | grep getmore | wc
          179    2327   29928
      [SHARDS-2 wt-move-chunk-1]$ cat mongod-266.log | grep getmore | grep -v "nreturned:1 " | wc
          173    2249   28941
      

            Assignee: Spencer Brody (spencer@mongodb.com)
            Reporter: Rui Zhang (rui.zhang)
            Votes: 0
            Watchers: 13

              Created:
              Updated:
              Resolved: