[SERVER-415] Increasing slave lagging Created: 12/Nov/09  Updated: 12/Jul/16  Resolved: 12/Nov/09

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: 1.0.1
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Erwan Arzur Assignee: Eliot Horowitz (Inactive)
Resolution: Done Votes: 1
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

processor : 0
vendor_id : AuthenticAMD
cpu family : 15
model : 33
model name : Dual Core AMD Opteron(tm) Processor 270
stepping : 2
cpu MHz : 2004.542
cache size : 1024 KB
physical id : 0
siblings : 1
core id : 0
cpu cores : 1
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu tsc msr pae mce cx8 apic mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt lm 3dnowext 3dnow pni lahf_lm cmp_legacy
bogomips : 4011.50
TLB size : 1024 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp

processor : 1
vendor_id : AuthenticAMD
cpu family : 15
model : 33
model name : Dual Core AMD Opteron(tm) Processor 270
stepping : 2
cpu MHz : 2004.542
cache size : 1024 KB
physical id : 1
siblings : 1
core id : 0
cpu cores : 1
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu tsc msr pae mce cx8 apic mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt lm 3dnowext 3dnow up pni lahf_lm cmp_legacy
bogomips : 4011.50
TLB size : 1024 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp

Linux ec2-xxx-xxx-xxx-xxx.compute-1.amazonaws.com 2.6.21.7-2.fc8xen #1 SMP Fri Feb 15 12:34:28 EST 2008 x86_64 x86_64 x86_64 GNU/Linux

  • slave(s):
    Thu Oct 29 17:12:54 Mongo DB : starting : pid = 32723 port = 27017 dbpath = /mongo/master master = 0 slave = 1 64-bit
    Thu Oct 29 17:12:54 db version v1.0.1, pdfile version 4.4
    Thu Oct 29 17:12:54 git version: e316c78bc3dcbd0729454b81eff4e172579c1bc0
    Thu Oct 29 17:12:54 sys info: Linux ofc-n1.10gen.com 2.6.23.17-88.fc7 #1 SMP Thu May 15 00:02:29 EDT 2008 x86_64

2x256GB EBS volumes, striped. 80% used.

  • master:
    Tue Sep 22 21:07:41 Mongo DB : starting : pid = 3063 port = 27017 dbpath = /mnt/mongo/master master = 1 slave = 0 64-bit
    Tue Sep 22 21:07:41 db version v1.0.0, pdfile version 4.4
    Tue Sep 22 21:07:41 git version: dabf2ce54614c6de9d728af445eec47f39dde19f
    Tue Sep 22 21:07:41 sys info: Linux ofc-n1.10gen.com 2.6.23.17-88.fc7 #1 SMP Thu May 15 00:02:29 EDT 2008 x86_64

4x420GB (local, ephemeral) striped, 26% used


Attachments: File mongodb.log-20091112.gz    
Participants:

 Description   

Just posted a message on the group with a complete description: http://groups.google.com/group/mongodb-user/browse_frm/thread/2fd21b8791de910c

Basically, we're seeing increasing lag, up to 30 hours on our slaves, for no apparent reason (no heavy concurrent updates, not much reading, etc.).

Trying to upgrade to a bigger hardware platform, but data migration is a very big issue for us ...



 Comments   
Comment by Eliot Horowitz (Inactive) [ 12/Nov/09 ]

Ok.
If you have a new slave, I would try 1.1.3
You'll see greatly improved index creation time.

Comment by Erwan Arzur [ 12/Nov/09 ]

No, slave is 1.0.1. Master was 1.0.0, now updated to 1.0.1.

Comment by Eliot Horowitz (Inactive) [ 12/Nov/09 ]

So the slave is on 1.1.3 and the master on 1.0.1?

Comment by Erwan Arzur [ 12/Nov/09 ]

Right, I just did the upgrade, and the slave seems to be catching up, but at a slower rate than I would expect.

Thu Nov 12 15:17:33 repl: end sync_pullOpLog syncedTo: Wed Nov 11 21:08:04 2009 4afb27b4:6
Thu Nov 12 15:18:35 pull: applied 510 operations
Thu Nov 12 15:18:35 repl: end sync_pullOpLog syncedTo: Wed Nov 11 21:10:35 2009 4afb284b:2
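
For a rough sense of what "slower than expected" means here, the two syncedTo lines above can be turned into a catch-up estimate. This is only a sketch under assumptions not stated in the ticket: the log-line timestamps are taken to be in 2009 and in the same timezone as the syncedTo values, and the apply rate is taken to stay constant. The function name `catch_up_estimate` is made up for illustration and is not part of mongod.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"


def catch_up_estimate(log_a, synced_a, log_b, synced_b):
    """Estimate replication catch-up from two 'syncedTo' log lines.

    log_*    : wall-clock time the slave wrote the log line
    synced_* : oplog time the slave had reached at that moment
    Returns (wall_s, oplog_s, lag_s, rate, eta_s); eta_s is None
    when the slave is not gaining on the master.
    """
    wall_s = (log_b - log_a).total_seconds()        # wall-clock time elapsed
    oplog_s = (synced_b - synced_a).total_seconds() # oplog time replayed
    lag_s = (log_b - synced_b).total_seconds()      # lag at the second line
    rate = oplog_s / wall_s                         # oplog seconds per wall second
    # The master keeps moving forward one second per second,
    # so the lag only shrinks by (rate - 1) per wall second.
    eta_s = lag_s / (rate - 1) if rate > 1 else None
    return wall_s, oplog_s, lag_s, rate, eta_s


# Values copied from the two log lines (year and timezone are assumptions).
log_a = datetime.strptime("2009-11-12 15:17:33", FMT)
synced_a = datetime.strptime("2009-11-11 21:08:04", FMT)
log_b = datetime.strptime("2009-11-12 15:18:35", FMT)
synced_b = datetime.strptime("2009-11-11 21:10:35", FMT)

wall_s, oplog_s, lag_s, rate, eta_s = catch_up_estimate(
    log_a, synced_a, log_b, synced_b
)
print(f"replayed {oplog_s:.0f}s of oplog in {wall_s:.0f}s ({rate:.2f}x)")
print(f"lag: {lag_s / 3600:.2f} h, estimated catch-up: {eta_s / 3600:.2f} h")
```

Under those assumptions the slave replays 151 seconds of oplog in 62 seconds of wall-clock time, is about 18.1 hours behind, and would need roughly 12.6 more hours to catch up. Two log lines are a very small sample, so this should be read as an order-of-magnitude figure only.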

Comment by Eliot Horowitz (Inactive) [ 12/Nov/09 ]

You found this bug: SERVER-363
you need to upgrade the master to 1.0.1

Comment by Erwan Arzur [ 12/Nov/09 ]

Mongod output ...

Comment by Eliot Horowitz (Inactive) [ 12/Nov/09 ]

can you attach the entire log file?
also - could you try 1.1.3 on the slave?
