|
Ok! Thank you for your help.
|
|
I'm going to close this out - if you would like to open a bug for it then feel free.
Thanks
Gregor
|
|
Hi
I spoke to Kristina, who owns replication. When you sync the new node, the documents are copied over the network to the new node, and then the new node tails the oplog - but it waits until it is slaveDelay behind before applying any operations. That is why, after you have synced the node, there is no activity on it, and then suddenly, once you reach slaveDelay, you see a saturated disk - operations have now begun to be replicated onto the secondary.
So in terms of opening this as a bug: it is a case of this node not being able to replicate easily because of low resident memory, probably influenced by the low RAM available and nagios. If you weren't using slaveDelay you would see this issue immediately rather than after the slaveDelay period, so I don't think it is necessarily connected to slaveDelay.
If you would like to open this as a bug - as a sort of regression from 2.0.x to 2.2.x where replication is less efficient in low-RAM situations - then I think that would be fine; we can see what the server team have to say about it at least.
You can switch back to 2.0.8.
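If it helps, one quick way to watch how far behind the delayed member is while it catches up (a rough sketch from the OS shell - substitute the host/port you normally connect with):
mongo --host flash --port 27017 --eval 'rs.printSlaveReplicationInfo()'
# or compare member optimes directly:
mongo --host flash --port 27017 --eval 'rs.status().members.forEach(function(m){ print(m.name + "  " + m.stateStr + "  " + m.optimeDate); })'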
Thanks
Gregor
|
|
Hi Gregor,
we've just reached the slaveDelay limit on senni. As you can see in the
MMS graphs, disk IO and system load have increased a lot, just as expected.
Do you want me to run any other test? If not, I'll downgrade to 2.0.8
because we need this server to be OK.
Waiting for your answer.
Regards,
|
|
Not now, because it is still under the slaveDelay. The problem will come in
5 days, just when the replication lag hits the slaveDelay value.
|
|
iowait looks OK in MMS and the last iostat you posted didn't show a saturated disk - are you still seeing a performance problem at your end?
|
|
No, it does have the slave delay value configured:
partitionB:SECONDARY> rs.conf()
{
        "_id" : "partitionB",
        "version" : 32,
        "members" : [
                {
                        "_id" : 0,
                        "host" : "flash:27017"
                },
                {
                        "_id" : 3,
                        "host" : "arden:27017"
                },
                {
                        "_id" : 4,
                        "host" : "senni:27017",
                        "priority" : 0,
                        "slaveDelay" : 345600,
                        "buildIndexes" : false
                }
        ]
}
|
|
Are you running senni without slaveDelay here?
|
|
Done! Now we have the latest version, and you have iostat graphs in MMS.
Btw, the server has been in state "SECONDARY" for a few minutes now:
Thu Jan 24 13:38:01 [rsSync] replSet SECONDARY
Thu Jan 24 13:38:01 [rsSync] replSet slavedelay sleep long time: 345600
A random part of iostat:
Device:  rrqm/s  wrqm/s   r/s   w/s  rsec/s  wsec/s  avgrq-sz  avgqu-sz  await  svctm  %util
sdb        0,00    0,00  0,00  0,00    0,00    0,00      0,00      0,00   0,00   0,00   0,00
sdb        0,00    2,00  0,00  2,00    0,00   32,00     16,00      0,02   8,50   8,50   1,70
sdb        0,00    0,00  0,00  0,00    0,00    0,00      0,00      0,00   0,00   0,00   0,00
sdb        0,00    0,00  0,00  0,00    0,00    0,00      0,00      0,00   0,00   0,00   0,00
sdb        0,00    0,00  0,00  0,00    0,00    0,00      0,00      0,00   0,00   0,00   0,00
sdb        0,00   36,00  0,00  2,00    0,00  304,00    152,00      0,01   3,00   1,50   0,30
sdb        0,00    7,00  0,00  5,00    0,00   96,00     19,20      0,03   6,20   6,20   3,10
sdb        0,00    0,00  0,00  0,00    0,00    0,00      0,00      0,00   0,00   0,00   0,00
sdb        0,00    2,00  0,00  8,00    0,00   80,00     10,00      0,01   0,62   0,25   0,20
sdb        0,00    0,00  0,00  0,00    0,00    0,00      0,00      0,00   0,00   0,00   0,00
sdb        0,00    0,00  0,00  0,00    0,00    0,00      0,00      0,00   0,00   0,00   0,00
sdb        0,00    7,00  0,00  4,00    0,00   88,00     22,00      0,03   7,00   6,50   2,60
sdb        0,00    0,00  0,00  0,00    0,00    0,00      0,00      0,00   0,00   0,00   0,00
I hope this helps.
Regards,
|
|
Could you upgrade the MMS agent on senni as well? It's version 1.3.7 and the latest is 1.5.3 - I think this is the reason iostat data is not showing in MMS.
|
|
OK thanks 
|
|
I moved the dbpath to partitionB.old and started mongod, so it's doing the
initial sync ("STARTUP2" state). The iostat now looks as follows (1-second
samples):
Device:  rrqm/s    wrqm/s   r/s     w/s  rsec/s     wsec/s  avgrq-sz  avgqu-sz    await  svctm   %util
sdb        0,00   5343,00  0,00  135,00    0,00  132784,00    983,59    134,87   638,53   7,41  100,00
sdb        0,00      0,00  0,00  109,00    0,00  101936,00    935,19     34,48  1062,01   7,21   78,60
sdb        0,00    152,00  0,00  215,00    0,00   14040,00     65,30      5,73    26,64   1,37   29,50
sdb        0,00  12439,00  0,00  103,00    0,00   91920,00    892,43     40,06   271,17   7,01   72,20
sdb        0,00      0,00  0,00   37,00    0,00   32936,00    890,16      2,11   384,86   7,89   29,20
sdb        0,00      0,00  0,00   24,00    0,00   16448,00    685,33      0,22     9,25   6,38   15,30
sdb        0,00  14449,00  0,00  126,00    0,00  115432,00    916,13     49,80   306,00   7,02   88,50
sdb        0,00      9,00  0,00   42,00    0,00   29544,00    703,43      0,97   290,86   6,86   28,80
sdb        0,00  13361,00  0,00   49,00    0,00   39608,00    808,33     15,73    42,18   6,33   31,00
sdb        0,00      0,00  0,00  102,00    0,00   95568,00    936,94     27,84   406,81   6,98   71,20
sdb        0,00      0,00  0,00   23,00    0,00   16352,00    710,96      0,19     7,74   6,30   14,50
sdb        0,00      0,00  0,00   21,00    0,00   13136,00    625,52      0,21    10,52   8,48   17,80
When it finishes the initial sync and becomes "SECONDARY" I'll send you the
iostat again
|
|
Do you have iostat data from before the secondary hits the slaveDelay that you could share?
|
|
Hi,
I tried it but with no luck...
/extra/mongo/bin/mongod --dbpath /extra/mongo/partitionB --port 27017 \
    --logpath /extra/mongo/logs/partitionB.log --fork --rest --pidfilepath /extra/mongo/var/pid_partitionB \
    --oplogSize 102400 --shardsvr --replSet partitionB --replIndexPrefetch none --journal
iostat /dev/sdb 1 1000 -x

avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
           0.74   0.00     0.99    79.65    0.00  18.61
Device:  rrqm/s  wrqm/s     r/s   w/s    rsec/s  wsec/s  avgrq-sz  avgqu-sz   await  svctm   %util
sdb        0.00    0.00  158.00  0.00  25960.00    0.00    164.30     18.75  120.66   6.33  100.00

avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
           0.25   0.00     0.50    85.46    0.00  13.78
Device:  rrqm/s  wrqm/s     r/s   w/s    rsec/s  wsec/s  avgrq-sz  avgqu-sz   await  svctm   %util
sdb        0.00    0.00  140.00  0.00  27296.00    0.00    194.97     10.95   80.34   7.14  100.00

avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
           4.46   0.00     1.49    91.58    0.00   2.48
Device:  rrqm/s  wrqm/s     r/s   w/s    rsec/s  wsec/s  avgrq-sz  avgqu-sz   await  svctm   %util
sdb        0.00    0.00  140.00  0.00  27096.00    0.00    193.54     12.44   88.44   7.14  100.00

avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
           7.71   0.00     2.99    80.85    0.00   8.46
Device:  rrqm/s  wrqm/s     r/s   w/s    rsec/s  wsec/s  avgrq-sz  avgqu-sz   await  svctm   %util
sdb        0.00    0.00  145.00  0.00  26296.00    0.00    181.35     15.75  101.34   6.90  100.10
|
|
Going back to 2.0.8 solves the problem...
|
|
|
Hi
Can you try to switch replIndexPrefetch to "none" on senni and see if this helps please?
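For example, reusing the startup options you have been using elsewhere in this ticket - only the prefetch flag changes (a sketch, adjust paths/ports to the node):
/extra/mongo/bin/mongod --dbpath /extra/mongo/partitionB --port 27017 \
    --logpath /extra/mongo/logs/partitionB.log --fork --pidfilepath /extra/mongo/var/pid_partitionB \
    --oplogSize 102400 --shardsvr --replSet partitionB --journal \
    --replIndexPrefetch none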
Thanks
Gregor
|
|
I would say downgrade that node to 2.0.8 - don't go back to 2.0.0.
It will be fine to run like that though - that node can never become primary anyway.
|
|
I'll try to add more RAM to see what happens, but for the moment I'll
downgrade both nodes. Will there be any problem with the data? I mean, can a
2.2.0 server get data from a 2.2.2 replica set? Could the fact that this node
is also part of the shard be an issue?
Thank you for your help.
|
|
You can't turn it off. I'm guessing that that is the issue. If you can repro it on another host (with more RAM) then we can investigate the possibility that this is something specific to slaveDelay.
Yes, you could downgrade that node to 2.0.x if that solves the problem.
|
|
OK. And can I do something to avoid the current problem? I mean, is there a
way to "turn off" these parallel replication ops or something like that?
Would it be possible to keep just the delayed servers on an old version? I
mean, downgrade only those 2 servers to 2.2.0 and leave the rest of the
cluster on 2.2.2?
|
|
When you are doing the initial sync, you are querying the data files on the primary and writing consecutive blocks of disk on the secondary. You are page faulting, but you are writing an entire page each time. When you reach slaveDelay you begin to tail the oplog: you page fault but only update part of the page (one document), then page fault somewhere else and update another single document. So your page faulting is a lot less efficient when you are tailing the oplog than during the initial sync. This sounds like it is saturating your disk as it tries to keep up with replication. There were some changes to replication between 2.0.x and 2.2.x - basically parallel replication operations on the secondary - which I assume would explain the difference.
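If you want to watch this happening on senni, mongostat reports page faults ("faults") and resident memory ("res") once per interval - roughly (a sketch, adjust host/port to senni's mongod):
mongostat --host senni --port 27017 5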
|
|
... and... why does it perform OK while it has not yet hit the slaveDelay?
|
|
Yes. The other host has the same amount of RAM, although it is less used (no
nagios). But, in fact, I would expect the same behaviour as with version
2.2.0... and why is disk usage at 100%?
|
|
OK, that explains the low resident memory - running the OS, nagios and mongodb on 2GB is not going to give you decent performance.
Which other host are you seeing this problem with - is it also low on RAM?
|
|
Mongo datafiles are on sdb.
The server has 2 GB of memory installed and 1 GB free.
[mongo@OBE015 (senni) ~]$ free -mt
             total       used       free     shared    buffers     cached
Mem:          2005       1353        651          0          8        444
-/+ buffers/cache:        901       1103
Swap:         3999        725       3274
Total:        6005       2078       3926
|
|
It doesn't look that bad - however, assuming the data files are stored on sda, you could lower the readahead
sudo blockdev --setra 64 /dev/sda
and restart mongod.
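For example (a sketch - check the current value first and substitute whichever device actually holds the data files):
sudo blockdev --getra /dev/sda    # current readahead, in 512-byte sectors
sudo blockdev --setra 64 /dev/sda
# note: --setra does not persist across reboots, so re-apply it at boot if it helps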
However, since those settings don't look terrible, it could be that nagios is using a lot of RAM - how much RAM is on that box?
|
|
The blockdev report:
[root@OBE015 (senni) ~]# blockdev --report
RO  RA   SSZ  BSZ   StartSec           Size  Device
rw  256  512  4096         0  1000204886016  /dev/sda
rw  256  512  1024      2048      209715200  /dev/sda1
rw  256  512  4096    411648     4194304000  /dev/sda2
rw  256  512  4096   8603648     2097152000  /dev/sda3
rw  256  512  1024  12699648           1024  /dev/sda4
rw  256  512  4096  12701696   993700872192  /dev/sda5
rw  256  512  4096         0  1000204886016  /dev/sdb
rw  256  512  4096        63  1000202241024  /dev/sdb1
|
|
Resident memory on senni hasn't changed in the last several months, so I don't think it is connected, but let's get it fixed anyway.
Can you do a
blockdev --report
on that host?
|
There is nothing strange about this lag, because senni is configured with a 5-day delay (just to have a backup that protects against accidental deletions).
The other servers can connect to senni with no problem. In fact, in senni's logs I can see the connections:
Wed Jan 23 11:22:16 [initandlisten] connection accepted from X.X.X.144:54034 #4621 (4 connections now open)
Wed Jan 23 11:22:46 [conn4620] end connection X.X.X.147:53738 (3 connections now open)
Wed Jan 23 11:22:46 [initandlisten] connection accepted from X.X.X.147:53742 #4622 (4 connections now open)
Wed Jan 23 11:22:46 [conn4621] end connection X.X.X.144:54034 (3 connections now open)
The connection errors that you see appear because I stopped the mongod processes two hours ago, as the load was too high and disk IO was at 100%. After stopping the mongod processes, everything began to work fine. Now, disk IO and load are around 0.
By the way, when I stopped the processes I got this error in the logs on BOTH delayed servers:
Wed Jan 23 11:26:01 [DataFileSync] flushing mmaps took 25678ms for 114 files
Wed Jan 23 11:26:03 got signal 15 (Terminated), will terminate after current cmd ends
Wed Jan 23 11:26:04 [repl writer worker 1] ERROR: writer worker caught exception: interrupted at shutdown on: { ts: Timestamp 1358590987000|27, h: -8072858002892174337, v: 2, op: "d", ns: "xxxx.yy", b: true, o: { _id: ... } }
Wed Jan 23 11:26:04 [repl writer worker 1] Fatal Assertion 16360
0xaffd31 0xac5323 0x9a2e56 0xad35cd 0xb45ba9 0x3e1e4077f1 0x3e1dce5ccd
/extra/mongo/bin/mongod(_ZN5mongo15printStackTraceERSo+0x21) [0xaffd31]
/extra/mongo/bin/mongod(_ZN5mongo13fassertFailedEi+0xa3) [0xac5323]
/extra/mongo/bin/mongod(_ZN5mongo7replset14multiSyncApplyERKSt6vectorINS_7BSONObjESaIS2_EEPNS0_8SyncTailE+0x156) [0x9a2e56]
/extra/mongo/bin/mongod(_ZN5mongo10threadpool6Worker4loopEv+0x26d) [0xad35cd]
/extra/mongo/bin/mongod() [0xb45ba9]
/lib64/libpthread.so.0() [0x3e1e4077f1]
/lib64/libc.so.6(clone+0x6d) [0x3e1dce5ccd]
Wed Jan 23 11:26:04 [repl writer worker 1]
***aborting after fassert() failure
Wed Jan 23 11:26:04 Got signal: 6 (Aborted).
Wed Jan 23 11:26:04 Backtrace:
0xaffd31 0x558bb9 0x3e1dc32900 0x3e1dc32885 0x3e1dc34065 0xac535e 0x9a2e56 0xad35cd 0xb45ba9 0x3e1e4077f1 0x3e1dce5ccd
/extra/mongo/bin/mongod(_ZN5mongo15printStackTraceERSo+0x21) [0xaffd31]
/extra/mongo/bin/mongod(_ZN5mongo10abruptQuitEi+0x399) [0x558bb9]
/lib64/libc.so.6() [0x3e1dc32900]
/lib64/libc.so.6(gsignal+0x35) [0x3e1dc32885]
/lib64/libc.so.6(abort+0x175) [0x3e1dc34065]
/extra/mongo/bin/mongod(_ZN5mongo13fassertFailedEi+0xde) [0xac535e]
/extra/mongo/bin/mongod(_ZN5mongo7replset14multiSyncApplyERKSt6vectorINS_7BSONObjESaIS2_EEPNS0_8SyncTailE+0x156) [0x9a2e56]
/extra/mongo/bin/mongod(_ZN5mongo10threadpool6Worker4loopEv+0x26d) [0xad35cd]
/extra/mongo/bin/mongod() [0xb45ba9]
/lib64/libpthread.so.0() [0x3e1e4077f1]
/lib64/libc.so.6(clone+0x6d) [0x3e1dce5ccd]
I replaced the ns name with "xxxx.yy".
Maybe this error is related to the problem we are having?
What is more, now, when starting the mongod server, it stays in state "STARTUP2". In the logs, we get:
Wed Jan 23 13:22:17 [conn74] assertion 13436 not master or secondary; cannot currently read from this replSet member ns:config.settings query:{}
Wed Jan 23 13:22:17 [conn74] problem detected during query over config.settings : { $err: ... }
The rs.config() from flash (the primary) is as follows:
PRIMARY> rs.config()
{
        "_id" : "partitionB",
        "version" : 32,
        "members" : [
                { "_id" : ... },
                { "_id" : ... },
                { "_id" : ... }
        ]
}
I hope we can find a solution, because right now I have both servers down. The only solution at this point is to perform a full resync (delete the dbpath and start the service again), but I'll wait for your answer before doing it, in case you need more details.
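For reference, the resync I have in mind would look roughly like this (same paths and options as the mongod command I use elsewhere in this ticket; adjust per server):
mv /extra/mongo/partitionB /extra/mongo/partitionB.old    # keep the old files until the resync completes
mkdir /extra/mongo/partitionB
/extra/mongo/bin/mongod --dbpath /extra/mongo/partitionB --port 27017 \
    --logpath /extra/mongo/logs/partitionB.log --fork --pidfilepath /extra/mongo/var/pid_partitionB \
    --oplogSize 102400 --shardsvr --replSet partitionB --journal
# the node comes up in STARTUP2 and performs a full initial sync from another member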
|