[SERVER-5767] sharding (removeshard) & creating index(ensureIndex) - loop Created: 04/May/12  Updated: 08/Mar/13  Resolved: 20/Aug/12

Status: Closed
Project: Core Server
Component/s: Index Maintenance, Sharding
Affects Version/s: 2.1.1
Fix Version/s: None

Type: Bug Priority: Critical - P2
Reporter: Azat Khuzhin Assignee: Scott Hernandez (Inactive)
Resolution: Incomplete Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

linux 2.6.32
mongo version: v2.1.1-pre-, pdfile version 4.5, git version: a2d6f752d56aa446220b9f14c8ad3865c2fb5db8


Operating System: ALL
Participants:

 Description   

I have two shards.

First, I set up sharding: "addShard()", then "enableSharding()", then "shardCollection()".
I did not wait until all chunks were balanced across the cluster.
I then executed "removeshard" and, without waiting for all chunks to migrate back to the primary shard, created an index on the sharded collection.
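
For readers trying to reproduce this, the sequence above might look roughly like the following sketch. The hosts, shard name, database, collection, shard key, and index field are all placeholders (none of them appear in this report), and `admin` is assumed to be something like `db.getSiblingDB("admin")` in a mongo shell connected to mongos.

```javascript
// Sketch of the reproduction sequence. Every name passed in via `opts`
// is a placeholder; the report does not state the real values.
function reproCommands(opts) {
  return [
    { addShard: opts.secondShardHost },
    { enableSharding: opts.dbName },
    { shardCollection: opts.dbName + "." + opts.collName, key: opts.shardKey },
    // Issued before the balancer has finished distributing chunks.
    { removeShard: opts.shardName }
  ];
}

function runRepro(admin, coll, opts) {
  reproCommands(opts).forEach(function (cmd) {
    admin.runCommand(cmd);
  });
  // Issued while chunks are still draining back to the primary shard.
  coll.ensureIndex(opts.indexSpec);
}
```

In the shell this could be called as `runRepro(db.getSiblingDB("admin"), db.getSiblingDB("test").items, {...})`, again with hypothetical names.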

The index build then looped in state 2, at 88%.
The migration of chunks to the primary shard also looped (at step 3), with 13 chunks left (about 10 had already migrated).

Collection rows: ~40 million, avgObjSize: 680

When I ran a query in a new connection, it printed:

mongos> db.currentOp().inprog
Fri May  4 20:30:34 uncaught exception: error { "$err" : "socket exception", "code" : 11002 }

And then mongod was killed.

I started it again.
Now it stops indexing at 55% (there is no such operation in db.currentOp(), but I see it in the log, which every ~15 seconds writes "TIME [migrateThread] 23349300/42063688 55%").
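
One way to confirm from the log that the counter is really stuck is to parse successive progress lines and compare them. The sketch below assumes the line format shown in the single sample above ("[migrateThread] done/total pct%"); the function names are made up for illustration and are not part of MongoDB.

```javascript
// Parse a progress line such as
//   "TIME [migrateThread] 23349300/42063688 55%"
// Returns null if the line does not match the assumed format.
function parseMigrateProgress(line) {
  var m = /\[migrateThread\]\s+(\d+)\/(\d+)\s+(\d+)%/.exec(line);
  if (!m) {
    return null;
  }
  var done = Number(m[1]);
  var total = Number(m[2]);
  return {
    done: done,
    total: total,
    reportedPct: Number(m[3]),
    computedPct: Math.floor((100 * done) / total)
  };
}

// Two consecutive samples with the same `done` count mean no progress
// was made during that interval (about 15 seconds in this report).
function isStalled(prev, cur) {
  return prev !== null && cur !== null && prev.done === cur.done;
}
```

Feeding two consecutive log lines through `parseMigrateProgress` and then `isStalled` would show whether the thread made any progress between them.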

I also ran "iostat -x 2":

$ iostat -x 2
Linux 2.6.32-5-xen-amd64 (ip-10-252-43-199)     05/04/2012      _x86_64_        (1 CPU)
 
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           6.22    0.00    3.37   31.69    8.57   50.15
 
Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
xvdap1            0.00     0.17    2.71    0.12    63.39     3.00    46.95     0.03   12.13    4.59  182.09   1.48   0.42
xvdb              0.16   627.80  380.62   78.33 15775.49  2846.00    81.15    17.80   38.79    1.89  218.07   0.54  25.00
xvdap3            0.00     0.00    0.00    0.00     0.00     0.00     8.00     0.00    1.36    1.36    0.00   1.36   0.00
 
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.28    0.00    6.76   90.99    1.97    0.00
 
Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
xvdap1            0.00     0.00   11.27    0.00   267.04     0.00    47.40     0.03    2.60    2.60    0.00   2.60   2.93
xvdb              0.00     0.00 2568.17    0.28 105478.31     1.13    82.13     3.09    1.21    1.21    0.00   0.21  53.18
xvdap3            0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
 
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.57    0.00    8.88   86.82    3.72    0.00
 
Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
xvdap1            0.00     0.00   12.03    0.00   317.48     0.00    52.76     0.00    0.00    0.00    0.00   0.00   0.00
xvdb              0.00     0.00 2784.24    0.00 117375.36     0.00    84.31     3.27    1.17    1.17    0.00   0.19  53.07
xvdap3            0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
 
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.29    0.00    8.86   86.86    4.00    0.00
 
Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
xvdap1            0.00     0.00    6.00    0.00   168.00     0.00    56.00     0.00    0.00    0.00    0.00   0.00   0.00
xvdb              0.00     0.00 2752.00    0.00 116163.43     0.00    84.42     3.36    1.22    1.22    0.00   0.20  54.06
xvdap3            0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
 
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.28    0.00    7.00   92.44    0.28    0.00
 
Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
xvdap1            0.00     0.00   40.90    0.00  1182.07     0.00    57.81     0.05    1.12    1.12    0.00   0.27   1.12
xvdb              0.00     0.00 2924.09    0.00 123207.84     0.00    84.27     3.25    1.11    1.11    0.00   0.19  54.23
xvdap3            0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
 
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.84    0.00    8.38   90.22    0.56    0.00
 
Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
xvdap1            0.00     0.00    6.98    0.00   132.96     0.00    38.08     0.00    0.00    0.00    0.00   0.00   0.00
xvdb              0.00     0.00 2916.76    0.00 122908.38     0.00    84.28     3.05    1.05    1.05    0.00   0.18  52.18
xvdap3            0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
 
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.56    0.00    8.47   88.98    1.98    0.00
 
Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
xvdap1            0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
xvdb              0.00     0.00 2829.10    0.00 119282.49     0.00    84.33     3.16    1.12    1.12    0.00   0.19  53.56
xvdap3            0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
 
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.57    0.00    7.12   89.46    2.85    0.00
 
Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
xvdap1            0.00     0.00    4.56    0.00   104.84     0.00    46.00     0.00    0.00    0.00    0.00   0.00   0.00
xvdb              0.00     0.00 2859.54    0.00 120461.54     0.00    84.25     3.18    1.11    1.11    0.00   0.19  53.56
xvdap3            0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.28    0.00    8.40   91.04    0.28    0.00
 
Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
xvdap1            0.00     0.00    3.64    0.00    58.26     0.00    32.00     0.00    0.00    0.00    0.00   0.00   0.00
xvdb              0.00     0.00 2909.52    0.00 122330.53     0.00    84.09     3.00    1.03    1.03    0.00   0.18  51.99
xvdap3            0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
 
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.57    0.00    6.82   90.62    1.99    0.00
 
Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
xvdap1            0.00     0.00    6.25    0.00   154.55     0.00    49.45     0.00    0.00    0.00    0.00   0.00   0.00
xvdb              0.00     0.00 2886.65    0.00 121575.00     0.00    84.23     3.30    1.14    1.14    0.00   0.19  54.32
xvdap3            0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
 
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.28    0.00   10.06   89.11    0.56    0.00
 
Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
xvdap1            0.00     0.00    3.35    0.00    56.98     0.00    34.00     0.00    1.00    1.00    0.00   0.67   0.22
xvdb              0.00     0.00 2862.57    0.00 120623.46     0.00    84.28     3.08    1.08    1.08    0.00   0.18  52.51
xvdap3            0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
 
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.56    0.00    4.78   90.45    4.21    0.00
 
Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
xvdap1            0.00     0.00   60.11    0.28  1544.94     2.25    51.24     0.01    0.15    0.15    0.00   0.06   0.34
xvdb              0.00     0.00 1843.82    0.00 77457.30     0.00    84.02     2.80    1.52    1.52    0.00   0.24  45.17
xvdap3            0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
 
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.26    0.00    1.29   96.64    1.81    0.00
 
Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
xvdap1            0.00     0.00  140.05    0.00  3919.38     0.00    55.97     0.01    0.07    0.07    0.00   0.03   0.41
xvdb              0.00     1.55  154.78    2.58  5584.50    17.57    71.20     1.80   11.45   11.50    8.80   2.19  34.42
xvdap3            0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
 
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    1.04   96.11    2.85    0.00
 
Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
xvdap1            0.00     0.00   54.92    0.00  1367.88     0.00    49.81     0.00    0.00    0.00    0.00   0.00   0.00
xvdb              0.00     0.00  202.07    0.00  6851.81     0.00    67.82     1.66    8.12    8.12    0.00   2.55  51.61
xvdap3            0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
 
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    0.77   98.98    0.26    0.00
 
Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
xvdap1            0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
xvdb              0.00     0.00  386.45    0.00 14086.96     0.00    72.91     1.89    4.93    4.93    0.00   1.32  50.95
xvdap3            0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00



 Comments   
Comment by Azat Khuzhin [ 20/Aug/12 ]

No, I couldn't.
Unfortunately, no.

I think that for now it can be closed.
If I see this again, I will post all the information you asked for.
Thanks.

Comment by Scott Hernandez (Inactive) [ 18/Aug/12 ]

Have you been able to reproduce this and if so, can you please attach the logs?

Did you post that segfault/stack-trace in another issue?

Comment by Azat Khuzhin [ 06/May/12 ]

I can't attach the logs because I have already terminated those instances, but I included earlier the one log message that was written every ~15 seconds.

About "killed": I ran mongod manually in a terminal, and it wrote its log messages there, but then "Killed" was printed on a new line.
There were no errors or segfaults before it was killed.
I don't think it was the OOM killer, because swap wasn't full.

Comment by Scott Hernandez (Inactive) [ 05/May/12 ]

Can you attach the logs please? From both shards, and the mongos.

Also, what does "mongod just killed" mean? Are you talking about something like the OOM? http://www.mongodb.org/display/DOCS/The+Linux+Out+of+Memory+OOM+Killer

Comment by Azat Khuzhin [ 05/May/12 ]

One other thing: I started creating the index, then killed the index-creation operation, and mongod was just killed.
