[SERVER-10853] Shard collection command hangs indefinitely Created: 23/Sep/13  Updated: 24/Sep/13  Resolved: 24/Sep/13

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 2.4.6
Fix Version/s: None

Type: Bug
Priority: Major - P3
Reporter: Pavlo Grinchenko
Assignee: Unassigned
Resolution: Done
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: Ubuntu, EC2
Operating System: ALL
Participants:

 Description   
  • We have a simple sharded deployment with two shards, SH01 and SH02.

We have a relatively large collection, video.plays:

mongos> db.video.plays.stats()
{
    "sharded" : false,
    "primary" : "SH01",
    "ns" : "reportsRaw.video.plays",
    "count" : 208594003,
    "size" : 34875392112,
    "avgObjSize" : 167.1926882385013,
    "storageSize" : 39355395856,
    "numExtents" : 39,
    "nindexes" : 2,
    "lastExtentSize" : 2146426864,
    "paddingFactor" : 1,
    "systemFlags" : 1,
    "userFlags" : 0,
    "totalIndexSize" : 27328222768,
    "indexSizes" : {
        "_id_" : 6154320480,
        "cid_1_bid_1_pid_1_date_1" : 21173902288
    },
    "ok" : 1
}

Note that I have already added the following index:

{ cid: 1, bid: 1, pid: 1, date: 1 }
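
The name cid_1_bid_1_pid_1_date_1 in the indexSizes output is the default name MongoDB derives from this key pattern: each field name and its direction, joined with underscores. A minimal sketch in plain JavaScript (no server connection needed) reproduces that naming to confirm that the index listed in the stats matches the intended shard key. The helper defaultIndexName is hypothetical, written for illustration only; it is not a shell built-in.

```javascript
// Hypothetical helper (not part of the mongo shell): builds the default
// index name by joining "<field>_<direction>" pairs with underscores.
function defaultIndexName(keyPattern) {
  var parts = [];
  for (var field in keyPattern) {
    if (Object.prototype.hasOwnProperty.call(keyPattern, field)) {
      parts.push(field + "_" + keyPattern[field]);
    }
  }
  return parts.join("_");
}

// Key pattern and indexSizes copied from the report above.
var shardKey = { cid: 1, bid: 1, pid: 1, date: 1 };
var indexSizes = { "_id_": 6154320480, "cid_1_bid_1_pid_1_date_1": 21173902288 };

var shardKeyIndexName = defaultIndexName(shardKey);
// True when the stats list an index whose name matches the shard key.
var shardKeyIndexExists = Object.prototype.hasOwnProperty.call(indexSizes, shardKeyIndexName);
```

A name match is only a quick sanity check; on a live deployment, db.video.plays.getIndexes() shows the actual key pattern of each index.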

Now I am trying to shard this collection by issuing the following command from one of the mongos hosts in our cluster:

db.runCommand({ shardCollection: "reportsRaw.video.plays", key: { cid: 1, bid: 1, pid: 1, date: 1}})

This command hangs and appears to do nothing:

  • the shard logs don't show anything
  • the mongo shell hangs and nothing happens for a really long time

When I press Ctrl-C in the shell, I see the following output:

mongos> db.runCommand({ shardCollection: "reportsRaw.video.plays", key: { cid: 1, bid: 1, pid: 1, date: 1}})

^CMon Sep 23 18:37:12.003 Assertion: 13111:field not found, expected type 2
0x74e711 0x71778b 0x717ccc 0x5f43e6 0x5dccb4 0x7fee481b40b0 0x7fee48f7a085 0x73da43 0x73da59 0x741e45 0x73a60c 0x73c57b 0x73ca34 0x64d6ef 0x67de37 0x63e62a 0x658411 0x70bd17 0x6ec53c 0x2c6cda94bcf7
mongo(_ZN5mongo15printStackTraceERSo+0x21) [0x74e711]
mongo(_ZN5mongo11msgassertedEiPKc+0x9b) [0x71778b]
mongo() [0x717ccc]
mongo(_ZNK5mongo11shell_utils18ConnectionRegistry30killOperationsOnAllConnectionsEb+0x1526) [0x5f43e6]
mongo(_Z10quitNicelyi+0x24) [0x5dccb4]
/lib/x86_64-linux-gnu/libc.so.6(+0x370b0) [0x7fee481b40b0]
/lib/x86_64-linux-gnu/libpthread.so.0(recv+0x75) [0x7fee48f7a085]
mongo(_ZN5mongo6Socket5_recvEPci+0x13) [0x73da43]
mongo(_ZN5mongo6Socket11unsafe_recvEPci+0x9) [0x73da59]
mongo(_ZN5mongo6Socket4recvEPci+0x75) [0x741e45]
mongo(_ZN5mongo13MessagingPort4recvERNS_7MessageE+0x8c) [0x73a60c]
mongo(_ZN5mongo13MessagingPort4recvERKNS_7MessageERS1_+0x1b) [0x73c57b]
mongo(_ZN5mongo13MessagingPort4callERNS_7MessageES2_+0x34) [0x73ca34]
mongo(_ZN5mongo18DBClientConnection4callERNS_7MessageES2_bPSs+0x4f) [0x64d6ef]
mongo(_ZN5mongo14DBClientCursor4initEv+0xb7) [0x67de37]
mongo(_ZN5mongo12DBClientBase5queryERKSsNS_5QueryEiiPKNS_7BSONObjEii+0xea) [0x63e62a]
mongo(_ZN5mongo18DBClientConnection5queryERKSsNS_5QueryEiiPKNS_7BSONObjEii+0xa1) [0x658411]
mongo(_ZN5mongo9mongoFindEPNS_7V8ScopeERKN2v89ArgumentsE+0x387) [0x70bd17]
mongo(_ZN5mongo7V8Scope10v8CallbackERKN2v89ArgumentsE+0xdc) [0x6ec53c]
[0x2c6cda94bcf7]
Mon Sep 23 18:37:12.056 Error: field not found, expected type 2 at src/mongo/shell/query.js:78
^CMon Sep 23 18:37:25.357 Assertion: 13111:field not found, expected type 2
0x74e711 0x71778b 0x717ccc 0x5f43e6 0x5dccb4 0x7fee481b40b0 0x7fee48f76ca2 0x7053da 0x797d99 0x7fee48f72f8e 0x7fee48276e1d
mongo(_ZN5mongo15printStackTraceERSo+0x21) [0x74e711]
mongo(_ZN5mongo11msgassertedEiPKc+0x9b) [0x71778b]
mongo() [0x717ccc]
mongo(_ZNK5mongo11shell_utils18ConnectionRegistry30killOperationsOnAllConnectionsEb+0x1526) [0x5f43e6]
mongo(_Z10quitNicelyi+0x24) [0x5dccb4]
/lib/x86_64-linux-gnu/libc.so.6(+0x370b0) [0x7fee481b40b0]
/lib/x86_64-linux-gnu/libpthread.so.0(pthread_cond_wait+0xc2) [0x7fee48f76ca2]
mongo(_ZN5mongo15DeadlineMonitorINS_7V8ScopeEE21deadlineMonitorThreadEv+0x56a) [0x7053da]
mongo() [0x797d99]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x7f8e) [0x7fee48f72f8e]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7fee48276e1d]
Mon Sep 23 18:37:25.360 terminate() called in shell, printing stack:
0x74e711 0x5dcace 0x7fee48abef76 0x7fee48abefa3 0x7fee48abf1de 0x717831 0x717ccc 0x5f43e6 0x5dccb4 0x7fee481b40b0 0x7fee48f76ca2 0x7053da 0x797d99 0x7fee48f72f8e 0x7fee48276e1d
mongo(_ZN5mongo15printStackTraceERSo+0x21) [0x74e711]
mongo(_Z11myterminatev+0x3e) [0x5dcace]
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x5ef76) [0x7fee48abef76]
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x5efa3) [0x7fee48abefa3]
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x5f1de) [0x7fee48abf1de]
mongo(_ZN5mongo11msgassertedEiPKc+0x141) [0x717831]
mongo() [0x717ccc]
mongo(_ZNK5mongo11shell_utils18ConnectionRegistry30killOperationsOnAllConnectionsEb+0x1526) [0x5f43e6]
mongo(_Z10quitNicelyi+0x24) [0x5dccb4]
/lib/x86_64-linux-gnu/libc.so.6(+0x370b0) [0x7fee481b40b0]
/lib/x86_64-linux-gnu/libpthread.so.0(pthread_cond_wait+0xc2) [0x7fee48f76ca2]
mongo(_ZN5mongo15DeadlineMonitorINS_7V8ScopeEE21deadlineMonitorThreadEv+0x56a) [0x7053da]
mongo() [0x797d99]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x7f8e) [0x7fee48f72f8e]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7fee48276e1d]



 Comments   
Comment by Pavlo Grinchenko [ 23/Sep/13 ]

OK, apparently the collection did start to shard, but the operational experience was less than stellar:

  • we initiated sharding
  • after 1.5 hours we got a message that it would compute split vectors
  • after that it locked our database for 0.5 hours and we couldn't do anything with the db
  • after that it started to move chunks and the db was unblocked

For a live system this behavior is somewhat problematic. Please close this issue.
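
To put the split-vector step in perspective, here is a rough back-of-the-envelope estimate in plain JavaScript, using the size and count from the stats in the description. It assumes the default maximum chunk size of 64 MB (the ticket does not say whether this cluster overrides it) and ignores how the server actually chooses split points, so it gives only an order-of-magnitude figure.

```javascript
// Values copied from the db.video.plays.stats() output in the description.
var collectionSizeBytes = 34875392112; // "size"
var documentCount = 208594003;         // "count"

// Assumption: default maximum chunk size of 64 MB, not confirmed for this cluster.
var assumedChunkSizeBytes = 64 * 1024 * 1024;

// Number of chunks if the data were cut into pieces of the maximum size.
var estimatedChunks = Math.ceil(collectionSizeBytes / assumedChunkSizeBytes); // 520

// Approximate number of documents that would land in each such chunk.
var docsPerChunk = Math.floor(documentCount / estimatedChunks);
```

Under these assumptions the initial split has to find several hundred split points across more than 200 million documents, which fits the long delay before the split-vector message appeared.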

Comment by Pavlo Grinchenko [ 23/Sep/13 ]

Any ideas why this could be happening? Other than that, our cluster functions properly, but we are unable to shard our biggest collections.

Usually when we initiate sharding we see chunk-split messages in the shard server logs; now we don't see anything.

We would appreciate your assistance.

Generated at Thu Feb 08 03:24:14 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.