[SERVER-935] Sharding gridfs on chunks - files_id eventually crashes the server Created: 19/Nov/09  Updated: 12/Jul/16  Resolved: 13/Apr/10

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: 1.5.1

Type: Bug Priority: Major - P3
Reporter: Jayson Minard Assignee: Mathias Stearn
Resolution: Done Votes: 2
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

 Description   

Using a nightly build from Sept 18th, 2009.

I created 3 shard servers and 1 config server, then added a blobstore database, enabled sharding on it, and sharded blobstore.fs.chunks on the key files_id (_id didn't work due to another error; see the other JIRA issue).
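
For reference, the setup described above can be sketched as the admin command documents one would send through mongos. This is only an illustration: `gridfs_sharding_commands` is a hypothetical helper, nothing is sent to a server, and the command spellings `enableSharding` and `shardCollection` are the currently documented ones, so they are an assumption with respect to the 2009 nightly build used in this report.

```python
# Hypothetical helper: builds the admin command documents for the setup
# described in this report. It only constructs dictionaries; no server is contacted.

def gridfs_sharding_commands(database, bucket="fs"):
    """Return the admin commands that shard <database>.<bucket>.chunks on files_id."""
    namespace = f"{database}.{bucket}.chunks"
    return [
        # Enable sharding for the database as a whole.
        {"enableSharding": database},
        # Shard the GridFS chunks collection on files_id (ascending).
        {"shardCollection": namespace, "key": {"files_id": 1}},
    ]


for command in gridfs_sharding_commands("blobstore"):
    print(command)
```

With a driver, each document would be run against the admin database; that step is omitted here because the exact driver call depends on the client in use.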

Eventually it hits a file that causes:

Wed Nov 18 23:41:29 connection accepted from 10.0.1.51:7113 #10554
Wed Nov 18 23:41:29 passing through unknown command: filemd5 { filemd5: ObjId(4b04f6a9ff2f2265629a0a19), root: "fs" }
Wed Nov 18 23:41:29 end connection 10.0.1.51:7113
Wed Nov 18 23:41:29 connection accepted from 10.0.1.51:7369 #10555
Wed Nov 18 23:41:29 connection accepted from 127.0.0.1:55494 #10556
Wed Nov 18 23:41:29 passing through unknown command: filemd5 { filemd5: ObjId(4b04f6a9e035913b6dceb4b0), root: "fs" }
Wed Nov 18 23:41:29 end connection 10.0.1.51:7369
Wed Nov 18 23:41:29 connection accepted from 10.0.1.51:7625 #10557
Wed Nov 18 23:41:29 passing through unknown command: filemd5 { filemd5: ObjId(4b04f6a9d78b4d652dc7904f), root: "fs" }
********************
ERROR: MessagingPort::call() wrong id got:1 expect:499755046
  old:3855255207
  response msgid:3636897828
  response len:  1048660
Wed Nov 18 23:41:29   Assertion failure false util/message.cpp 365
Wed Nov 18 23:41:29 bad recv() len: -405573500
Wed Nov 18 23:41:29 Assertion: dbclient error communicating with server
Wed Nov 18 23:41:29 UserException: dbclient error communicating with server
Wed Nov 18 23:41:29 end connection 127.0.0.1:55494
0x100052ff4 0x100058b5d 0x1000567b2 0x1000a0030 0x1000aa304 0x1000bf179 0x1000c3432 0x10009b6e3 0x1000f4564 0x7fff8681bf8e 0x7fff8681be41 
 0   mongos                              0x0000000100052ff4 _ZN5mongo12sayDbContextEPKc + 260
 1   mongos                              0x0000000100058b5d _ZN5mongo8assertedEPKcS1_j + 317
 2   mongos                              0x00000001000567b2 _ZN5mongo13MessagingPort4callERNS_7MessageES2_ + 562
 3   mongos                              0x00000001000a0030 _ZN5mongo8Strategy7doQueryERNS_7RequestESs + 256
 4   mongos                              0x00000001000aa304 _ZN5mongo14SingleStrategy7queryOpERNS_7RequestE + 1220
 5   mongos                              0x00000001000bf179 _ZN5mongo7Request7processEi + 649
 6   mongos                              0x00000001000c3432 _ZN5mongo21ShardedMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortE + 258
 7   mongos                              0x000000010009b6e3 _ZN5mongo3pms9threadRunEv + 131
 8   mongos                              0x00000001000f4564 thread_proxy + 132
 9   libSystem.B.dylib                   0x00007fff8681bf8e _pthread_start + 331
 10  libSystem.B.dylib                   0x00007fff8681be41 thread_start + 13
Wed Nov 18 23:41:29 end connection 10.0.1.51:7625
Wed Nov 18 23:41:29 connection accepted from 10.0.1.51:7881 #10558
Wed Nov 18 23:41:29 passing through unknown command: filemd5 { filemd5: ObjId(4b04f6a9ad8ef1524d763fc4), root: "fs" }
Wed Nov 18 23:41:29 end connection 10.0.1.51:7881
Wed Nov 18 23:41:29 connection accepted from 10.0.1.51:8137 #10559
Wed Nov 18 23:41:29 passing through unknown command: filemd5 { filemd5: ObjId(4b04f6a96ed280225a2a3986), root: "fs" }
 
==> ./logs/db-blobstore-shard-2.log <==
Wed Nov 18 23:41:29 should have chunk: 0 have:-2147483648
Wed Nov 18 23:41:29 User Exception chunks out of order
 
==> ./logs/mongos-blobstore.log <==
Wed Nov 18 23:41:29 end connection 10.0.1.51:8137
Wed Nov 18 23:41:29 connection accepted from 10.0.1.51:8393 #10560
Wed Nov 18 23:41:29 passing through unknown command: filemd5 { filemd5: ObjId(4b04f6a9317d2858689a9c15), root: "fs" }
 
==> ./logs/db-blobstore-shard-2.log <==
Wed Nov 18 23:41:29 should have chunk: 0 have:-2147483648
Wed Nov 18 23:41:29 User Exception chunks out of order
 
==> ./logs/mongos-blobstore.log <==
Wed Nov 18 23:41:29 end connection 10.0.1.51:8393
Wed Nov 18 23:41:29 connection accepted from 10.0.1.51:8649 #10561
Wed Nov 18 23:41:29 passing through unknown command: filemd5 { filemd5: ObjId(4b04f6a9a524a1182abb16a2), root: "fs" }
 
==> ./logs/db-blobstore-shard-2.log <==
Wed Nov 18 23:41:29 should have chunk: 0 have:-2147483648
Wed Nov 18 23:41:29 User Exception chunks out of order
 



 Comments   
Comment by auto [ 13/Apr/10 ]

Author: Mathias Stearn (RedBeard0531) <mathias@10gen.com>

Message: sharded filemd5 command SERVER-935
http://github.com/mongodb/mongo/commit/4c0481213f19b43d44707f81dac2537a80ab71d5

Comment by Mathias Stearn [ 11/Feb/10 ]
{files_id:1} is the only supported way to shard GridFS chunks, and it should be working correctly. Could you try it again with the latest release and, if it fails, send the output of printShardingStatus(db.getSisterDB('config'))?

Comment by Eliot Horowitz (Inactive) [ 20/Nov/09 ]

GridFS has a specific problem with sharding, in particular when computing the md5 of a file. It is not a general sharding problem.
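
To illustrate why the md5 computation is sensitive to how chunks are distributed, here is a simplified pure-Python model. GridFS stores a file as chunk documents with `files_id`, `n`, and `data` fields, and the file's md5 covers the chunk data in ascending `n`. The ordering check below mirrors the "should have chunk" / "chunks out of order" log lines in this report, but it is a sketch and not the server's implementation; `file_md5` and `ChunksOutOfOrder` are hypothetical names.

```python
import hashlib


class ChunksOutOfOrder(Exception):
    """Hypothetical error mirroring the 'chunks out of order' log message."""


def file_md5(chunks, files_id):
    """Hash one file's chunks in ascending n, requiring n = 0, 1, 2, ... with no gaps."""
    own = sorted(
        (c for c in chunks if c["files_id"] == files_id),
        key=lambda c: c["n"],
    )
    digest = hashlib.md5()
    for expected, chunk in enumerate(own):
        if chunk["n"] != expected:
            # A gap means some chunk of this file is not visible here.
            raise ChunksOutOfOrder(
                f"should have chunk: {expected} have: {chunk['n']}"
            )
        digest.update(chunk["data"])
    return digest.hexdigest()


chunks = [
    {"files_id": "a", "n": 1, "data": b"world"},
    {"files_id": "a", "n": 0, "data": b"hello "},
    {"files_id": "b", "n": 0, "data": b"other"},
]
print(file_md5(chunks, "a") == hashlib.md5(b"hello world").hexdigest())
```

In this model, a server that sees only some of a file's chunks hits the gap check. Sharding on files_id gives every chunk of a file the same shard key value, so a single shard can see the whole sequence.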

Comment by Jayson Minard [ 20/Nov/09 ]

Is this a general problem with sharding, or one specific to this case? Wondering if I should punt on going deeper or keep trying it for different uses.

Comment by Jayson Minard [ 19/Nov/09 ]

After the initial exception, nothing works anymore. The "chunks out of order" error happens on every subsequent insert.
