[SERVER-14384] Error while replicating sharded cluster environment Created: 28/Jun/14  Updated: 10/Dec/14  Resolved: 16/Jul/14

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: None

Type: Question Priority: Major - P3
Reporter: Pratik Gadiya [X] Assignee: Thomas Rueckstiess
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Participants:

 Description   

Hi,

I wanted to replicate the production sharding cluster environment on staging systems.
So I simply copied the data from the dbpath of the production systems to the staging systems, for both the config server and the replica set. Keeping the same configuration as production, I started the services on the staging environment, but I am facing a weird error that I am not able to resolve.

Error on configsvr :

Wed Jun 25 03:26:27.910 [conn17] update config.mongos query: { _id: "stg-xx:27022" } update: { $set: { ping: new Date(1403684776448), up: 60885, waiting: true, mongoVersion: "2.4.9" } } idhack:1 fastmod:1 keyUpdates:0 exception: BSONObj size: 859321145 (0x39333833) is invalid. Size must be between 0 and 16793600(16MB) First element: EOO code:10334 locks(micros) w:19298 9ms
Wed Jun 25 03:26:28.847 [conn17] Assertion: 10334:BSONObj size: 859321145 (0x39333833) is invalid. Size must be between 0 and 16793600(16MB) First element: EOO
0xde46e1 0xda5e1b 0xda635c 0x6ecacf 0x9e4680 0xacaa1f 0x81d566 0xa63869 0xa638ac 0xabcccc 0xa75f54 0xa6ec59 0xa94948 0xa97277 0x9fa7f8 0x9ffd78 0x6e8518 0xdd0cae 0x352c2079d1 0x352bee8b6d
/usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x21) [0xde46e1]
/usr/bin/mongod(_ZN5mongo11msgassertedEiPKc+0x9b) [0xda5e1b]
/usr/bin/mongod() [0xda635c]
/usr/bin/mongod(_ZNK5mongo7BSONObj14_assertInvalidEv+0x5bf) [0x6ecacf]
/usr/bin/mongod(_ZN5mongo13unindexRecordEPNS_16NamespaceDetailsEPNS_6RecordERKNS_7DiskLocEb+0x130) [0x9e4680]
/usr/bin/mongod(_ZN5mongo11DataFileMgr12deleteRecordEPNS_16NamespaceDetailsEPKcPNS_6RecordERKNS_7DiskLocEbbb+0x1bf) [0xacaa1f]
/usr/bin/mongod(_ZN5mongo16NamespaceDetails11cappedAllocEPKci+0x436) [0x81d566]
/usr/bin/mongod(_ZN5mongo16NamespaceDetails6_allocEPKci+0x29) [0xa63869]
/usr/bin/mongod(_ZN5mongo16NamespaceDetails5allocEPKci+0x3c) [0xa638ac]
/usr/bin/mongod(_ZN5mongo11DataFileMgr17fast_oplog_insertEPNS_16NamespaceDetailsEPKci+0x1ec) [0xabcccc]
/usr/bin/mongod() [0xa75f54]
/usr/bin/mongod(_ZN5mongo5logOpEPKcS1_RKNS_7BSONObjEPS2_Pbb+0x49) [0xa6ec59]
/usr/bin/mongod(_ZN5mongo14_updateObjectsEbPKcRKNS_7BSONObjES4_bbbRNS_7OpDebugEPNS_11RemoveSaverEbRKNS_24QueryPlanSelectionPolicyEb+0x2768) [0xa94948]
/usr/bin/mongod(_ZN5mongo13updateObjectsEPKcRKNS_7BSONObjES4_bbbRNS_7OpDebugEbRKNS_24QueryPlanSelectionPolicyE+0xb7) [0xa97277]
/usr/bin/mongod(_ZN5mongo14receivedUpdateERNS_7MessageERNS_5CurOpE+0x4d8) [0x9fa7f8]
/usr/bin/mongod(_ZN5mongo16assembleResponseERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortE+0xac8) [0x9ffd78]
/usr/bin/mongod(_ZN5mongo16MyMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortEPNS_9LastErrorE+0x98) [0x6e8518]
/usr/bin/mongod(_ZN5mongo17PortMessageServer17handleIncomingMsgEPv+0x42e) [0xdd0cae]
/lib64/libpthread.so.0() [0x352c2079d1]
/lib64/libc.so.6(clone+0x6d) [0x352bee8b6d]
Wed Jun 25 03:26:28.851 [conn17] update config.lockpings query: { _id: "stg-xx:27022:1403623891:1804289383" } update: { $set: { ping: new Date(1403684777347) } } nscanned:1 keyUpdates:1 exception: BSONObj size: 859321145 (0x39333833) is invalid. Size must be between 0 and 16793600(16MB) First element: EOO code:10334 locks(micros) w:8592 4ms

Error for Mongos :

Wed Jun 25 00:00:12.169 [LockPinger] warning: pinging failed for distributed lock pinger 'xx.xx.xx.xx:27021,xx.xx.xx.xx:27021,xx.xx.xx.xx:27021/stg-xxx:27022:1403623758:1804289383'. :: caused by :: BSONObj size: 859321145 (0x39333833) is invalid. Size must be between 0 and 16793600(16MB) First element: EOO

Thanks



 Comments   
Comment by Thomas Rueckstiess [ 16/Jul/14 ]

Hi Pratik,

You can find instructions on how to migrate a cluster in the documentation under Migrate a Sharded Cluster.

Please note that the SERVER project is for reporting bugs or feature suggestions for the MongoDB server.

For MongoDB-related support discussion, please post to the mongodb-user group (http://groups.google.com/group/mongodb-user) or to Stack Overflow with the mongodb tag. A question like this, which involves broader discussion, is best posted to the mongodb-user group.

Regards,
Thomas

Comment by Pratik Gadiya [X] [ 03/Jul/14 ]

Thanks for the help.

I tried to repair the config server data using the steps mentioned above, but I am still facing the same issue.
I think the data is badly corrupted.

Can you let me know all the steps I need to perform in order to migrate data from one sharded cluster to another?
I mean, which databases do I need to back up, etc.?

Thanks

Comment by Thomas Rueckstiess [ 30/Jun/14 ]

Hi Pratik,

This looks like data corruption: the document claims to be 859 MB in size, which is of course impossible (as the error message notes, a BSON document must be between 0 and 16 MB).

How exactly did you copy the data over? If you did a simple file copy, did you lock the database for writes (fsyncLock) before copying the data? If not, that could explain the corruption, as you were copying the files while they were still being written to.
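As a minimal sketch of a write-locked file copy (host names and paths below are placeholders, not the reporter's actual environment):

```shell
# Placeholders -- adjust to your own environment.
PROD_DBPATH=/data/db            # dbpath on the production member
STAGING=stg-host                # hypothetical staging host

# 1. In the mongo shell on the member being copied:
#      db.fsyncLock()           // flush to disk and block writes
# 2. Copy the data files while the lock is held. The command is echoed
#    here rather than executed; run it once the lock is confirmed:
copy_cmd="rsync -av ${PROD_DBPATH}/ ${STAGING}:${PROD_DBPATH}/"
echo "$copy_cmd"
# 3. Back in the mongo shell:
#      db.fsyncUnlock()         // resume writes
```

Copying without the lock (or without one of the other consistency options below) is exactly what produces torn, oversized BSON documents like the one in your log.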

To get a consistent copy, you need one of the following:

- a filesystem snapshot,
- a mongodump with the --oplog option, or
- stopping writes before the copy.

You can also bring down one secondary, copy its data, and start it again. For the config servers, you can bring down the third config server, copy its data, and restart it. Also make sure the balancer is disabled and not running before you copy the data. That way your prod system stays operational and you still get a consistent copy.
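A sketch of the mongodump route (host names, ports, and the backup path are illustrative assumptions): --oplog makes mongodump record oplog entries written during the dump so the result is a consistent point-in-time snapshot of that member, and --oplogReplay applies those entries on restore.

```shell
# Illustrative only: hosts, ports, and paths are assumptions.
# Dump from a member of the source cluster with --oplog for consistency:
dump_cmd="mongodump --host prod-shard0:27018 --oplog --out /backup/shard0"
# Restore into the target cluster, replaying the captured oplog:
restore_cmd="mongorestore --host stg-shard0:27018 --oplogReplay /backup/shard0"
echo "$dump_cmd"
echo "$restore_cmd"
```

Remember to disable the balancer (sh.stopBalancer() via mongos) before dumping, as above, so no chunk migrations are in flight during the backup.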

Our documentation page on Backup of Sharded Clusters gives you a detailed explanation of the various options.

Can you let me know if one of these suggestions fixes your issue?

Thanks
Thomas

Generated at Thu Feb 08 03:34:41 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.