[SERVER-6696] Sharding an existing collection is losing data Created: 02/Aug/12  Updated: 16/Aug/12  Resolved: 16/Aug/12

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 2.0.6
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Arthur Nogueira Neves Assignee: Gregor Macadam
Resolution: Done Votes: 0
Labels: mongod, mongos, sharding
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Linux preprod 3.2.0-27-virtual #43-Ubuntu SMP Fri Jul 6 14:45:58 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=12.04
DISTRIB_CODENAME=precise
DISTRIB_DESCRIPTION="Ubuntu 12.04 LTS"


Attachments: PNG File Screen Shot 2012-08-02 at 4.00.39 PM.png    
Operating System: Linux
Participants:

 Description   

We had an unsharded MongoDB instance with a collection full of documents (~100M). We created an index on that collection in order to shard it.

We enabled sharding on that node (the primary shard), and also added another empty node to the shard grid (the secondary shard). Everything was looking fine, but then we realized we were consistently losing documents from the chunk that was being transferred:
http://pastebin.slackadelic.com/p/8LxWaY81.html
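For context, the steps described above roughly correspond to the following mongos shell commands (the database, collection, shard key, and host names here are placeholders, since the actual names are not given in the report):

mongos> sh.addShard("secondaryhost:27017")
mongos> sh.enableSharding("mydb")
mongos> use mydb
mongos> db.mycoll.ensureIndex({ myShardKey: 1 })
mongos> sh.shardCollection("mydb.mycoll", { myShardKey: 1 })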

In the attachment, I have both mongod instances running a count with the chunk's criteria; one of them is losing documents and the other is not receiving any.

I also checked the logs on both mongod servers and everything was looking fine:
http://pastebin.slackadelic.com/p/LOsexH82.html (this is from the primary shard)
http://pastebin.slackadelic.com/p/ZvgQHE51.html (this is from the secondary shard)

So, is this an issue? Was this already fixed in version 2.2?
There is also a question about it on Stack Overflow: http://stackoverflow.com/questions/11768679/am-i-losing-data-when-i-am-sharding-my-existent-collection/



 Comments   
Comment by Gregor Macadam [ 13/Aug/12 ]

I've managed to reproduce a decreasing number of documents as shown by count(). I started off with 10000000 documents and sharded the collection in the same way that you did. The number shown by count() is actually too large during the balancing (as Eliot mentioned), but it does show the number of documents decreasing back to 10000000. It is possible that this is what you are seeing, so it is important that we know the number of documents you started with - do you know this number?

mongos> db.collb.count()
10364533
mongos> db.collb.count()
10321058
mongos> db.collb.count()
10256930
mongos> db.collb.count()
10191906
mongos> db.collb.count()
10148678
mongos> db.collb.count()
10109218
mongos> db.collb.count()
10071568
mongos> db.collb.count()
10032526
mongos> db.collb.count()
10000000
mongos> db.collb.count()
10000000
mongos> 
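A note on the transcript above: on a sharded cluster, count() sums the counts reported by each shard, so during a migration, documents that have been copied to the recipient but not yet deleted from the donor are counted twice. A slower but exact figure can usually be obtained by iterating the cursor through mongos, which filters out documents a shard does not own (collection name as in the transcript above):

mongos> db.collb.find().itcount()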

Comment by Eliot Horowitz (Inactive) [ 03/Aug/12 ]

This is most likely a transient counting problem.
Do you know the original count from before you started doing anything?
Documents can get double counted during large migrations, especially if deleting the data on the source side is slow.
Can you also send the .stats() for the collection?
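For reference, the stats requested above can be pulled through the mongos shell; on a sharded cluster the output includes an overall summary plus a per-shard breakdown of document count and size, which shows where the documents actually live:

mongos> db.collb.stats()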

Generated at Thu Feb 08 03:12:26 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.