
SERVER-6696: Sharding an existing collection is losing data

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Works as Designed
    • Affects Version/s: 2.0.6
    • Fix Version/s: None
    • Component/s: Sharding
    • Environment:
      Linux preprod 3.2.0-27-virtual #43-Ubuntu SMP Fri Jul 6 14:45:58 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
      DISTRIB_ID=Ubuntu
      DISTRIB_RELEASE=12.04
      DISTRIB_CODENAME=precise
      DISTRIB_DESCRIPTION="Ubuntu 12.04 LTS"
    • Backport:
      No
    • Operating System:
      Linux
    • Bug Type:
      Unknown
    • # Replies:
      2
    • Last comment by Customer:
      false

      Description

      We had an unsharded MongoDB instance with a collection of ~100M documents. We created an index on that collection in order to shard it.

      We enabled sharding on that node (the primary) and added another, empty node to the shard cluster (the secondary). Everything looked fine, but then we realized we were consistently losing documents from the chunk that was being transferred:
      http://pastebin.slackadelic.com/p/8LxWaY81.html
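
      For reference, the setup described above amounts to something like the following mongos shell session. The database, collection, shard-key, and host names (mydb, coll, key, secondaryhost) are placeholders, not the actual names from this report:

      mongos> use mydb
      mongos> db.coll.ensureIndex({ key: 1 })
      mongos> db.adminCommand({ enableSharding: "mydb" })
      mongos> db.adminCommand({ shardCollection: "mydb.coll", key: { key: 1 } })
      mongos> db.adminCommand({ addShard: "secondaryhost:27018" })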

      In the attachment, I have both mongod instances running a count with the chunk's criteria; one of them is losing documents and the other is not receiving any documents.

      I also checked the logs on both mongod servers and everything looked fine:
      http://pastebin.slackadelic.com/p/LOsexH82.html - this is from the primary shard.
      http://pastebin.slackadelic.com/p/ZvgQHE51.html - this is from the secondary shard.

      So, is this an issue? Was this already fixed in version 2.2?
      There is also a question about it on Stack Overflow: http://stackoverflow.com/questions/11768679/am-i-losing-data-when-i-am-sharding-my-existent-collection/

        Activity

        Eliot Horowitz added a comment -

        This is most likely a transient counting problem.
        Do you know the original count before you started doing anything?
        Things can get double-counted during large migrations, especially if deleting the data on the source side is slow.
        Can you also send the .stats() for the collection?
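
        For reference, those stats can be pulled through mongos as shown below (the collection name coll is a placeholder); on a sharded collection the output includes a per-shard breakdown under a "shards" field:

        mongos> db.coll.stats()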

        Gregor Macadam added a comment -

        I've managed to reproduce a decreasing number of documents as shown by count(). I started with 10000000 documents and sharded the collection in the same way that you did. The number shown by count() is actually too large during balancing (as Eliot mentioned), but it does show the number of documents decreasing back to 10000000. It is possible that this is what you are seeing, so it is important that we know the number of documents you started with - do you know this number?

        mongos> db.collb.count()
        10364533
        mongos> db.collb.count()
        10321058
        mongos> db.collb.count()
        10256930
        mongos> db.collb.count()
        10191906
        mongos> db.collb.count()
        10148678
        mongos> db.collb.count()
        10109218
        mongos> db.collb.count()
        10071568
        mongos> db.collb.count()
        10032526
        mongos> db.collb.count()
        10000000
        mongos> db.collb.count()
        10000000
        mongos> 
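
        As an aside, count() on a sharded collection can over-report while chunks are migrating, because a document can briefly exist on both the donor and the recipient shard. An exact count that is not skewed by in-flight migrations can be taken by iterating the cursor instead (slower, since it touches every document):

        mongos> db.collb.find().itcount()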
        

          People

          • Votes:
            0
          • Watchers:
            1

            Dates

            • Created:
            • Updated:
            • Resolved:
            • Days since reply:
              1 year, 35 weeks, 2 days ago
            • Date of 1st Reply: