  1. Core Server
  2. SERVER-31995

Logged initial sync statistics may exceed 16mb causing fassert

    • Fully Compatible
    • ALL
    • v3.6, v3.4
    • 11

      Hi Team,

      We have a 10-shard sharded cluster (each shard is a Primary / Secondary / Arbiter replica set) that hosts 70k databases.

      Here is how the databases are distributed across the shards:
      (NB: some of our databases are not sharded)

      mongos> db.databases.aggregate({$group:{_id: '$primary', count: {$sum:1}}})
      { "_id" : "clust-users-2-shard10", "count" : 4594 }
      { "_id" : "clust-users-2-shard9", "count" : 8945 }
      { "_id" : "clust-users-2-shard8", "count" : 8624 }
      { "_id" : "clust-users-2-shard1", "count" : 8084 }
      { "_id" : "clust-users-2-shard7", "count" : 4505 }
      { "_id" : "clust-users-2-shard2", "count" : 4769 }
      { "_id" : "clust-users-2-shard6", "count" : 9370 }
      { "_id" : "clust-users-2-shard4", "count" : 4717 }
      { "_id" : "clust-users-2-shard3", "count" : 10217 }
      { "_id" : "clust-users-2-shard5", "count" : 5953 }
      

      We are currently having trouble resyncing this shard from scratch; the resync fails with the following error:

      2017-11-16T05:49:33.245+0100 I -        [replication-115] Assertion: 10334:BSONObj size: 32985739 (0x1F7528B) is invalid. Size must be between 0 and 16793600(16MB) First element: databasesCloned: 10191 src/mongo/bson/bsonobj.cpp 58
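
A rough sanity check of the numbers in that assertion, written as a minimal JavaScript sketch. All four input figures are copied from the log line; the per-database average and the estimated database count are assumptions for illustration only, since they presume the logged object grows roughly linearly with the number of databases cloned, which the log does not confirm.

```javascript
// Figures copied verbatim from the assertion message.
const reportedSize = 32985739;   // "BSONObj size: 32985739"
const reportedHex = 0x1F7528B;   // "(0x1F7528B)"
const maxSize = 16793600;        // "Size must be between 0 and 16793600(16MB)"
const databasesCloned = 10191;   // "First element: databasesCloned: 10191"

// The limit quoted in the message equals 16 MiB plus 16 KiB.
const limitCheck = 16 * 1024 * 1024 + 16 * 1024;

// How far the object exceeds the limit, in bytes and as a ratio.
const overBy = reportedSize - maxSize;
const ratio = reportedSize / maxSize;

// ASSUMPTION: object size scales roughly linearly with databases cloned.
const bytesPerDatabase = reportedSize / databasesCloned;
const estimatedMaxDatabases = Math.floor(maxSize / bytesPerDatabase);

console.log("hex matches decimal:", reportedHex === reportedSize);
console.log("limit is 16 MiB + 16 KiB:", limitCheck === maxSize);
console.log("bytes over the limit:", overBy);
console.log("size / limit:", ratio.toFixed(2));
console.log("average bytes per database (assumed linear):", bytesPerDatabase.toFixed(0));
console.log("estimated databases that would fit (assumed linear):", estimatedMaxDatabases);
```

Under that linearity assumption, the object is just under twice the allowed size at roughly 3.2 KB per cloned database, so only shards holding about half as many databases would be expected to stay under the limit. This is an estimate, not a measured threshold.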
      

      On another cluster with the same architecture but fewer databases per shard, we do not encounter this issue.

      We plan to upgrade from version 3.4.4 to 3.4.10, but we haven't found anything related to this issue in the changelog.
      Is this a known issue, or do you have any more information about it?

      Thanks.

      Regards,
      Benoit

        1. clust-users-2-shard3-2.log
          15 kB
        2. mongod-logs.txt
          9 kB

            Assignee:
            benety.goh@mongodb.com Benety Goh
            Reporter:
            benoit@sendinblue.com Benoit Bui