SERVER-70012 (Core Server)

Getting 'Sort exceeded memory limit' error when trying to reshard a collection

    • Type: Bug
    • Resolution: Done
    • Priority: Major - P3
    • Affects Version/s: 5.0.12, 5.0.9
    • Component/s: None
    • Operating System: ALL

      Unfortunately, the only way to reproduce the problem is:

      1. Generate a collection with 2.4B documents sharded over 10 shards
      2. Try resharding the collection using {_id: 'hashed'}
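
      For reference, a scaled-down sketch of the repro shape in mongosh (the shard key name 'origKey', the 'payload' field, and the loop counts are placeholders, not the original data set):

      // Hypothetical setup; 'origKey' stands in for the collection's original shard key.
      sh.enableSharding('database')
      sh.shardCollection('database.messages', { origKey: 1 })

      // Insert many small (~220 B) documents; scale the loop toward 2.4B to match the report.
      const coll = db.getSiblingDB('database').messages
      for (let i = 0; i < 1000; i++) {
        const batch = []
        for (let j = 0; j < 10000; j++) {
          batch.push({ origKey: i * 10000 + j, payload: 'x'.repeat(180) })
        }
        coll.insertMany(batch, { ordered: false })
      }

      // Resharding to a hashed key is what triggers the error:
      sh.reshardCollection('database.messages', { _id: 'hashed' })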

      When trying to reshard a collection, we get an error saying that the sort memory limit was exceeded.

      The resharding command is run as follows:

      [direct: mongos] admin> sh.reshardCollection('database.messages', {_id: 'hashed'})
      MongoServerError: Sort exceeded memory limit of 104857600 bytes, but did not opt in to external sorting.
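
      The 104857600 bytes (100 MiB) in the message is the server's default cap for in-memory blocking sorts, controlled by the internalQueryMaxBlockingSortMemoryUsageBytes parameter. Raising it on the shards might sidestep the failure, though that is a guess at a stopgap rather than a fix:

      // Possible stopgap (untested here): raise the blocking-sort memory limit on a shard.
      // 200 MiB below is an arbitrary example value; the default is 100 MiB.
      db.adminCommand({
        setParameter: 1,
        internalQueryMaxBlockingSortMemoryUsageBytes: 200 * 1024 * 1024
      })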
      

      The collection contains approx. 2.4B documents split across 10 shards.

      Shard distribution looks like this:

      [direct: mongos] database> db.messages.getShardDistribution()
      Shard rs06 at rs06/mongo-pm-repl06-1:27018,mongo-pm06:27018
      {
        data: '78.1GiB',
        docs: 360057505,
        chunks: 5669,
        'estimated data per chunk': '14.1MiB',
        'estimated docs per chunk': 63513
      }
      ---
      Shard shard0001 at rs02/DS5033:27018,mongo-pm-repl02-1:27018
      {
        data: '49.25GiB',
        docs: 245035457,
        chunks: 5670,
        'estimated data per chunk': '8.89MiB',
        'estimated docs per chunk': 43216
      }
      ---
      Shard shard0000 at rs01/DS5032:27018,mongo-pm-repl01-1:27018
      {
        data: '43.17GiB',
        docs: 217760699,
        chunks: 5669,
        'estimated data per chunk': '7.79MiB',
        'estimated docs per chunk': 38412
      }
      ---
      Shard rs09 at rs09/mongo-pm-repl09-1:27018,mongo-pm09:27018
      {
        data: '47.01GiB',
        docs: 220116142,
        chunks: 5669,
        'estimated data per chunk': '8.49MiB',
        'estimated docs per chunk': 38828
      }
      ---
      Shard rs03 at rs03/mongo-pm-repl03-1:27018,mongo-pm03:27018
      {
        data: '37.81GiB',
        docs: 190704378,
        chunks: 5670,
        'estimated data per chunk': '6.82MiB',
        'estimated docs per chunk': 33633
      }
      ---
      Shard rs05 at rs05/DS5085:27018,mongo-pm-repl05-1:27018
      {
        data: '36.7GiB',
        docs: 185097348,
        chunks: 5669,
        'estimated data per chunk': '6.63MiB',
        'estimated docs per chunk': 32650
      }
      ---
      Shard rs07 at rs07/mongo-pm-repl07-1:27018,mongo-pm07:27018
      {
        data: '88.01GiB',
        docs: 413919369,
        chunks: 5669,
        'estimated data per chunk': '15.89MiB',
        'estimated docs per chunk': 73014
      }
      ---
      Shard rs04 at rs04/mongo-pm-repl04-1:27018,mongo-pm04:27018
      {
        data: '38.33GiB',
        docs: 193215609,
        chunks: 5677,
        'estimated data per chunk': '6.91MiB',
        'estimated docs per chunk': 34034
      }
      ---
      Shard rs08 at rs08/mongo-pm-repl08-1:27018,mongo-pm08:27018
      {
        data: '45.07GiB',
        docs: 207666324,
        chunks: 5668,
        'estimated data per chunk': '8.14MiB',
        'estimated docs per chunk': 36638
      }
      ---
      Shard rs10 at rs10/mongo-pm-repl10-1:27018,mongo-pm10:27018
      {
        data: '37.04GiB',
        docs: 178733333,
        chunks: 2332,
        'estimated data per chunk': '16.26MiB',
        'estimated docs per chunk': 76643
      }
      ---
      Totals
      {
        data: '7.81030509569951e+100GiB',
        docs: 2412306164,
        chunks: 53362,
        'Shard rs06': [
          '0 % data',
          '14.92 % docs in cluster',
          '232B avg obj size on shard'
        ],
        'Shard shard0001': [
          '0 % data',
          '10.15 % docs in cluster',
          '215B avg obj size on shard'
        ],
        'Shard shard0000': [
          '0 % data',
          '9.02 % docs in cluster',
          '212B avg obj size on shard'
        ],
        'Shard rs09': [
          '0 % data',
          '9.12 % docs in cluster',
          '229B avg obj size on shard'
        ],
        'Shard rs03': [ '0 % data', '7.9 % docs in cluster', '212B avg obj size on shard' ],
        'Shard rs05': [
          '0 % data',
          '7.67 % docs in cluster',
          '212B avg obj size on shard'
        ],
        'Shard rs07': [
          '0 % data',
          '17.15 % docs in cluster',
          '228B avg obj size on shard'
        ],
        'Shard rs04': [ '0 % data', '8 % docs in cluster', '213B avg obj size on shard' ],
        'Shard rs08': [ '0 % data', '8.6 % docs in cluster', '233B avg obj size on shard' ],
        'Shard rs10': [ '0 % data', '7.4 % docs in cluster', '222B avg obj size on shard' ]
      }
      

      The only explanation I see for this behavior is that the reshardCollection command runs some preliminary queries/aggregations that involve sorting (for example, counting the documents to copy), and during this phase the sort memory limit is exceeded due to the large number of documents or chunks in the collection.

      Those sort operations probably need to be run with the allowDiskUse option set to true.
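
      For comparison, an explicit aggregation over the same namespace hits the same 100 MiB ceiling when it sorts a large unindexed field in memory, but succeeds once spilling to disk is allowed (a sketch using the hypothetical 'payload' field; whatever pipeline reshardCollection runs internally is an assumption on my part):

      // Without allowDiskUse, a large blocking sort fails with the same error as above:
      db.getSiblingDB('database').messages.aggregate([ { $sort: { payload: 1 } } ])

      // With allowDiskUse, the sort may spill to temporary files on disk and complete:
      db.getSiblingDB('database').messages.aggregate(
        [ { $sort: { payload: 1 } } ],
        { allowDiskUse: true }
      )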

            Assignee:
            yuan.fang@mongodb.com Yuan Fang
            Reporter:
            alexey.zarubin@gmail.com Alexey Zarubin
            Votes:
            0
            Watchers:
            6
