Type: Bug
Resolution: Done
Priority: Major - P3
Fix Version/s: None
Affects Version/s: 5.0.12, 5.0.9
Component/s: None
Labels:
None

Operating System:
ALL
Steps To Reproduce:
Unfortunately, the only known way to reproduce the problem is:

1. Generate a collection with 2.4B documents sharded over 10 shards.

2. Try resharding the collection using {_id: 'hashed'}.
CAR Domain/s:
None

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name(s):
None
Goal Link:
None

When trying to reshard a collection, we get an error saying that the sort memory limit was exceeded.

The resharding command is run as follows:

[direct: mongos] admin> sh.reshardCollection('database.messages', {_id: 'hashed'})
MongoServerError: Sort exceeded memory limit of 104857600 bytes, but did not opt in to external sorting.

The collection contains approx. 2.4B documents split across 10 shards.

Shard distribution looks like this:

[direct: mongos] database> db.messages.getShardDistribution()
Shard rs06 at rs06/mongo-pm-repl06-1:27018,mongo-pm06:27018
{
  data: '78.1GiB',
  docs: 360057505,
  chunks: 5669,
  'estimated data per chunk': '14.1MiB',
  'estimated docs per chunk': 63513
}
---
Shard shard0001 at rs02/DS5033:27018,mongo-pm-repl02-1:27018
{
  data: '49.25GiB',
  docs: 245035457,
  chunks: 5670,
  'estimated data per chunk': '8.89MiB',
  'estimated docs per chunk': 43216
}
---
Shard shard0000 at rs01/DS5032:27018,mongo-pm-repl01-1:27018
{
  data: '43.17GiB',
  docs: 217760699,
  chunks: 5669,
  'estimated data per chunk': '7.79MiB',
  'estimated docs per chunk': 38412
}
---
Shard rs09 at rs09/mongo-pm-repl09-1:27018,mongo-pm09:27018
{
  data: '47.01GiB',
  docs: 220116142,
  chunks: 5669,
  'estimated data per chunk': '8.49MiB',
  'estimated docs per chunk': 38828
}
---
Shard rs03 at rs03/mongo-pm-repl03-1:27018,mongo-pm03:27018
{
  data: '37.81GiB',
  docs: 190704378,
  chunks: 5670,
  'estimated data per chunk': '6.82MiB',
  'estimated docs per chunk': 33633
}
---
Shard rs05 at rs05/DS5085:27018,mongo-pm-repl05-1:27018
{
  data: '36.7GiB',
  docs: 185097348,
  chunks: 5669,
  'estimated data per chunk': '6.63MiB',
  'estimated docs per chunk': 32650
}
---
Shard rs07 at rs07/mongo-pm-repl07-1:27018,mongo-pm07:27018
{
  data: '88.01GiB',
  docs: 413919369,
  chunks: 5669,
  'estimated data per chunk': '15.89MiB',
  'estimated docs per chunk': 73014
}
---
Shard rs04 at rs04/mongo-pm-repl04-1:27018,mongo-pm04:27018
{
  data: '38.33GiB',
  docs: 193215609,
  chunks: 5677,
  'estimated data per chunk': '6.91MiB',
  'estimated docs per chunk': 34034
}
---
Shard rs08 at rs08/mongo-pm-repl08-1:27018,mongo-pm08:27018
{
  data: '45.07GiB',
  docs: 207666324,
  chunks: 5668,
  'estimated data per chunk': '8.14MiB',
  'estimated docs per chunk': 36638
}
---
Shard rs10 at rs10/mongo-pm-repl10-1:27018,mongo-pm10:27018
{
  data: '37.04GiB',
  docs: 178733333,
  chunks: 2332,
  'estimated data per chunk': '16.26MiB',
  'estimated docs per chunk': 76643
}
---
Totals
{
  data: '7.81030509569951e+100GiB',
  docs: 2412306164,
  chunks: 53362,
  'Shard rs06': [
    '0 % data',
    '14.92 % docs in cluster',
    '232B avg obj size on shard'
  ],
  'Shard shard0001': [
    '0 % data',
    '10.15 % docs in cluster',
    '215B avg obj size on shard'
  ],
  'Shard shard0000': [
    '0 % data',
    '9.02 % docs in cluster',
    '212B avg obj size on shard'
  ],
  'Shard rs09': [
    '0 % data',
    '9.12 % docs in cluster',
    '229B avg obj size on shard'
  ],
  'Shard rs03': [ '0 % data', '7.9 % docs in cluster', '212B avg obj size on shard' ],
  'Shard rs05': [
    '0 % data',
    '7.67 % docs in cluster',
    '212B avg obj size on shard'
  ],
  'Shard rs07': [
    '0 % data',
    '17.15 % docs in cluster',
    '228B avg obj size on shard'
  ],
  'Shard rs04': [ '0 % data', '8 % docs in cluster', '213B avg obj size on shard' ],
  'Shard rs08': [ '0 % data', '8.6 % docs in cluster', '233B avg obj size on shard' ],
  'Shard rs10': [ '0 % data', '7.4 % docs in cluster', '222B avg obj size on shard' ]
}
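
The data figure in the Totals block above is clearly not a usable size, so as a rough cross-check the per-shard numbers can be summed by hand. The sketch below only re-adds values copied from the output above; the variable names are illustrative and nothing here queries the cluster.

```javascript
// Per-shard figures copied from the getShardDistribution() output above.
const shards = [
  { name: 'rs06',      dataGiB: 78.1,  docs: 360057505 },
  { name: 'shard0001', dataGiB: 49.25, docs: 245035457 },
  { name: 'shard0000', dataGiB: 43.17, docs: 217760699 },
  { name: 'rs09',      dataGiB: 47.01, docs: 220116142 },
  { name: 'rs03',      dataGiB: 37.81, docs: 190704378 },
  { name: 'rs05',      dataGiB: 36.7,  docs: 185097348 },
  { name: 'rs07',      dataGiB: 88.01, docs: 413919369 },
  { name: 'rs04',      dataGiB: 38.33, docs: 193215609 },
  { name: 'rs08',      dataGiB: 45.07, docs: 207666324 },
  { name: 'rs10',      dataGiB: 37.04, docs: 178733333 },
];

// Sum the document counts and data sizes across all shards.
const totalDocs = shards.reduce((sum, s) => sum + s.docs, 0);
const totalDataGiB = shards.reduce((sum, s) => sum + s.dataGiB, 0);

// Average object size in bytes across the whole collection (1 GiB = 2^30 bytes).
const avgObjBytes = (totalDataGiB * 1024 ** 3) / totalDocs;

console.log(totalDocs);                // 2412306164, matches the reported docs total
console.log(totalDataGiB.toFixed(2));  // 500.49
console.log(Math.round(avgObjBytes));  // 223
```

The document total agrees with the reported value, and the overall average of about 223 bytes sits inside the 212B to 233B per-shard range, so the collection holds roughly 500 GiB.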

The only explanation I can see is that the reshardCollection command runs some preliminary queries or aggregations that involve sorting (for example, counting the documents to copy), and that during this phase the sort memory limit is exceeded because of the large number of documents or chunks in the collection.

Those sort operations probably need to be run with the allowDiskUse option set to true.
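
For reference, this is roughly what opting in to external sorting looks like for an ordinary user-issued aggregation. It is only a sketch of the option itself: the pipeline is a made-up example, and the internal pipeline that reshardCollection runs is not visible from the shell, so this does not show a workaround for the failure.

```javascript
// In-memory sort limit quoted in the error message: 100 MiB.
const sortLimitBytes = 100 * 1024 * 1024; // 104857600

// Hypothetical example pipeline: sort the whole collection on _id.
const pipeline = [{ $sort: { _id: 1 } }];

// allowDiskUse lets a blocking sort spill to temporary files on disk
// instead of failing once it passes the in-memory limit.
const options = { allowDiskUse: true };

// In mongosh this would be run against a live deployment, e.g.:
// db.getSiblingDB('database').messages.aggregate(pipeline, options);

console.log(sortLimitBytes, JSON.stringify(options));
```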

Is related to: SERVER-68139 Resharding command fails if the projection sort is bigger than 100MB (Closed)

Assignee: Yuan Fang
Reporter: Alexey Zarubin
Participants: Alexey Zarubin, Yuan Fang
Votes: 0
Watchers: 6

Created: Sep 27 2022 03:54:34 PM UTC
Updated: Nov 08 2022 09:02:18 PM UTC
Resolved: Oct 17 2022 01:40:24 PM UTC
