Slow joins for ManyToMany fields

XMLWordPrintableJSON

    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Unknown
    • None
    • Affects Version/s: django-mongodb-backend-5.2.0b1
    • Component/s: django
    • None
    • Python Drivers
    • None
    • None
    • None
    • None
    • None
    • None

      Summary

      Joins in Many to Many fields are slow

      Motivation

      Who is the affected end user?

      Our users

      How does this affect the end user?

      The queries are slow (a few seconds)

      How likely is it that this problem or use case will occur?

      Every time the "pivot table" collection is large and has many occurences of the outer foreign key

      If the problem does occur, what are the consequences and how severe are they?

      Long running query

      Is this issue urgent?

      yes, we cannot use Many To Many fields like this

      Is this ticket required by a downstream team?

      NA

      Is this ticket only for tests?

      No

      This issue was also reported on the github of django mongo backend, with a tentative patch:

      https://github.com/mongodb/django-mongodb-backend/issues/309

      Details:

      I have an Observation model that has a ManyToMany to a Subscope model.

      The observations collection is large (100k), the subscopes collection isn't, but the same subscope appears a lot in observations, so it is often in the pivot collection created by django. Namely a collection with (_id, observation_id and subscope_id), the composite unique index on observation_id and subscope_id is automatically generated)

      When I access the many to many of an observation instance:

      observation.subscopes.all()

      A slow query is generated (this is with count() at the end but it's the same:

      [{
          '$lookup': {
              'from': 'orm_observation_subscopes',
              'let':

      Unknown macro: {             'parent__field__0'}

      ,
              'pipeline': [{
                  '$match': {
                      '$expr': {
                          '$and': [

      Unknown macro: {                         '$eq'}

      ]
                      }
                  }
              }],
              'as': 'orm_observation_subscopes'
          }
      },

      Unknown macro: {     '$unwind'}

      , {
          '$match': {
              '$expr':

      Unknown macro: {             '$eq'}

          }
      }, {
          '$facet': {
              'group': [{
                  '$group': {
                      '__count': {
                          '$sum': {
                              '$cond': {
                                  'if': {
                                      '$in': [{
                                              '$type':

      Unknown macro: {                                             '$literal'}

                                          },
                                          ['missing', 'null']
                                      ]
                                  },
                                  'then': None,
                                  'else': 1
                              }
                          }
                      },
                      '_id': None
                  }
              }]
          }
      }, {
          '$addFields': {
              '__count': {
                  '$getField': {
                      'input':

      Unknown macro: {                     '$arrayElemAt'}

      ,
                      'field': '__count'
                  }
              },
              '_id': {
                  '$getField': {
                      'input':

      Unknown macro: {                     '$arrayElemAt'}

      ,
                      'field': '_id'
                  }
              }
          }
      }, {
          '$project': {
              '__count': {
                  '$ifNull': ['$__count',

      Unknown macro: {                 '$literal'}

      ]
              }
          }
      }]

      The match contains the join field, but that's after the lookup. This query takes 3 seconds.

      If I change this query manually to add the match in the lookup like this:

      [{
          '$lookup': {
              'from': 'orm_observation_subscopes',
              'let':

      Unknown macro: {             'parent__field__0'}

      ,
              'pipeline': [{
                  '$match': {
                      '$expr': {
                          '$and': [

      Unknown macro: {                         '$eq'}

      ,

      Unknown macro: {                         '$eq'}

      ]
                      }
                  }
              }],
              'as': 'orm_observation_subscopes'
          }
      },

      Unknown macro: {     '$unwind'}

      ,

      Unknown macro: {     '$limit'}

      ]

      It takes 0.020 seconds.

      The first version of course is only slow if the pivot collection is large, it is expected. Moving the match in the lookup prevents the whole collection from being scanned.

      I added a fairly naive but working patch to the github issue, that could certainly be improved.

       

       

            Assignee:
            Emanuel Lupi
            Reporter:
            Stéphane Konstantaropoulos
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

              Created:
              Updated: