Core Server: SERVER-5477

when sharded, no need to merge groups if $group _id is the shard key

    Details

    • # Replies:
      4

      Description

      Copied from SERVER-4961:

      In a sharded environment with early grouping, besides taking advantage of an index, it would be nice to be able to skip the regrouping step on mongos.

      Let me explain:

        * result_node1: [
           {
             id: "value1",
             totalcount: 50
           },
           {
             id: "value2",
             totalcount: 100
           },
         ]
       * result_node2: [
           {
             id: "value1",
             totalcount: 60
           }
         ]
      

      The final results (after the mongos regroup) must look like:

       [
           {
             id: "value1",
             totalcount: 110
           },
           {
             id: "value2",
             totalcount: 100
           },
       ]
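The regroup step above amounts to re-aggregating the per-shard partial results by key. A minimal Python simulation, assuming totalcount is produced by a $sum accumulator (so partial counts can simply be added); it is illustrative only, not mongos's actual merge code:

```python
from collections import defaultdict
from typing import Any, Dict, List

def merge_partial_groups(shard_results: List[List[Dict[str, Any]]]) -> List[Dict[str, Any]]:
    """Re-group per-shard $group output on mongos by summing totalcount per id.

    Assumes totalcount comes from a $sum accumulator, so partial values can be added.
    """
    totals: Dict[Any, int] = defaultdict(int)
    order: List[Any] = []  # first-seen key order, so the output is deterministic
    for results in shard_results:
        for doc in results:
            key = doc["id"]
            if key not in totals:
                order.append(key)
            totals[key] += doc["totalcount"]
    return [{"id": key, "totalcount": totals[key]} for key in order]

result_node1 = [{"id": "value1", "totalcount": 50}, {"id": "value2", "totalcount": 100}]
result_node2 = [{"id": "value1", "totalcount": 60}]
print(merge_partial_groups([result_node1, result_node2]))
# → [{'id': 'value1', 'totalcount': 110}, {'id': 'value2', 'totalcount': 100}]
```

Every document must pass through this map on mongos, which is the cost the issue wants to avoid.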
      

      But in some cases the mongos regrouping step is pointless: when the grouping key is the same as the shard key, all documents with a given key live on a single shard, so the same group key can never come back from two different shards.

      In that case, the previous example looks like this instead:

        * result_node1: [
           {
             id: "value1",
             totalcount: 110
           }
         ]
       * result_node2: [
           {
             id: "value2",
             totalcount: 100
           }
         ]
      

      The final results must look like:

       [
           {
             id: "value1",
             totalcount: 110
           },
           {
             id: "value2",
             totalcount: 100
           },
       ]
      

      The point is that the mongos regrouping step is a waste of time when you group on the same key as the shard key.
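Why the merge becomes unnecessary can be simulated: with range partitioning on the shard key, each key value is routed to exactly one shard, so the per-shard groups never share a key and mongos can just concatenate them. A minimal Python sketch; the split point "value2", the helper names and the assumption that id is the shard key are all hypothetical, chosen to reproduce the example above:

```python
from bisect import bisect_right
from collections import Counter
from typing import Dict, List

# Hypothetical chunk split point on the shard key "id": keys below "value2"
# live on shard 0, keys from "value2" upward live on shard 1.
SPLIT_POINTS = ["value2"]

def shard_for(key: str) -> int:
    """Route a shard key value to exactly one shard (range partitioning)."""
    return bisect_right(SPLIT_POINTS, key)

def group_on_shards(docs: List[Dict[str, str]], num_shards: int) -> List[Counter]:
    """Run the per-shard $group (count per id) on each shard's local documents."""
    per_shard = [Counter() for _ in range(num_shards)]
    for doc in docs:
        per_shard[shard_for(doc["id"])][doc["id"]] += 1
    return per_shard

def concat_without_merge(per_shard: List[Counter]) -> Dict[str, int]:
    """mongos only concatenates; this is correct only if no key appears on two shards."""
    out: Dict[str, int] = {}
    for counts in per_shard:
        for key, n in counts.items():
            if key in out:
                raise ValueError(f"group key {key!r} came from two shards")
            out[key] = n
    return out

docs = [{"id": "value1"}] * 110 + [{"id": "value2"}] * 100
print(concat_without_merge(group_on_shards(docs, num_shards=2)))
# → {'value1': 110, 'value2': 100}
```

If the grouping key were some field other than the shard key, the same key could appear on both shards and the ValueError guard would fire, which is exactly the case where the regroup is still required.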

        Activity

        Samuel García Martínez
        added a comment (edited)

        Since the mongos regrouping step isn't needed in this case, mongos shouldn't have to fetch the entire result set; it could push limit/skip down to the shards instead (as long as there is no sort).
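One caveat on the limit/skip pushdown suggested in the comment above: a skip cannot be forwarded unchanged to every shard, because each shard would skip independently. A common safe approach, assumed here (not necessarily what the eventual patch does), is to ask each shard for at most skip + limit groups and apply skip and limit once on mongos. A minimal Python sketch with hypothetical helper names:

```python
from itertools import chain, islice
from typing import Iterable, List, TypeVar

T = TypeVar("T")

def shard_limit(skip: int, limit: int) -> int:
    """Per-shard limit: no shard ever needs to return more than skip + limit groups."""
    return skip + limit

def run_on_shard(shard_groups: List[T], skip: int, limit: int) -> List[T]:
    """Simulate a shard honouring the pushed-down limit (no sort, so any order is valid)."""
    return shard_groups[: shard_limit(skip, limit)]

def mongos_skip_limit(shard_streams: Iterable[List[T]], skip: int, limit: int) -> List[T]:
    """Concatenate per-shard groups (no regroup needed) and apply skip/limit exactly once."""
    return list(islice(chain.from_iterable(shard_streams), skip, skip + limit))

shards = [[f"a{i}" for i in range(1000)], [f"b{i}" for i in range(1000)]]
skip, limit = 3, 5
fetched = [run_on_shard(s, skip, limit) for s in shards]
page = mongos_skip_limit(fetched, skip, limit)
print(sum(len(f) for f in fetched), page)
# → 16 ['a3', 'a4', 'a5', 'a6', 'a7']
```

mongos transfers 16 groups instead of 2000. With lazy shard cursors instead of lists, islice would also let mongos stop reading as soon as the page is full.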

        Samuel García Martínez
        added a comment

        I have developed a fix for this issue. Is there any process or prerequisite for submitting a pull request on GitHub with this fix?

        To be more precise, the same applies when the $group _id is a superset of the shard key.
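The "superset" condition can be expressed as a check that every shard key field is copied unchanged into the $group _id. A minimal, conservative Python sketch; the function names are hypothetical, and it assumes no earlier pipeline stage renames or rewrites the shard key fields:

```python
from typing import Any, Iterable, Mapping, Set, Union

# A $group _id is either a single expression ("$id") or a document of expressions.
GroupId = Union[str, Mapping[str, Any]]

def plain_field_refs(group_id: GroupId) -> Set[str]:
    """Collect input fields that the _id copies unchanged, e.g. "$id" -> "id".

    Computed expressions such as {"$toLower": "$id"} and variables like "$$ROOT"
    are conservatively ignored; a computed expression may map different shard key
    values onto one group.
    """
    exprs = [group_id] if isinstance(group_id, str) else list(group_id.values())
    return {
        expr[1:]
        for expr in exprs
        if isinstance(expr, str) and expr.startswith("$") and not expr.startswith("$$")
    }

def can_skip_mongos_merge(shard_key: Iterable[str], group_id: GroupId) -> bool:
    """True if every shard key field appears as-is in _id (equal to, or a superset of, the shard key)."""
    refs = plain_field_refs(group_id)
    return all(field in refs for field in shard_key)

print(can_skip_mongos_merge(["id"], "$id"))                         # True: same key
print(can_skip_mongos_merge(["id"], {"id": "$id", "day": "$day"}))  # True: superset
print(can_skip_mongos_merge(["id", "day"], {"id": "$id"}))          # False: only part of the key
print(can_skip_mongos_merge(["id"], {"id": {"$toLower": "$id"}}))   # False: computed value
```

A superset works because every group then has a single, fixed shard key value, and that value lives on one shard; grouping on only part of a compound shard key gives no such guarantee.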

        Ian Whalen
        added a comment

        @samuel, the first step is to fill out the contributor agreement - http://www.10gen.com/contributor - and then open a pull request at https://github.com/mongodb/mongo/pulls

        Samuel García Martínez
        added a comment

        Hi. I submitted a pull request for this issue. I hope it helps.

        https://github.com/mongodb/mongo/pull/294


          People

          • Votes: 3
            Watchers: 5

            Dates

            • Created:
              Updated:
              Days since reply:
              2 years, 27 weeks, 1 day ago
              Date of 1st Reply: