Core Server / SERVER-2099

MapReduce does not allow limit as one attribute on a sharded setup

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major - P3
    • Resolution: Unresolved
    • Affects Version/s: 1.6.3
    • Fix Version/s: planned but not scheduled
    • Labels:
      None
    • Environment:
      OS - Linux hqd-soak-03 2.6.18-164.11.1.el5 #1 SMP Wed Jan 20 07:32:21 EST 2010 x86_64 x86_64 x86_64 GNU/Linux
    • Backport:
      No
    • Operating System:
      Linux
    • # Replies:
      3
    • Last comment by Customer:
      true

      Description

      We have a sharded setup on 2 machines and have a 120GB collection on these machines.

      I ran a mapreduce job through db.runCommand with a limit attribute, to test the map-reduce functions, and it threw an error message saying that the limit attribute is not allowed.

      Specifics

      db.runCommand({
        mapreduce: "bigdm",
        map: m,
        reduce: r,
        limit: 100,
        query: { "dynamicRL": { "$exists": true } },
        out: "dt",
        verbose: true
      });

      It gives me the following error message:

      { "assertion" : "don't know mr field: limit", "assertionCode" : 10177, "errmsg" : "db assertion failure", "ok" : 0 }

        Activity

        Eliot Horowitz
        added a comment -

        It's unclear what the semantics of limit are in this case.
        We don't want to apply it serially, since with a large limit that could be slow.
        But we also don't want to apply it separately on each shard, as then it's not accurate.
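        The two candidate semantics can be sketched with plain arrays standing in for shards. This is only an illustration of the trade-off described above, not MongoDB code: the `shards` data and the helper names `limitSerially` and `limitPerShard` are hypothetical.

```javascript
// Hypothetical in-memory stand-ins for two shards (not MongoDB API).
const shards = [
  [{ _id: 1 }, { _id: 2 }, { _id: 3 }], // shard A
  [{ _id: 4 }, { _id: 5 }, { _id: 6 }], // shard B
];

// Serial semantics: walk the shards one after another and stop once
// exactly `limit` documents have been collected overall.
// Accurate, but the shards cannot work in parallel.
function limitSerially(shards, limit) {
  const picked = [];
  for (const shard of shards) {
    for (const doc of shard) {
      if (picked.length >= limit) return picked;
      picked.push(doc);
    }
  }
  return picked;
}

// Per-shard semantics: every shard applies the limit on its own.
// Parallel-friendly, but the total can reach limit * numberOfShards.
function limitPerShard(shards, limit) {
  return shards.flatMap((shard) => shard.slice(0, limit));
}

const serial = limitSerially(shards, 2);
const perShard = limitPerShard(shards, 2);

console.log(serial.length);   // 2
console.log(perShard.length); // 4
```

        With a limit of 2, the serial version feeds exactly 2 documents to the map function, while the per-shard version feeds 4, which is the inaccuracy mentioned in the comment.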

        Eliot Horowitz
        added a comment -

        Still not sure what the right thing to do is.

        Andrew Armstrong
        added a comment -

        I don't use MR yet, but perhaps limit would be better named sample size?

        Since MR is not ordered, perhaps a sample size argument would mean that at most X documents are processed per shard?

        I assume it's useful in MR to have a limit so you can get an approximate answer to see if a query looks right, then run the full query when you're happy?
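        A minimal sketch of that "sample first, then run in full" workflow, again using in-memory arrays rather than a real cluster. The `countBySample` helper and the sample documents are hypothetical. Only the `dynamicRL` field name comes from the report.

```javascript
// Hypothetical in-memory shards; only the field name dynamicRL is from the report.
const shards = [
  [{ dynamicRL: "a" }, { dynamicRL: "b" }, { dynamicRL: "a" }],
  [{ dynamicRL: "a" }, { dynamicRL: "b" }, { dynamicRL: "b" }],
];

// A tiny map-reduce that counts documents per dynamicRL value,
// reading at most `sampleSize` documents from each shard.
function countBySample(shards, sampleSize) {
  const counts = {};
  for (const shard of shards) {
    for (const doc of shard.slice(0, sampleSize)) {
      // "map" emits (dynamicRL, 1); "reduce" sums the emitted values.
      counts[doc.dynamicRL] = (counts[doc.dynamicRL] || 0) + 1;
    }
  }
  return counts;
}

// Quick approximate run over 2 documents per shard to sanity-check the job.
const approx = countBySample(shards, 2);
// Full run once the output shape looks right.
const full = countBySample(shards, Infinity);

console.log(approx); // { a: 2, b: 2 }
console.log(full);   // { a: 3, b: 3 }
```

        The sampled result has the same shape as the full one but smaller counts, which is enough to check that the map and reduce functions behave as intended.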


          People

          • Votes: 1
          • Watchers: 2

            Dates

            • Days since reply: 3 years, 3 days ago