Support global secondary indexes in sharded clusters

XMLWordPrintableJSON

    • Type: New Feature
    • Resolution: Unresolved
    • Priority: Major - P3
    • None
    • Affects Version/s: None
    • Component/s: Index Maintenance, Sharding
    • None
    • Cluster Scalability
    • None
    • 3
    • None
    • None
    • None
    • None
    • None
    • None

      If you have a collection containing "foo" and "bar", it could be the case that you need to make queries against both "foo" and "bar". You could configure a sharded collection to range shard over "foo", which would make any queries against "foo" only hit 1 replset shard, but any query against "bar" would require fanning out to every replset, which can adversely impact load/latency/availability if you have many replset shards.

      An approach to solving this would be to have global secondary indexes (GSI; c.f. https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GSI.html), where you store the keys you want to index ("foo" and "bar" here), along with the underlying _id of the document, or possibly the entire document. These collections would be range sharded (by "foo"+"bar"), so that a query to a GSI would tend to only hit 1 replset initially. If you chose to store the entire document in the GSI, you'd be done. If you instead stored just _id, you'd need to query the underlying collection. If you end up returning 100's of documents you probably end up hitting all replset shards as before, but for the nReturned=0 or nReturned=1 case you'd only hit a single additional replset.

      Initially, you'd probably want GSIs to be eventually consistent, though with MongoDB's new transaction support you could conceivably make them more strongly consistent.

            Assignee:
            [DO NOT USE] Backlog - Cluster Scalability
            Reporter:
            David Bartley
            Votes:
            1 Vote for this issue
            Watchers:
            12 Start watching this issue

              Created:
              Updated: