Core Server / SERVER-8772

Documents with null value for hashed shard key are not returned via mongos

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical - P2
    • Resolution: Fixed
    • Affects Version/s: 2.4.0-rc1
    • Fix Version/s: 2.4.0-rc2
    • Component/s: Sharding
    • Labels: None
    • Environment: OS X, 3 shards on 3 replica sets, 1 mongos
    • Backport: No
    • Operating System: ALL
    • Bug Type: Unknown
    • # Replies: 4
    • Last comment by Customer: true
    • Steps To Reproduce:

      Import the 2009 NYS campaign finance database (http://www.elections.ny.gov/NYSBOE/download/ZipDataFiles/2009gen.zip) using:

      mongoimport -d qa -c gen2009 --fieldFile headers.h --ignoreBlanks --type csv 2009gen.out

      (note that headers.h is attached to this ticket)

      mongo
      > use qa
      > db.gen2009.ensureIndex({CITY:"hashed"})
      > db.gen2009.find({CITY:null})          // should return 7463 documents
      > sh.enableSharding("qa")
      > sh.shardCollection("qa.gen2009", {CITY:"hashed"})
      > db.gen2009.find({CITY:null})          // returns no documents
      > db.gen2009.find({CITY:null}).count()  // returns 7463
      > db.gen2009.find({STATE:null}).count() // returns 7322 documents

      Description

      I have a collection of 100k+ documents where about 7% are missing the field used for a hashed shard key. Normally you cannot shard a collection in which some documents are missing the shard key field, but you can if you use a hashed shard key.

      Prior to sharding, queries on the field (using {CITY: null}) succeed; after sharding they appear to fail. The query is directed to the shards and they appear to process it, but mongos does not return any documents. This only happens if the field with null values is the shard key.

      I have reproduced this with the mongo shell and pymongo, but have not narrowed it down enough to write a JS test case.

      I am not seeing any obvious errors in my mongos or mongod logs.

      Attachments

      1. headers.h (0.3 kB, Ed Costello)
      2. missing_key.js (1.0 kB, Dan Pasette)

        Activity

        Aaron Staple (Inactive)
        added a comment (edited)

        This is not code I'm super familiar with, but it looks like running the shardCollection command on mongos sends checkShardingIndex commands to the mongods. checkShardingIndex looks for index keys where the shard key is null, which indicates that a shard key field may be absent from a document. From CheckShardingIndex::run():

            if ( currKeyElt.type() && currKeyElt.type() != jstNULL )
                continue;

        Hashed indexes don't store a key of null for a missing field; they store the hash of null instead. So in the current hashed index implementation, missing values cannot be identified by the presence of null index keys.
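
The point above can be illustrated with a toy model. This is only a sketch under stated assumptions: `toyHash`, `indexKey`, and `looksMissingByNullKey` are hypothetical names, the hash is a stand-in (32-bit FNV-1a over a JSON encoding), and none of it reflects the server's actual data structures.

```javascript
// Toy model only: none of these names exist in MongoDB.

// Stand-in hash (32-bit FNV-1a over a JSON encoding).
// The real server hashes BSON values differently.
function toyHash(value) {
  const text = JSON.stringify(value);
  let h = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

// Index key for a document: a missing field is treated as null,
// and a hashed index stores the hash of that value.
function indexKey(doc, field, hashed) {
  const value = doc[field] === undefined ? null : doc[field];
  return hashed ? toyHash(value) : value;
}

// Loose model of the quoted check: only a null key is flagged.
function looksMissingByNullKey(key) {
  return key === null;
}

const docWithoutCity = { NAME: "example" };
const plainKey = indexKey(docWithoutCity, "CITY", false);
const hashedKey = indexKey(docWithoutCity, "CITY", true);
```

In this model, `looksMissingByNullKey(plainKey)` flags the document, while `looksMissingByNullKey(hashedKey)` does not, because the hashed key is a number rather than null.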

        Dan Pasette
        added a comment -

        Adding a test case for sharding an existing collection using single, compound, and hashed shard keys.

        We can special-case hashed shard keys and check for the hashed value of null instead.
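
A minimal sketch of the special case suggested above, assuming a hypothetical helper `isMissingFieldKey` and a stand-in `standInHash` in place of the server's hash function. It is not the code that was committed.

```javascript
// Hypothetical helper; hashFn stands in for the server's hash function.
function isMissingFieldKey(key, isHashedIndex, hashFn) {
  // Ordinary index: a missing field shows up as a null key.
  // Hashed index: it shows up as the hash of null.
  return isHashedIndex ? key === hashFn(null) : key === null;
}

// Stand-in "hash" for illustration: a tagged JSON encoding, not a real hash.
const standInHash = (value) => "h:" + JSON.stringify(value);
```

With these assumptions, `isMissingFieldKey(standInHash(null), true, standInHash)` is true, while a hashed key for an actual city name is not flagged.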

        Aaron Staple (Inactive)
        added a comment -

        commit 696dec1262372b0ac45bad9c84de4700eb0d2e71
        Author: aaron <aaron@10gen.com>
        Date: Sun Mar 3 11:38:44 2013 -0500

        Make the IndexSpec::missingField() implementation IndexType specific, and use missingField() to properly identify missing fields in CheckShardingIndex::run().
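
The idea in the commit message can be sketched as follows. The classes and `findMissingFieldKeys` are hypothetical and only loosely modelled on the description; the actual C++ change is not reproduced here, and `standInHash` is a stand-in for the real hash function.

```javascript
// Hypothetical classes modelled loosely on the commit message.
class PlainIndexType {
  // Key stored for a document that lacks the indexed field.
  missingField() {
    return null;
  }
}

class HashedIndexType {
  constructor(hashFn) {
    this.hashFn = hashFn;
  }
  // A hashed index stores the hash of null for a missing field.
  missingField() {
    return this.hashFn(null);
  }
}

// Returns the keys that indicate a document without the shard key field,
// asking the index type what a missing field looks like.
function findMissingFieldKeys(indexKeys, indexType) {
  const sentinel = indexType.missingField();
  return indexKeys.filter((key) => key === sentinel);
}

// Stand-in "hash" for illustration: a tagged JSON encoding, not a real hash.
const standInHash = (value) => "h:" + JSON.stringify(value);
```

The check no longer needs to know which index type it is scanning; each type reports its own missing-field key.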

        Ed Costello (Inactive)
        added a comment -

        Am retesting this afternoon (6 March)


          People

          • Votes: 0
          • Watchers: 5

            Dates

            • Days since reply: 1 year, 7 weeks ago