[SERVER-85229] The same distinct command, the result sometimes contains null and sometimes does not Created: 16/Jan/24  Updated: 23/Jan/24  Resolved: 23/Jan/24

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 4.2.8, 6.0.12
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: 志刚 柴 Assignee: Backlog - Query Execution
Resolution: Duplicate Votes: 0
Labels: Bug
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates SERVER-14832 Distinct command result set may inclu... Needs Scheduling
Assigned Teams:
Query Execution
Operating System: ALL
Participants:

 Description   

version:6.0.12 4.2.8

db.t1.insert({id:1})
db.t1.insert({id:2})
db.t1.insert({})
db.t1.insert({id:2})

db.t1.find()

{ “_id” : ObjectId(“65a62118842d8bddd7ef9699”), “id” : 1 }
{ “_id” : ObjectId(“65a62119842d8bddd7ef969a”), “id” : 2 }
{ “_id” : ObjectId(“65a6211c842d8bddd7ef969b”) }
{ “_id” : ObjectId(“65a6211f842d8bddd7ef969c”), “id” : 2 }

— There are no nulls in the result set

db.t1.distinct(‘id’)
[ 1, 2 ]

– After creating the index, there are nulls in the result set

db.t1.createIndex(\{id:1})
{
“numIndexesBefore” : 1,
“numIndexesAfter” : 2,
“createdCollectionAutomatically” : false,
“ok” : 1
}

db.t1.distinct(‘id’)
[ null, 1, 2 ]

After adding the index, null values appeared in the result set, causing the application processing logic to change. I would like to ask if this is a bug.



 Comments   
Comment by Kyle Suarez [ 23/Jan/24 ]

Hello chaigang@126.com,

Thank you for your report. This is a known issue and is tracked in SERVER-14832. Any fixes will be tracked in that ticket.

Comment by Chris Kelly [ 17/Jan/24 ]

Hi chaigang@126.com,

Thanks for your report. Some initial comments:

  • Initially, this does seem a little odd that the output changes before/after indexing here.
    • This could theoretically impact user application processing logic to change depending on how these results are being used.
  • MongoDB provides sparse indexes which:
    • Only contain entries for documents that have the indexed field, even if the index field contains a null value.
    • The index skips over any document that is missing the indexed field. 
  • Sparse indexes behave as you are expecting in this case, and only output [1, 2]:

    db.t1.createIndex({id:1}, {sparse: true})
    db.t1.find()
    replset:PRIMARY> db.t1.distinct('id')
    [ 1, 2 ]
    replset:PRIMARY> db.t1.find()
    { "_id" : ObjectId("65a816ab50b738bcd5b41dc3"), "id" : 1 }
    { "_id" : ObjectId("65a816af50b738bcd5b41dc4"), "id" : 2 }
    { "_id" : ObjectId("65a816b550b738bcd5b41dc5") }
    { "_id" : ObjectId("65a816bc50b738bcd5b41dc6"), "id" : 2 }
    

I will forward this to the relevant team to confirm whether this is intended behavior.

Comment by 志刚 柴 [ 16/Jan/24 ]

db.t1.find()

{ “_id” : ObjectId(“65a62118842d8bddd7ef9699”), “id” : 1 } { “_id” : ObjectId(“65a62119842d8bddd7ef969a”), “id” : 2 } { “_id” : ObjectId(“65a6211c842d8bddd7ef969b”) } { “_id” : ObjectId(“65a6211f842d8bddd7ef969c”), “id” : 2 }
Generated at Thu Feb 08 06:57:12 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.