[SERVER-51207] $merge not working without shard-key Created: 29/Sep/20  Updated: 30/Jan/24

Status: Backlog
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Wernfried Domscheit Assignee: Backlog - Query Execution
Resolution: Unresolved Votes: 0
Labels: aggregation-framework, merge, qexec-team, query-product-scope-1, query-product-urgency-2, query-product-value-2, sharding
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Mongo 4.4


Assigned Teams:
Query Execution
Operating System: ALL
Sprint: Query 2020-11-30, Query 2020-12-14, Query 2020-12-28, Query 2021-01-11, Query 2021-01-25, Query Execution 2021-02-22, Query Execution 2021-03-08, Query Execution 2021-03-22, Query Execution 2021-04-05, Query Execution 2021-04-19, Query Execution 2021-05-03, Query Execution 2021-05-17, Query Execution 2021-05-31, Query Execution 2021-06-14, Query Execution 2021-06-28, Query Execution 2021-07-12, Query Execution 2021-07-26, QE 2021-08-09, QE 2023-08-21, QE 2023-09-04, QE 2023-09-18, QE 2023-10-02, QE 2023-10-16, QE 2023-10-30
Participants:

 Description   

In Mongo 4.4, shard keys can be missing, see https://docs.mongodb.com/manual/core/sharding-shard-key/#shard-key-missing

This works fine for insert of documents, however merge in aggregation-framework is failing:

db = db.getSiblingDB("mip")
db.createCollection("sharded_col")
sh.enableSharding("mip")
sh.shardCollection("mip.sharded_col", { tsi: 1 })
 
db.sharded_col.insertOne({ a: 1 })
 
db.sharded_col.aggregate([
   { $set: { a: 2 } },
   { $unset: "_id" },
   { $merge: { into: "sharded_col" } }
])
 
{ 
    "ok" : 0.0, 
    "errmsg" : "$merge write error: 'on' field 'tsi' cannot be missing, null, undefined or an array", 
    "code" : NumberInt(51132), 
    "codeName" : "Location51132", 
    "operationTime" : Timestamp(1601372348, 4), 
    "$clusterTime" : {
        "clusterTime" : Timestamp(1601372351, 40), 
        "signature" : {
            "hash" : BinData(0, "wv5eJ/xcJoaS1m14aJeoSA3WSqo="), 
            "keyId" : NumberLong(6854861305854033921)
        }
    }
}

 

If MongoDB permits missing shard key then it should also apply for merge. 



 Comments   
Comment by Foteini Alvanaki [ 18/Aug/23 ]

I did not have time to work on this yet. I am putting it back to open.

Comment by Xiaochen Wu [ 22/Mar/23 ]

Suggest to add to quick win candidate.

Comment by Edwin Zhou [ 01/Oct/20 ]

wernfried.domscheit@sunrise.net thank you for the reproduction you provided and additional context about your use-case. I was able to reproduce the error and will hand it over to the sharding team for further investigation.

 

Comment by Wernfried Domscheit [ 30/Sep/20 ]

My use case it to group documents in order to save space. It looks like this one:
 

db.sharded_col.insertMany([
   { a: 1, grouped: false, b: "foo" },
   { a: 1, grouped: false, b: "bar" },
   { a: 1, grouped: false, b: "jeff" }
   { a: 2, grouped: false, tsi: 123, b: "foo-foo" },
   { a: 2, grouped: false, tsi: 123, b: "bar-bar" },
])
 
db.sharded_col.aggregate([
   { $match: { grouped: false } },
   {
      $group: {
         _id: { a: "$a", tsi: "$tsi" },
         b: { $push: "$b" },
         ids: { $push: "$_id" }
      }
   },
   { $set: { grouped: true, a: "$_id.a", tsi: "$_id.tsi" } },
   { $unset: "_id" },
   { $merge: { into: "sharded_col" } }
])
 
db.sharded_col.find({ grouped: true }).forEach(function (row) {
   db.sharded_col.deleteMany({ _id: { $in: row.ids } });
})
db.sharded_col.updateMany({ grouped: true }, { $unset: { _ids: '' } })

The vast majority of documents have a shard key, of course. However, there are a few documents where shard key does not exits.

My current workaround is to use a dummy shard-key: `tsi: false`
The documents are log messages which are inserted by an external program, one-by-one. It is not possible to inserted them already as groups.
The grouping is done by a job once per hour. I like to have grouped and un-groupped documents in the same collection.

 

Comment by Asya Kamsky [ 29/Sep/20 ]

wernfried.domscheit@sunrise.net what's the use case for this?   Usually $merge would be used to update existing documents based on some calculation - is the main use case here updating documents that happen to have null shard keys?  Or using $merge to insert a document or documents that have null shard key? 

 

Comment by Wernfried Domscheit [ 29/Sep/20 ]

With proper line break:

 

db = db.getSiblingDB("mip")
db.createCollection("sharded_col")
sh.enableSharding("mip")
sh.shardCollection("mip.sharded_col", { tsi: 1 })
 
db.sharded_col.insertOne({ a: 1 })
 
db.sharded_col.aggregate([
   { $set: { a: 2 } },
   { $unset: "_id" },
   { $merge: { into: "sharded_col" } }
])
 
{ 
    "ok" : 0.0, 
    "errmsg" : "$merge write error: 'on' field 'tsi' cannot be missing, null, undefined or an array", 
    "code" : NumberInt(51132), 
    "codeName" : "Location51132", 
    "operationTime" : Timestamp(1601372348, 4), 
    "$clusterTime" : {
        "clusterTime" : Timestamp(1601372351, 40), 
        "signature" : {
            "hash" : BinData(0, "wv5eJ/xcJoaS1m14aJeoSA3WSqo="), 
            "keyId" : NumberLong(6854861305854033921)
        }
    }
}

Generated at Thu Feb 08 05:24:49 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.