[SERVER-46760] unrecognized field 'useNewUpsert' with sharded cluster running 4.2.1 Created: 09/Mar/20  Updated: 27/Oct/23  Resolved: 17/Mar/20

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Alex Lepailleur Assignee: Carl Champain (Inactive)
Resolution: Gone away Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

 Description   

It is not possible to run Collection.aggregate or Collection.count_documents with PyMongo against a sharded cluster. The call fails with:

Traceback (most recent call last):
 File "/scripts/mongo_exporter.py", line 23, in <module>
 query_result = write_col.aggregate([
 File "/usr/local/lib/python3.8/site-packages/pymongo/collection.py", line 2375, in aggregate
 return self._aggregate(_CollectionAggregationCommand,
 File "/usr/local/lib/python3.8/site-packages/pymongo/collection.py", line 2297, in _aggregate
 return self.__database.client._retryable_read(
 File "/usr/local/lib/python3.8/site-packages/pymongo/mongo_client.py", line 1464, in _retryable_read
 return func(session, server, sock_info, slave_ok)
 File "/usr/local/lib/python3.8/site-packages/pymongo/aggregation.py", line 136, in get_cursor
 result = sock_info.command(
 File "/usr/local/lib/python3.8/site-packages/pymongo/pool.py", line 603, in command
 return command(self.sock, dbname, spec, slave_ok,
 File "/usr/local/lib/python3.8/site-packages/pymongo/network.py", line 165, in command
 helpers._check_command_response(
 File "/usr/local/lib/python3.8/site-packages/pymongo/helpers.py", line 159, in _check_command_response
 raise OperationFailure(msg % errmsg, code, response)
pymongo.errors.OperationFailure: unrecognized field 'useNewUpsert'

The same query runs fine against a non-sharded collection containing the same data.
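
A minimal sketch of the kind of call that fails (host, database, and collection names below are placeholders, not the actual values from the affected environment):

```
# Sketch: both calls raised OperationFailure: unrecognized field 'useNewUpsert'
# when issued through the affected mongos. Connection details are placeholders.
from pymongo import MongoClient

client = MongoClient("localhost", 27017)
coll = client["SomeDatabase"]["SomeCollection"]

pipeline = [{"$match": {"x": 1}}]
print(list(coll.aggregate(pipeline)))
print(coll.count_documents({"x": 1}))
```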
 



 Comments   
Comment by Carl Champain (Inactive) [ 17/Mar/20 ]

Hi alex.lepailleur@gmail.com,

We really appreciate your interest. However, since you are no longer encountering this issue on 4.2.3, we are not going to pursue a reproduction. I will now close this ticket and mark it as "Gone away".

Thank you!
Carl
 

Comment by Alex Lepailleur [ 12/Mar/20 ]

Hi Carl,

I guessed that I should look for mongos.log, but there was no log file to be found at all.
Since this was hindering our production, I had to push for an upgrade to 4.2.3, and the aggregation now works as expected. Unfortunately, I won't be able to reproduce this in my environment anymore.
The configuration was very simple and you should be able to reproduce it: a single shard with 3 mongod nodes in a replica set, a config replica set with 3 members, and a mongos running the official mongo:4.2.1 image. A keyfile was used for authentication between the members.

Please let me know if I can be of any help. I could probably set up a temporary shard and config server using Docker to reproduce this if needed.

Thanks,
Alex

Comment by Carl Champain (Inactive) [ 12/Mar/20 ]

Hi alex.lepailleur@gmail.com,

Sorry, I misspoke in my previous comment! I meant mongos.log (not mongod.log). Can you find it?

Thanks,
Carl

Comment by Alex Lepailleur [ 12/Mar/20 ]

Hi Carl,

I ran that but don't have a mongod.log (nor a mongodb.log) in /var/log/mongodb. I don't use a config file so I'd expect it to use the default location for logging.
The only thing I have is this in the standard output of the mongos:

 

2020-03-12T10:38:57.811+0000 I ACCESS [conn161] Successfully authenticated as principal alepaill on admin from client 10.50.55.149:63477
2020-03-12T10:39:00.352+0000 I COMMAND [conn161] command admin.$cmd appName: "MongoDB Shell" command: profile { profile: 0.0, slowms: -1.0, lsid: { id: UUID("77aeecd0-3376-49c2-b3aa-cc3dd1adad5f") }, $clusterTime: { clusterTime: Timestamp(1584009537, 32), signature: { hash: BinData(0, 62C8FF2CE6ADEF12D4959C44C5C86084813E59D3), keyId: 6782553869273530395 } }, $db: "admin" } numYields:0 reslen:204 protocol:op_msg 0ms
2020-03-12T10:39:57.820+0000 I COMMAND [conn159] command admin.$cmd appName: "robo3t" command: ping { ping: 1, $db: "admin" } numYields:0 reslen:163 protocol:op_msg 0ms
2020-03-12T10:39:57.823+0000 I COMMAND [conn161] command admin.$cmd appName: "MongoDB Shell" command: ping { ping: 1.0, lsid: { id: UUID("77aeecd0-3376-49c2-b3aa-cc3dd1adad5f") }, $clusterTime: { clusterTime: Timestamp(1584009537, 32), signature: { hash: BinData(0, 62C8FF2CE6ADEF12D4959C44C5C86084813E59D3), keyId: 6782553869273530395 } }, $db: "admin" } numYields:0 reslen:163 protocol:op_msg 0ms
2020-03-12T10:39:58.395+0000 I NETWORK [listener] connection accepted from 10.164.16.202:60196 #162 (5 connections now open)
2020-03-12T10:39:58.395+0000 I NETWORK [conn162] received client metadata from 10.164.16.202:60196 conn162: { driver: { name: "PyMongo", version: "3.10.1" }, os: { type: "Linux", name: "Linux", architecture: "x86_64", version: "4.14.33-bgrml64s" }, platform: "CPython 3.8.0.final.0" }
2020-03-12T10:39:58.395+0000 I COMMAND [conn162] command admin.$cmd command: isMaster { ismaster: 1, client: { driver: { name: "PyMongo", version: "3.10.1" }, os: { type: "Linux", name: "Linux", architecture: "x86_64", version: "4.14.33-bgrml64s" }, platform: "CPython 3.8.0.final.0" }, $db: "admin" } numYields:0 reslen:389 protocol:op_query 0ms
2020-03-12T10:39:58.398+0000 I NETWORK [listener] connection accepted from 10.164.16.202:60198 #163 (6 connections now open)
2020-03-12T10:39:58.398+0000 I NETWORK [conn163] received client metadata from 10.164.16.202:60198 conn163: { driver: { name: "PyMongo", version: "3.10.1" }, os: { type: "Linux", name: "Linux", architecture: "x86_64", version: "4.14.33-bgrml64s" }, platform: "CPython 3.8.0.final.0" }
2020-03-12T10:39:58.398+0000 I COMMAND [conn163] command admin.$cmd command: isMaster { ismaster: 1, client: { driver: { name: "PyMongo", version: "3.10.1" }, os: { type: "Linux", name: "Linux", architecture: "x86_64", version: "4.14.33-bgrml64s" }, platform: "CPython 3.8.0.final.0" }, compression: [], $db: "admin" } numYields:0 reslen:389 protocol:op_query 0ms
2020-03-12T10:39:58.400+0000 I COMMAND [conn163] command UdsCache.$cmd command: saslStart { saslStart: 1, mechanism: "SCRAM-SHA-1", payload: "xxx", autoAuthorize: 1, $db: "UdsCache", $readPreference: { mode: "primary" } } numYields:0 reslen:317 protocol:op_msg 1ms
2020-03-12T10:39:58.406+0000 I COMMAND [conn163] command UdsCache.$cmd command: saslContinue { saslContinue: 1, conversationId: 1, payload: "xxx", $db: "UdsCache", $readPreference: { mode: "primary" } } numYields:0 reslen:234 protocol:op_msg 0ms
2020-03-12T10:39:58.407+0000 I ACCESS [conn163] Successfully authenticated as principal udscacher on UdsCache from client 10.164.16.202:60198
2020-03-12T10:39:58.407+0000 I COMMAND [conn163] command UdsCache.$cmd command: saslContinue { saslContinue: 1, conversationId: 1, payload: "xxx", $db: "UdsCache", $readPreference: { mode: "primary" } } numYields:0 reslen:204 protocol:op_msg 0ms
2020-03-12T10:39:58.408+0000 I COMMAND [conn163] command admin.$cmd command: buildInfo { buildinfo: 1, lsid: { id: UUID("2a3be116-95fe-4340-b6a0-231de5e8c90e") }, $clusterTime: { clusterTime: Timestamp(1584009596, 1), signature: { hash: BinData(0, F2A5A31A4A651F8938C096F777614E649C132D34), keyId: 6782553869273530395 } }, $db: "admin", $readPreference: { mode: "primary" } } numYields:0 reslen:1367 protocol:op_msg 0ms
2020-03-12T10:39:58.410+0000 I SH_REFR [ConfigServerCatalogCacheLoader-324] Refresh for database UdsCache from version {} to version { uuid: UUID("35038b09-7b28-4eba-87bc-dda51352aaa4"), lastMod: 1 } took 1 ms
2020-03-12T10:39:58.413+0000 I SH_REFR [ConfigServerCatalogCacheLoader-324] Refresh for collection UdsCache.datasets to version 1|1||5e208b42b827ad5fb9e0f7c4 took 1 ms
2020-03-12T10:39:58.524+0000 I COMMAND [conn163] command UdsCache.datasets command: count { count: "datasets", query: { $or: [ { dataset: { $regex: "acorn:redhill::4divarsen:.*" } } ] }, lsid: { id: UUID("2a3be116-95fe-4340-b6a0-231de5e8c90e") }, $clusterTime: { clusterTime: Timestamp(1584009596, 1), signature: { hash: BinData(0, F2A5A31A4A651F8938C096F777614E649C132D34), keyId: 6782553869273530395 } }, $db: "UdsCache", $readPreference: { mode: "primary" } } numYields:0 reslen:201 protocol:op_msg 115ms
2020-03-12T10:39:58.897+0000 I COMMAND [conn162] command admin.$cmd command: isMaster { ismaster: 1, $clusterTime: { clusterTime: Timestamp(1584009598, 11), signature: { hash: BinData(0, BBAB303EBF326FCB4774E26ABD70EB03E15180CE), keyId: 6782553869273530395 } }, $db: "admin", $readPreference: { mode: "primary" } } numYields:0 reslen:374 protocol:op_msg 0ms
2020-03-12T10:39:59.097+0000 I COMMAND [conn163] none numYields:0 ok:0 errMsg:"unrecognized field 'useNewUpsert'" errName:FailedToParse errCode:9 reslen:247 571ms
2020-03-12T10:39:59.401+0000 I NETWORK [conn163] end connection 10.164.16.202:60198 (5 connections now open)
2020-03-12T10:39:59.401+0000 I NETWORK [conn162] end connection 10.164.16.202:60196 (4 connections now open)
2020-03-12T10:39:59.841+0000 I COMMAND [conn152] command admin.$cmd appName: "robo3t" command: ping { ping: 1, $db: "admin" } numYields:0 reslen:163 protocol:op_msg 0ms
2020-03-12T10:39:59.844+0000 I COMMAND [conn154] command admin.$cmd appName: "MongoDB Shell" command: ping { ping: 1.0, lsid: { id: UUID("97d802ed-7e7b-45d3-a617-3d338e3107a4") }, $clusterTime: { clusterTime: Timestamp(1584009537, 32), signature: { hash: BinData(0, 62C8FF2CE6ADEF12D4959C44C5C86084813E59D3), keyId: 6782553869273530395 } }, $db: "admin" } numYields:0 reslen:163 protocol:op_msg 0ms
 
 

Is this what you wanted?
Thanks,
Alex

Comment by Carl Champain (Inactive) [ 11/Mar/20 ]

alex.lepailleur@gmail.com,

To help us understand what is happening, can you please

  1. Run the following command on your mongoS

    db.adminCommand({profile: 0, slowms: -1})
    

  2. Run the aggregation again
  3. Provide the mongod.log file from the mongoS

Once the three steps are completed, you can go back to the default log level by running

db.adminCommand({profile: 0, slowms: 100})
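
If it is more convenient to drive this from Python, the same commands can be sent with PyMongo along these lines (a sketch; the connection details are placeholders):

```
# Sketch: issuing the profile command through PyMongo instead of the shell.
# The host is a placeholder.
from pymongo import MongoClient

client = MongoClient("localhost", 27017)

# Equivalent of db.adminCommand({profile: 0, slowms: -1}) on the mongos:
client.admin.command("profile", 0, slowms=-1)

# ... run the aggregation again and collect the log ...

# Restore the default slow-operation threshold:
client.admin.command("profile", 0, slowms=100)
```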

Kind regards,
Carl

Comment by Bernie Hackett [ 10/Mar/20 ]

I've been told by the query team that useNewUpsert does not exist in the 4.2.1 codebase. They can't explain why you would still be getting this error. I'm going to move this ticket to the SERVER project for further discussion. This is definitely not a Python problem, since the Python driver does not set this field.

Comment by Alex Lepailleur [ 10/Mar/20 ]

Hello,
After creating a MongoClient connected to my mongos, it reports version 4.2.1.
Thanks,
Alex

Comment by Bernie Hackett [ 09/Mar/20 ]

Can you have your script log the server version? In PyMongo you can do:

MongoClient.server_info()["version"]
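
For example, a minimal sketch (the host name is a placeholder):

```
# Sketch: log the server version reported by the mongos the script connects to.
from pymongo import MongoClient

client = MongoClient("my-mongos-host", 27017)
print(client.server_info()["version"])
```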

Comment by Alex Lepailleur [ 09/Mar/20 ]

Hello,

Thanks for the answer.
I'm not sure I understand. I actually did not upgrade anything; the shard and config servers have always been running 4.2.1.
We just run mongos instances in Docker and bring them up and down as needed. It turns out the image we were using for that mongos was 4.2.3. I brought it down completely and started a new mongos running 4.2.1, which fixed the issue when running aggregations by means other than PyMongo (robo3t, for instance), but the error still occurs with PyMongo, even though everything, including the mongos, is now running 4.2.1.

Best regards,
Alex

Comment by Bernard Gorman [ 09/Mar/20 ]

Hi alex.lepailleur@gmail.com,

Many thanks for bringing this issue to our attention. This is not a problem with the Python driver, but a result of running a version of mongoS on your cluster that is more recent than the version running on your shards.

The only supported path for upgrading a sharded cluster, whether for a minor-version upgrade (e.g. from 4.2.1 to 4.2.2) or a major-version upgrade (e.g. from 4.0 to 4.2), is to upgrade the config servers first, then the shards, and finally the mongoS. When we write the code necessary to transition smoothly between versions, we base our decisions on the knowledge that a mongoS should never need to communicate with shards which have not yet been upgraded. In fact, if you attempt to upgrade the mongoS first during a major-version upgrade, it will refuse to start until the config servers and all shards have been upgraded.

We aren't quite that strict when upgrading between minor versions, but it is still true that your cluster should never be running a mongoS with a later version than your shards. In this particular case, we backported a major improvement to 4.2.2 which fundamentally changes the way certain aggregation pipelines operate. Because the mongoS should always be upgraded last, it sends the useNewUpsert flag to the shards - which should all be on 4.2.2 at this point - to let them know that all shards have been upgraded, and that it is safe to switch over to the new mechanism. If you are running a 4.2.2 mongoS with 4.2.1 shards, then the shards will not understand this flag, and will throw the exception you observed.

As such, I would recommend that you either upgrade your entire cluster to 4.2.3, or stick with mongoS 4.2.1 until such time as the rest of the cluster can be upgraded as well.
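
One way to verify that this ordering holds is to compare the version reported by the mongoS with the version reported by each shard. A sketch of how this could be done with PyMongo (not part of the original discussion; it assumes credentials that are valid on both the mongoS and the shard replica sets, that the shard hosts are directly reachable, and that the user may run listShards):

```
# Sketch: compare the mongos version with each shard's reported version.
# Connection strings and credentials are placeholders.
from pymongo import MongoClient

mongos = MongoClient("mongodb://user:pass@my-mongos-host:27017/admin")
print("mongos:", mongos.server_info()["version"])

for shard in mongos.admin.command("listShards")["shards"]:
    # shard["host"] looks like "rsName/host1:27018,host2:27018,host3:27018"
    rs_name, _, members = shard["host"].partition("/")
    shard_client = MongoClient(
        "mongodb://user:pass@%s/admin?replicaSet=%s" % (members, rs_name)
    )
    print(shard["_id"], ":", shard_client.server_info()["version"])
```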

Thanks again for taking the time to file this report!

Best regards,
Bernard

Comment by Bernie Hackett [ 09/Mar/20 ]

I've discussed this with the query team. The upgrade path for a sharded cluster includes upgrading the mongos instances last, not first. There is a detailed explanation here. To solve this problem you have to upgrade the config servers and shards.

Comment by Alex Lepailleur [ 09/Mar/20 ]

Yes, basically any aggregate:

```
client = MongoClient('hostname', port=27017)
db = client.SomeDatabase
col = db.SomeCollection

col.aggregate([{'$match': {'x': y}}])
```

Comment by Bernie Hackett [ 09/Mar/20 ]

I would guess there is some difference in the commands being sent. Can you post a code example here that causes the command to fail with PyMongo?

Comment by Alex Lepailleur [ 09/Mar/20 ]

Thanks for the answer. I am planning to upgrade, but it's somewhat outside of my control and I need to wait for somebody else to do it, so unfortunately I cannot test it immediately. It's definitely something that changed between MongoDB 4.2.1 and 4.2.2, though.

https://github.com/mongodb/mongo/blob/r4.2.2/src/mongo/db/pipeline/aggregation_request.h
It has useNewUpsert

https://github.com/mongodb/mongo/blob/r4.2.1/src/mongo/db/pipeline/aggregation_request.h
It does not have it

However, when connecting to my 4.2.1 mongos using robo3t, I can run aggregations without any issues while PyMongo fails with this exception. Any idea what might be different?

Comment by Bernie Hackett [ 09/Mar/20 ]

Can you upgrade your cluster to 4.2.2? PyMongo doesn't use useNewUpsert. I'd never heard of it before now and it's not in our codebase anywhere. I'm guessing this is a bug in the server. SERVER-44884 mentions it, linking to SERVER-43860 which is marked as fixed in MongoDB 4.2.2.

Comment by Alex Lepailleur [ 09/Mar/20 ]

I realized robo3t returns the same error when I run a mongos 4.2.3 in front of a shard running mongod 4.2.1.
The problem goes away with robo3t if I run mongos 4.2.1.
However, it persists with PyMongo.

Comment by Alex Lepailleur [ 09/Mar/20 ]

Just wanted to add that I tried PyMongo versions from 3.5 up to 3.10.1; the log pasted here is from 3.10.1.
