[SERVER-48128] mapreduce and aggregation with output don't work on rs to cluster upgrade Created: 12/May/20 Updated: 29/Oct/23 Resolved: 27/Jul/20 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Querying |
| Affects Version/s: | 4.5.1 |
| Fix Version/s: | 4.7.0, 4.4.2 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Marcos José Grillo Ramirez | Assignee: | Bernard Gorman |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | qexec-team |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: | |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Backport Requested: | v4.4 |
| Steps To Reproduce: | |
| Sprint: | Query 2020-06-01, Query 2020-06-15, Query 2020-06-29, Query 2020-07-13, Query 2020-07-27, Query 2020-08-10 |
| Participants: | |
| Description |
|
DOCSP-10021 describes the basic steps followed on Atlas to upgrade from a replica set to a sharded cluster. SERVER-47701 adds a test to ensure this process works. However, the mapReduce and aggregation commands with output don't work when connected directly to the primary mongod; they fail with a similar error:
And it looks like it fails while doing a listCollections:
With the following stacktrace:
|
| Comments |
| Comment by Githook User [ 11/Sep/20 ] |
|
Author: Bernard Gorman <bernard.gorman@gmail.com> (gormanb)
Message: (cherry picked from commit 64c7ccfac9ae6b4765481d6158e6447a69b2914b) |
| Comment by Githook User [ 27/Jul/20 ] |
|
Author: Bernard Gorman <bernard.gorman@gmail.com> (gormanb)
Message: |
| Comment by Arun Banala [ 16/Jun/20 ] |
|
The issue here is that the aggregation request makes an internal listCollections request as part of the $out stage. We append a dbVersion to this request. The listCollections command validates the dbVersion it receives against the dbVersion present in the cache (DatabaseShardingState). If there is a mismatch, it throws an error which propagates all the way to the client. One possible fix is to treat requests sent directly to a node as un-versioned: we could attach the dbVersion to the internal commands only when the client is mongos. We need to fix this issue in all the previous versions as well, since this is part of the Atlas upgrade from replica set workflow. I've tested the workflow on 4.2 and the aggregate $out command doesn't fail there, so this seems to be an issue only on 4.4 and master. |
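As a rough illustration of the mechanism described in the comment above, the following toy model shows a versioned listCollections request failing on a mismatch and an un-versioned one passing. It is only a sketch of the "possible fix" as worded in the comment, not the server code or the committed change; every name here (`DbVersion`, `Shard`, `run_out_stage`, `StaleDbVersion`, the `from_mongos` flag) is hypothetical, and the rule that a missing cache entry counts as a mismatch is an assumption of the model.

```python
from dataclasses import dataclass
from typing import Dict, Optional


class StaleDbVersion(Exception):
    """Hypothetical error: received dbVersion disagrees with the cached one."""


@dataclass(frozen=True)
class DbVersion:
    # Simplified stand-in for a database version (illustrative fields only).
    uuid: str
    last_mod: int


@dataclass
class Shard:
    # Stand-in for the per-database cached version that the comment
    # refers to as DatabaseShardingState.
    cached: Dict[str, DbVersion]

    def list_collections(self, db: str,
                         db_version: Optional[DbVersion] = None) -> str:
        # An un-versioned request (db_version is None) skips validation.
        # A versioned request must match the cached version exactly.
        if db_version is not None and self.cached.get(db) != db_version:
            raise StaleDbVersion(f"dbVersion mismatch for database {db!r}")
        return "ok"


def run_out_stage(shard: Shard, db: str, known_version: DbVersion,
                  from_mongos: bool) -> str:
    # Sketch of the proposed behaviour: attach the dbVersion to the
    # internal listCollections only when the client is mongos.
    attached = known_version if from_mongos else None
    return shard.list_collections(db, attached)


if __name__ == "__main__":
    shard = Shard(cached={"test": DbVersion("abc", 2)})
    stale = DbVersion("abc", 1)
    # Direct connection: request is un-versioned, so no error is raised.
    print(run_out_stage(shard, "test", stale, from_mongos=False))
    # Versioned request with a stale version: the mismatch surfaces as an error.
    try:
        run_out_stage(shard, "test", stale, from_mongos=True)
    except StaleDbVersion as exc:
        print("failed:", exc)
```

In this model, the failure reported in the ticket corresponds to always attaching the version, regardless of whether the client connected directly to the node.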
| Comment by Kaloian Manassiev [ 12/May/20 ] |
|
Just a heads-up that in this case these are direct writes to a shard, so no StaleDb/ShardVersion errors should be thrown at all (hence nothing to be retried). So they are likely not duplicates. |
| Comment by Craig Homa [ 12/May/20 ] |
|
Hey Arun, this looks like it is related to |