[SERVER-61769] Attempting to run an aggregation with $out or $merge in a transaction on a sharded cluster leaves idle cursors open
Created: 29/Nov/21  Updated: 29/Oct/23  Resolved: 25/Feb/22

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 4.4.14, 5.0.7, 5.3.0-rc2, 5.3.0-rc3

Type: Bug Priority: Major - P3
Reporter: Jordi Serra Torrens Assignee: Jennifer Peshansky (Inactive)
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Text File server-61769-repro.patch    
Issue Links:
Backports
Depends
is depended on by SERVER-43099 Reenable random chunk migration failp... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v5.3, v5.2, v5.1, v5.0, v4.4, v4.2
Steps To Reproduce:

Apply server-61769-repro.patch, then run:
./buildscripts/resmoke.py run --storageEngine=wiredTiger --storageEngineCacheSizeGB=.50 --suite=sharding jstests/sharding/server-61769.js --log=file

Sprint: QE 2021-12-27, QE 2022-01-10, QE 2022-02-07, QE 2022-02-21, QE 2022-03-07, QE 2022-01-24

 Description   

The $out and $merge aggregation pipeline stages are not allowed in transactions; attempting to use them fails with OperationNotSupportedInTransaction. On a sharded collection, running $out or $merge in a transaction fails with OperationNotSupportedInTransaction as expected, but it leaves idle cursors open.

Repro test attached.
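The attached repro patch is not reproduced here. As a rough illustration of the symptom only, the toy Python model below assumes that the transaction check fails after a cursor has already been opened on each shard. If the error path does not kill those cursors, they stay open. All names (CursorRegistry, check_out_allowed, run_without_cleanup, run_with_cleanup, kill_cursors_on_error, attempt) are hypothetical, and this is not MongoDB code. The cleanup-on-error guard is a generic pattern, not the fix committed for this ticket.

```python
from contextlib import contextmanager


class OperationNotSupportedInTransaction(Exception):
    """Stands in for the server error code of the same name."""


class CursorRegistry:
    """Hypothetical stand-in for the per-shard cursor managers."""

    def __init__(self):
        self._open = {}  # cursor id -> shard name
        self._next_id = 1

    def open(self, shard):
        cursor_id = self._next_id
        self._next_id += 1
        self._open[cursor_id] = shard
        return cursor_id

    def kill(self, cursor_id):
        self._open.pop(cursor_id, None)

    def idle_count(self):
        return len(self._open)


def check_out_allowed(in_transaction):
    # Assumption: this check only runs after the shard cursors already exist.
    if in_transaction:
        raise OperationNotSupportedInTransaction("$out is not allowed in a transaction")


def run_without_cleanup(registry, shards, in_transaction):
    cursors = [registry.open(s) for s in shards]
    check_out_allowed(in_transaction)  # if this raises, the cursors stay open
    for c in cursors:                  # normal path: results drained, cursors closed
        registry.kill(c)


@contextmanager
def kill_cursors_on_error(registry, cursor_ids):
    """Kill the given cursors if the body raises, then re-raise the error."""
    try:
        yield
    except Exception:
        for c in cursor_ids:
            registry.kill(c)
        raise


def run_with_cleanup(registry, shards, in_transaction):
    cursors = [registry.open(s) for s in shards]
    with kill_cursors_on_error(registry, cursors):
        check_out_allowed(in_transaction)
    for c in cursors:
        registry.kill(c)


def attempt(run, registry, shards, in_transaction):
    """Return True if run() failed with the transaction error."""
    try:
        run(registry, shards, in_transaction)
    except OperationNotSupportedInTransaction:
        return True
    return False


if __name__ == "__main__":
    leaky, guarded = CursorRegistry(), CursorRegistry()
    attempt(run_without_cleanup, leaky, ["shard0", "shard1"], True)
    attempt(run_with_cleanup, guarded, ["shard0", "shard1"], True)
    print(leaky.idle_count(), guarded.idle_count())  # 2 0
```

Running it prints `2 0`. In both cases the client sees the expected error. The unguarded path leaves both shard cursors open, while the guarded path kills them before the error propagates.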



 Comments   
Comment by Githook User [ 01/Mar/22 ]

Author:

{'name': 'Jennifer Peshansky', 'email': 'jennifer.peshansky@mongodb.com', 'username': 'jenniferpeshansky'}

Message: SERVER-61769 Remove inMultiDocumentTransaction from expCtx

(cherry picked from commit 9b4519e5436a970b90a966cdc16dd8129833a9ae)
Branch: v4.4
https://github.com/mongodb/mongo/commit/ee6fda4258a4a73fdd74c2b40cac4bad5d1fb7cf

Comment by Githook User [ 01/Mar/22 ]

Author:

{'name': 'Jennifer Peshansky', 'email': 'jennifer.peshansky@mongodb.com', 'username': 'jenniferpeshansky'}

Message: SERVER-61769 Remove inMultiDocumentTransaction from expCtx

(cherry picked from commit 3952be624b21927373df31df4f0d9dad793abedd)
Branch: v4.4
https://github.com/10gen/mongo-enterprise-modules/commit/9d5cf22d8f960ad5e6b91339f40830bdfc608d6d

Comment by Githook User [ 01/Mar/22 ]

Author:

{'name': 'Jennifer Peshansky', 'email': 'jennifer.peshansky@mongodb.com', 'username': 'jenniferpeshansky'}

Message: SERVER-61769 Remove inMultiDocumentTransaction from expCtx

(cherry picked from commit 9b4519e5436a970b90a966cdc16dd8129833a9ae)
Branch: v5.3
https://github.com/mongodb/mongo/commit/129289774be78f67fc03bfcf2e358118263d5da1

Comment by Githook User [ 01/Mar/22 ]

Author:

{'name': 'Jennifer Peshansky', 'email': 'jennifer.peshansky@mongodb.com', 'username': 'jenniferpeshansky'}

Message: SERVER-61769 Remove inMultiDocumentTransaction from expCtx

(cherry picked from commit 9b4519e5436a970b90a966cdc16dd8129833a9ae)
Branch: v5.0
https://github.com/mongodb/mongo/commit/764a211fe72524ff16f30e0e9bdfac9063161696

Comment by Githook User [ 01/Mar/22 ]

Author:

{'name': 'Jennifer Peshansky', 'email': 'jennifer.peshansky@mongodb.com', 'username': 'jenniferpeshansky'}

Message: SERVER-61769 Remove inMultiDocumentTransaction from expCtx

(cherry picked from commit 3952be624b21927373df31df4f0d9dad793abedd)
Branch: v5.3
https://github.com/10gen/mongo-enterprise-modules/commit/91755490103ade7ef52bb14c0605376dd4e25ec1

Comment by Githook User [ 01/Mar/22 ]

Author:

{'name': 'Jennifer Peshansky', 'email': 'jennifer.peshansky@mongodb.com', 'username': 'jenniferpeshansky'}

Message: SERVER-61769 Remove inMultiDocumentTransaction from expCtx

(cherry picked from commit 3952be624b21927373df31df4f0d9dad793abedd)
Branch: v5.0
https://github.com/10gen/mongo-enterprise-modules/commit/d8730dd856cf2c38d925532bec8d2268e357023a

Comment by Githook User [ 25/Feb/22 ]

Author:

{'name': 'Jennifer Peshansky', 'email': 'jennifer.peshansky@mongodb.com', 'username': 'jenniferpeshansky'}

Message: SERVER-61769 Remove inMultiDocumentTransaction from expCtx
Branch: master
https://github.com/mongodb/mongo/commit/ea9494f858b47088343198fe9a888d2d1dea1823

Comment by Githook User [ 25/Feb/22 ]

Author:

{'name': 'Jennifer Peshansky', 'email': 'jennifer.peshansky@mongodb.com', 'username': 'jenniferpeshansky'}

Message: SERVER-61769 Remove inMultiDocumentTransaction from expCtx
Branch: master
https://github.com/10gen/mongo-enterprise-modules/commit/b6a8473595ff23e7c78c3a28d3b85db622355f8f

Comment by Jennifer Peshansky (Inactive) [ 17/Feb/22 ]

Looking at the logs between the start of the session and the point where idle cursors are (or are not) found, the only difference between the passing and failing cases is these two lines, which appear only in the passing case:

[js_test:server-61769] d20021| {"t":{"$date":"2022-02-17T22:27:47.260+00:00"},"s":"I",  "c":"NETWORK",  "id":22943,   "ctx":"listener","msg":"Connection accepted","attr":{"remote":"10.122.15.174:45464","uuid":"5e3891e1-d171-4fc6-b752-39719a8b446d","connectionId":37,"connectionCount":16}}
[js_test:server-61769] d20021| {"t":{"$date":"2022-02-17T22:27:47.260+00:00"},"s":"I",  "c":"NETWORK",  "id":51800,   "ctx":"conn37","msg":"client metadata","attr":{"remote":"10.122.15.174:45464","client":"conn37","doc":{"driver":{"name":"NetworkInterfaceTL-TaskExecutorPool-0","version":"5.3.0-alpha4-85-gd53e5f0"},"os":{"type":"Linux","name":"Ubuntu","architecture":"x86_64","version":"20.04"}}}}

Comment by Jennifer Peshansky (Inactive) [ 17/Feb/22 ]

[js_test:server-61769] s20025| {"t":{"$date":"2022-02-17T21:40:00.044+00:00"},"s":"D3", "c":"TXN",      "id":22889,   "ctx":"conn12","msg":"New transaction started","attr":{"sessionId":{"uuid":{"$uuid":"6dc74a18-903c-4012-8376-9cc784fd872d"}},"txnNumber":0,"txnRetryCounter":0}}
[js_test:server-61769] s20025| {"t":{"$date":"2022-02-17T21:40:00.044+00:00"},"s":"D1", "c":"SH_REFR",  "id":4619900, "ctx":"CatalogCache-1","msg":"Refreshing cached collection","attr":{"namespace":"test.fooOut","lookupSinceVersion":"0|0||000000000000000000000000||Timestamp(0, 0)","timeInStore":{"chunkVersion":"None","forcedRefreshSequenceNum":0,"epochDisambiguatingSequenceNum":0}}}
[js_test:server-61769] s20025| {"t":{"$date":"2022-02-17T21:40:00.045+00:00"},"s":"D1", "c":"ASSERT",   "id":23074,   "ctx":"ConfigServerCatalogCacheLoader::getChunksSince","msg":"User assertion","attr":{"error":"NamespaceNotFound: Collection test.fooOut not found","file":"src/mongo/s/catalog/sharding_catalog_client_impl.cpp","line":683}}
[js_test:server-61769] s20025| {"t":{"$date":"2022-02-17T21:40:00.046+00:00"},"s":"D1", "c":"ASSERT",   "id":23074,   "ctx":"CatalogCache-1","msg":"User assertion","attr":{"error":"NamespaceNotFound: Collection test.fooOut not found","file":"src/mongo/util/future_impl.h","line":673}}
[js_test:server-61769] s20025| {"t":{"$date":"2022-02-17T21:40:00.046+00:00"},"s":"I",  "c":"SH_REFR",  "id":4619902, "ctx":"CatalogCache-1","msg":"Collection has found to be unsharded after refresh","attr":{"namespace":"test.fooOut","durationMillis":1}}
[js_test:server-61769] s20025| {"t":{"$date":"2022-02-17T21:40:00.046+00:00"},"s":"D1", "c":"SH_REFR",  "id":4619900, "ctx":"CatalogCache-0","msg":"Refreshing cached collection","attr":{"namespace":"test.foo","lookupSinceVersion":"0|0||000000000000000000000000||Timestamp(0, 0)","timeInStore":{"chunkVersion":"None","forcedRefreshSequenceNum":0,"epochDisambiguatingSequenceNum":0}}}
[js_test:server-61769] s20025| {"t":{"$date":"2022-02-17T21:40:00.048+00:00"},"s":"I",  "c":"SH_REFR",  "id":4619901, "ctx":"CatalogCache-0","msg":"Refreshed cached collection","attr":{"namespace":"test.foo","lookupSinceVersion":"0|0||000000000000000000000000||Timestamp(0, 0)","newVersion":{"chunkVersion":[{"$timestamp":{"t":1,"i":3}},{"$oid":"620ec0af6d47b221774cf3a2"},{"$timestamp":{"t":1645133999,"i":69}}],"forcedRefreshSequenceNum":1,"epochDisambiguatingSequenceNum":5},"timeInStore":{"chunkVersion":"None","forcedRefreshSequenceNum":0,"epochDisambiguatingSequenceNum":0},"durationMillis":2}}
[js_test:server-61769] s20025| {"t":{"$date":"2022-02-17T21:40:00.048+00:00"},"s":"D1", "c":"QUERY",    "id":20904,   "ctx":"conn12","msg":"Dispatching command {cmdObj} to establish cursors on shards","attr":{"cmdObj":{"aggregate":"foo","pipeline":[],"cursor":{"batchSize":0},"needsMerge":true,"let":{"NOW":{"$date":"2022-02-17T21:40:00.048Z"},"CLUSTER_TIME":{"$timestamp":{"t":1645134000,"i":16}}},"fromMongos":true,"collation":{"locale":"simple"},"txnNumber":0,"readConcern":{"level":"local","provenance":"implicitDefault"}}}}

Comment by Jennifer Peshansky (Inactive) [ 17/Feb/22 ]

Some notes:

  • The repro reproduced the idle cursors
  • When I changed the number of shards to 1 in the repro, there were no longer idle cursors
  • When I tried to trigger a different uassert in DocumentSourceOut::create (while still in the transaction), there were no longer idle cursors

So it seems this only reproduces when the failing uassert is specifically the !expCtx->inMultiDocumentTransaction check.
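These three observations fit one hypothesis, offered here only as an assumption and not confirmed anywhere in this ticket. Suppose the router's up-front check of the $out stage reads a flag that does not yet reflect the transaction, so the stage passes that check. With one shard, the whole pipeline is forwarded and the shard rejects it before opening a cursor. With several shards, the pipeline is split (the mongos log above shows the shards receiving "pipeline":[] with "needsMerge":true), cursors are opened, and the $out stage is rejected only on the merging side. Other uasserts do not depend on the flag, so they fire up front. The toy Python model below encodes that hypothesis. The names dispatch, flag_sees_txn and OtherValidationError are hypothetical, not mongos internals.

```python
# Toy model of ONE hypothesis for the notes above; not mongos's real logic.
# Assumption: the router's up-front check of the $out stage reads a flag that
# may not reflect the transaction (flag_sees_txn=False), while any later check
# (on the single shard, or on the merging side) does see it.


class OperationNotSupportedInTransaction(Exception):
    pass


class OtherValidationError(Exception):
    """Stands in for a different uassert in the same stage-creation code."""


def dispatch(num_shards, in_txn, flag_sees_txn, other_error=False):
    """Simulate one $out request; return (error name or None, idle cursors left)."""
    open_cursors = 0

    def check_out_stage(sees_txn):
        if other_error:
            raise OtherValidationError("some other invalid $out option")
        if in_txn and sees_txn:
            raise OperationNotSupportedInTransaction("$out in a transaction")

    try:
        check_out_stage(flag_sees_txn)  # up-front parse on the router
        if num_shards == 1:
            # Whole pipeline, $out included, goes to the only shard, which
            # validates it before opening a cursor.
            check_out_stage(True)
        else:
            # Pipeline is split: each shard opens a cursor for the shards part
            # (cf. "pipeline":[] with "needsMerge":true in the log above), and
            # $out is checked again on the merging side afterwards.
            open_cursors += num_shards
            check_out_stage(True)
            open_cursors -= num_shards  # merge completed, cursors exhausted
    except (OperationNotSupportedInTransaction, OtherValidationError) as err:
        return type(err).__name__, open_cursors  # error path kills nothing
    return None, open_cursors
```

In this model, dispatch(2, True, False) returns ("OperationNotSupportedInTransaction", 2), matching the leak, while dispatch(1, True, False) and dispatch(2, True, False, other_error=True) leave no idle cursors. With flag_sees_txn=True the error fires before any cursor is opened. Whether that is how the committed change (titled "Remove inMultiDocumentTransaction from expCtx") behaves is an assumption, not something the ticket states.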

Comment by Jordi Serra Torrens [ 30/Nov/21 ]

Thanks kyle.suarez. I filed this ticket because it is preventing us from making the concurrency_sharded_*_with_balancer suites actually perform moveChunks again (SERVER-43099). As far as SERVER-43099 is concerned, until this ticket is fixed we can temporarily either skip jstests/concurrency/fsm_workloads/agg_sort.js and jstests/concurrency/fsm_workloads/agg_out.js in the _with_balancer suites, or stop them from attempting their aggregations in a transaction (which would fail and be retried anyway). So it is not urgent at all in that regard.
I can't comment on the product impact of this bug though.

Comment by Kyle Suarez [ 30/Nov/21 ]

jordi.serra-torrens, we're scheduling this fix, but how high is the priority to get this over the line? We have Skunkworks and then the holidays coming up, so this may slip.

Generated at Thu Feb 08 05:53:16 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.