[SERVER-46679] Transaction test fails with Stale Shard Version error on 4.5 Created: 06/Mar/20  Updated: 29/Oct/23  Resolved: 31/Mar/20

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: 4.4.0-rc6, 4.7.0

Type: Bug Priority: Major - P3
Reporter: Shane Harvey Assignee: Marcos José Grillo Ramírez
Resolution: Fixed Votes: 0
Labels: PM-1645-Milestone-1
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Zip Archive db.zip    
Issue Links:
Backports
Depends
is depended on by SERVER-32198 Missing collection metadata on the sh... Closed
is depended on by CDRIVER-3564 Transactions test in /Samples test fr... Closed
is depended on by PYTHON-2154 test_examples.TestTransactionExamples... Closed
Related
related to SERVER-74195 Transaction failed after version upgrade Closed
related to PYTHON-2154 test_examples.TestTransactionExamples... Closed
related to SERVER-45779 Throw Stale Shard Version when collec... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v4.4
Sprint: Sharding 2020-04-06
Participants:
Linked BF Score: 15

 Description   

A change in SERVER-45779 (server version 4.5.0-34-g45341f9) is causing one of Python's transactions documentation tests to fail on sharded clusters with the following error (PYTHON-2154):
"Transaction 58ec29bc-f257-43ce-81ed-2db10a7da770:3 was aborted on statement 0 due to: an error from cluster data placement change :: caused by :: Encountered error from localhost:27217 during a transaction :: caused by :: sharding status of collection hr.employees is not currently available for filtering and needs to be recovered from the config server"

Here's a link to the source of the failing "update_employee_info" transaction test: https://github.com/mongodb/mongo-python-driver/blob/1fdfb9864b3c0614f0f46add0bb7a25ebb8d2e77/test/test_examples.py#L902

The failure happens on the first command in the transaction on line 875.

The test does not shard the collection or enable sharding on the database, so hr.employees is an unsharded collection.
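
For reference, here is a minimal sketch of the transaction pattern the test exercises, modeled on the update_employee_info example from the transactions documentation; the connection string is illustrative and the real test wraps this in additional retry helpers:

from pymongo import MongoClient, ReadPreference
from pymongo.read_concern import ReadConcern
from pymongo.write_concern import WriteConcern

# Connect through mongos; host/port here are illustrative.
client = MongoClient("mongodb://localhost:27217")

def update_employee_info(session):
    employees = session.client.hr.employees
    events = session.client.reporting.events
    # The first statement of the transaction is where the StaleConfig
    # error surfaces: the shard has no filtering metadata for the
    # unsharded collection hr.employees.
    employees.update_one(
        {"employee": 3}, {"$set": {"status": "Inactive"}}, session=session)
    events.insert_one(
        {"employee": 3, "status": {"new": "Inactive", "old": "Active"}},
        session=session)

with client.start_session() as session:
    session.with_transaction(
        update_employee_info,
        read_concern=ReadConcern("snapshot"),
        write_concern=WriteConcern(w="majority"),
        read_preference=ReadPreference.PRIMARY)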

Here are the mongos logs for the failing test:

2020-03-05T21:58:17.855+0000 I  NETWORK  [listener] connection accepted from 127.0.0.1:42732 #685 (19 connections now open)
2020-03-05T21:58:17.857+0000 I  NETWORK  [conn685] received client metadata from 127.0.0.1:42732 conn685: {"driver":{"name":"PyMongo","version":"3.11.0.dev1"},"os":{"type":"Linux","name":"Red Hat Enterprise Linux Server 6.2 Santiago","architecture":"x86_64","version":"2.6.32-220.el6.x86_64"},"platform":"CPython 2.7.17.final.0"}
2020-03-05T21:58:17.860+0000 I  NETWORK  [listener] connection accepted from 127.0.0.1:42733 #686 (20 connections now open)
2020-03-05T21:58:17.861+0000 I  NETWORK  [conn686] received client metadata from 127.0.0.1:42733 conn686: {"driver":{"name":"PyMongo","version":"3.11.0.dev1"},"os":{"type":"Linux","name":"Red Hat Enterprise Linux Server 6.2 Santiago","architecture":"x86_64","version":"2.6.32-220.el6.x86_64"},"platform":"CPython 2.7.17.final.0"}
2020-03-05T21:58:17.862+0000 I  NETWORK  [ConfigServerCatalogCacheLoader-0] ignoring minOpTime for { mode: "nearest" }
2020-03-05T21:58:17.863+0000 I  NETWORK  [ConfigServerCatalogCacheLoader-0] ignoring minOpTime for { mode: "primary" }
2020-03-05T21:58:17.864+0000 I  SH_REFR  [ConfigServerCatalogCacheLoader-0] Refresh for database hr took 1 ms and failed :: caused by :: NamespaceNotFound: database hr not found
2020-03-05T21:58:17.864+0000 I  NETWORK  [conn686] ignoring minOpTime for { mode: "primary" }
2020-03-05T21:58:17.875+0000 I  NETWORK  [ConfigServerCatalogCacheLoader-0] ignoring minOpTime for { mode: "nearest" }
2020-03-05T21:58:17.876+0000 I  SH_REFR  [ConfigServerCatalogCacheLoader-0] Refresh for database hr from version {} to version {"uuid":{"$uuid":"4f3f02c9-58eb-41fa-978c-8c8f78cb99c8"},"lastMod":1} took 0 ms
2020-03-05T21:58:17.876+0000 I  NETWORK  [conn686] ignoring minOpTime for { mode: "nearest" }
2020-03-05T21:58:17.890+0000 I  NETWORK  [ConfigServerCatalogCacheLoader-0] ignoring minOpTime for { mode: "nearest" }
2020-03-05T21:58:17.891+0000 I  NETWORK  [ConfigServerCatalogCacheLoader-0] ignoring minOpTime for { mode: "primary" }
2020-03-05T21:58:17.892+0000 I  SH_REFR  [ConfigServerCatalogCacheLoader-0] Refresh for database reporting took 1 ms and failed :: caused by :: NamespaceNotFound: database reporting not found
2020-03-05T21:58:17.892+0000 I  NETWORK  [conn686] ignoring minOpTime for { mode: "primary" }
2020-03-05T21:58:17.904+0000 I  NETWORK  [ConfigServerCatalogCacheLoader-0] ignoring minOpTime for { mode: "nearest" }
2020-03-05T21:58:17.905+0000 I  SH_REFR  [ConfigServerCatalogCacheLoader-0] Refresh for database reporting from version {} to version {"uuid":{"$uuid":"89d001c7-f571-47cf-885e-5e7e4d682360"},"lastMod":1} took 0 ms
2020-03-05T21:58:17.905+0000 I  NETWORK  [conn686] ignoring minOpTime for { mode: "nearest" }
2020-03-05T21:58:17.933+0000 I  NETWORK  [conn686] ignoring minOpTime for { mode: "primary" }
2020-03-05T21:58:18.039+0000 I  -        [conn686] slow query{"type":"command","ns":"reporting.$cmd","command":{"dropDatabase":1,"writeConcern":{"w":"majority"},"lsid":{"id":{"$uuid":"05f6af3a-f73b-472e-8924-1517ee7094f4"}},"$clusterTime":{"clusterTime":{"$timestamp":{"t":1583445497,"i":167}},"signature":{"keyId":0,"hash":{"$binary":{"base64":"AAAAAAAAAAAAAAAAAAAAAAAAAAA=","subType":"0"}}}},"$db":"reporting","$readPreference":{"mode":"primary"}},"numYields":0,"reslen":186,"protocol":"op_msg","durationMillis":106}
2020-03-05T21:58:18.041+0000 I  NETWORK  [conn686] ignoring minOpTime for { mode: "primary" }
2020-03-05T21:58:18.142+0000 I  -        [conn686] slow query{"type":"command","ns":"hr.$cmd","command":{"dropDatabase":1,"writeConcern":{"w":"majority"},"lsid":{"id":{"$uuid":"05f6af3a-f73b-472e-8924-1517ee7094f4"}},"$clusterTime":{"clusterTime":{"$timestamp":{"t":1583445498,"i":5}},"signature":{"keyId":0,"hash":{"$binary":{"base64":"AAAAAAAAAAAAAAAAAAAAAAAAAAA=","subType":"0"}}}},"$db":"hr","$readPreference":{"mode":"primary"}},"numYields":0,"reslen":179,"protocol":"op_msg","durationMillis":100}

The mongod logs for this time are large, so I've attached them: db.zip. I do see this error:

2020-03-05T21:58:17.919+0000 D1 -        [conn44] User Assertion: StaleConfig{ ns: "hr.employees", vReceived: Timestamp(0, 0), vReceivedEpoch: ObjectId('000000000000000000000000'), shardId: "sh01" }: sharding status of collection hr.employees is not currently available for filtering and needs to be recovered from the config server src/mongo/db/s/collection_sharding_runtime.cpp 140
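
For context, drivers are expected to retry an entire transaction when the server attaches the TransientTransactionError label to an error such as this abort. Here is a minimal sketch of that documented retry convention; the helper shape follows the docs examples, not this test's exact code:

from pymongo.errors import PyMongoError

def run_transaction_with_retry(txn_func, session):
    while True:
        try:
            txn_func(session)  # runs the transaction and commits it
            return
        except PyMongoError as exc:
            # Retry from the top only if the server marked the error
            # as transient; otherwise surface it to the caller.
            if exc.has_error_label("TransientTransactionError"):
                continue
            raise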



 Comments   
Comment by Githook User [ 19/Sep/23 ]

Author:

{'name': 'Adrian Dole', 'email': 'adrian@dole.tech', 'username': 'adriandole'}

Message: SERVER-46679 Reenable test_sample_txn_commands (#1407)
Branch: master
https://github.com/mongodb/mongo-c-driver/commit/e17b44c26646ef825928cfe14f4a2b491dbe4f4e

Comment by Githook User [ 12/May/20 ]

Author:

{'name': 'Pierlauro Sciarelli', 'email': 'pierlauro.sciarelli@mongodb.com', 'username': 'pierlauro'}

Message: SERVER-46679 Transaction test fails with Stale Shard Version error (backport)
Branch: v4.4
https://github.com/mongodb/mongo/commit/1e41741ea5ea32ff4f6c30eb09f33260b93e31d7

Comment by Githook User [ 31/Mar/20 ]

Author:

{'name': 'Marcos José Grillo Ramírez', 'email': 'marcos.grillo@mongodb.com', 'username': 'm4nti5'}

Message: SERVER-46679 Transaction test fails with Stale Shard Version error on 4.5
Branch: master
https://github.com/mongodb/mongo/commit/858525c4dee0335a99a05e0cb0f67ea11460fb82
